HPCwire
 The global publication of record for High Performance Computing - LIVEwire Edition / November 10, 2004: Vol. 13, No. 45B


Features:

IBM TALKS BLUE GENE: JUST THE FIRST STEP IN A JOURNEY
by Tim Curns, Editor

IBM announced earlier this year that an IBM BlueGene/L supercomputer has surpassed NEC's Earth Simulator in Japan to become the world's most powerful supercomputer. In addition, at SC2004 this week, Blue Gene was officially named the fastest supercomputer in the world. HPCwire sat down with Dave Turek, IBM's VP of Deep Computing, and Tilak Agerwala, Vice President of Systems, to discuss what this announcement means for IBM, as well as the world of HPC.


HPCwire: Well, it's shaping up to be a historical year for IBM here at SC2004. Please tell us a bit about your presence here.

Dave Turek: I think that the notion of what SC is about for us is reflected in the demonstration of the reach of our broad portfolio. If you look in our booth, you see Power-based systems, you see Linux clusters, Opteron, Intel, PowerBlades, you see Blue Gene systems...really a portfolio of technologies that we think are appropriate to properly cover the space of opportunity that exists here. We are firm believers in the notion that the marketplace is inherently heterogeneous, that diverse applications and diverse kinds of customer requirements really map to a variety of solutions. The other major thing here, which is not in our booth, is the effort we put into linking storage issues with classic server kinds of issues. In the StorCloud area, we've got 150 terabytes of storage set up. There will be benchmarks and tests run over the next two days. We're going to do very well with respect to our performance on that technology also.

Tilak Agerwala: And on Blue Gene, I think we're really delighted that we have won the number one position [of fastest computers in the world]. But I want to point out that this is just one milestone in a journey. We started three years ago, and there's a lot more to come. We are going to install 360 teraflops next year at Lawrence Livermore National Labs (LLNL). We already have plans in place to extend Blue Gene to the next level. And what I find really exciting is that our partnership with the Department of Energy has resulted in something that is extremely powerful, but extremely practical as well. And so, we are really moving into a new era now, I think. There is a lot of effort that is going to take place on the application of Blue Gene and understanding which things it will be really good for and which things it may not be so good for. There is a big focus really on working with customers at national labs to really understand the applicability of Blue Gene to a wide range of problems.

The combination of the power, and the fact that we get this power by radically cutting down on size and cost, is why it is so exciting and why I think a lot of people will be using this machine.

HPCwire: I spoke with someone this morning who suggested that SC was becoming more mainstream and that typically, the show always represented the "future" of computing. Do you feel that Blue Gene embodies the "future" of supercomputing?

TA: I believe so. If you look at what happens with supercomputing, and the whole supercomputing community, there is often a change in who is really at the top. And in a certain year, that's what it's all about. But in the next year, it's often about the application of these new machines. So I think, almost naturally, there will be a big focus on these two big machines at the top [Blue Gene/L, SGI/NASA/Intel's Columbia] -- what was the track record in terms of usage, etc. I think we've gotten to a point now where there's such a significant improvement in cost/performance that I feel like we are able to start seeing a real impact on science and business. I think Blue Gene will be part of that whole phenomenon.

HPCwire: Speaking of science, San Diego Supercomputer Center just announced that they will be the first university to purchase Blue Gene, right?

DT: The University of Edinburgh, in Scotland, will also take a system. But I have a perspective on this issue of mainstream. There is a misconception that has persisted over some number of years that supercomputing is this esoteric arena that only appeals to the privileged few in research and university environments. In our historical experience, going back even to the SP days, the preponderance of our business was always driven by commercial deployments. I think what Tilak is talking about, in terms of the inherent efficiency of the Blue Gene design, coupled with the accommodation to the realities of space and power, is the possibility of dramatically opening the aperture of access to this kind of technology. Now the prices are coming down, that's great. That's what has driven Linux clusters over the last couple of years. But what the Linux cluster community has not dealt with is the issue of space and power. Those concepts tend to dominate.

And so, by virtue of what our research division has done and development organization at IBM, we've got an answer for that proposition. And all it does is simply make the technology accessible to a greater number of people.

HPCwire: Blue Gene seems to be popular for everything from its speed to even its packaging. What makes it different from other supercomputers?

TA: It's really a whole new approach to designing supercomputers. These days, we are running into technology limitations where performance is not going to come just from frequency code. So we've got to find innovative ways of packaging things together, but also integrating a whole bunch of different technologies, and Blue Gene does a very innovative job of that.

It starts with very low power processors, it builds a complete system-on-a-chip, then uses interconnects to connect these chips together, pushing the limits of scalability. Then some real innovations in system management allow you to manage all of this, which I think is extremely important. This whole focus on low power design, on integration at all levels, leads to a system that reduces cost, power, and size, but also one that's been balanced, and will apply to a large number of applications.

DT: The ratio of power from a Linux cluster, which is not even an exotic architecture like the Earth Simulator, to Blue Gene is 15:1. So even if they make significant improvements of 30-40%, you've got this 15:1 multiple you've got to overcome. It hasn't really reached the attention I feel it should at this point because the fact of the matter is the architectural approaches haven't, at this point, scaled to significant size where people have said, "look at the size of my electric bill." If you take the Earth Simulator from 2002 and extrapolate it forward to a petaflop style machine, you'd be looking at an electric bill of about $150 million a year. They wouldn't do that of course, and I think the SX-8 has a reduced power consumption profile, but the point illustrates that power and the expense of power is beginning to dominate a lot of the issues that people have historically dealt with. And then you couple that with the space issues. The order of magnitude of space to cover Blue Gene at 360 teraflops is on the order of 100 sq. meters. The amount of square meters to cover something like the Earth Simulator is around 10,000 sq. meters. And you look at that and say, "Ok, where are my football fields? Where do I put this thing?" And if you begin to look at cost of ownership and operations, you see you've got to build your own building, have your own generator, you're going to run electrical bills like we said, and that constrains access to computing.

I think our strategy opens up access to computing and it gives rise to interesting possibilities that we haven't even anticipated yet.
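
The power and floor-space figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is an illustration only, not anything IBM presented: the function names are hypothetical, and it assumes that an "improvement of 30-40%" means a competing cluster cuts its power draw by that fraction at the same performance.

```python
def remaining_ratio(initial_ratio: float, improvement: float) -> float:
    """Power ratio left after the less efficient system reduces its
    power draw by `improvement` (a fraction) at equal performance.
    Assumption: 'improvement' is read as a reduction in power."""
    return initial_ratio * (1.0 - improvement)


def area_ratio(large_sq_m: float, small_sq_m: float) -> float:
    """How many times more floor space the larger installation needs."""
    return large_sq_m / small_sq_m


if __name__ == "__main__":
    # 15:1 is the cluster-to-Blue Gene power ratio quoted in the interview.
    for improvement in (0.30, 0.40):
        ratio = remaining_ratio(15.0, improvement)
        print(f"{improvement:.0%} improvement leaves about {ratio:.1f}:1")

    # 10,000 sq. meters versus 100 sq. meters, as quoted in the interview.
    print(f"floor space ratio: {area_ratio(10_000, 100):.0f}:1")
```

Under that reading, a 30-40% improvement still leaves a gap of roughly 9:1 to 10.5:1, which is the point of the "multiple you've got to overcome" remark.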

HPCwire: Such as? Can you guess?

DT: What we've seen historically as prices come down, is that you get a blossoming out of different types of businesses who start to engage in efforts like this. So one of the things we've seen in the last year is this growth of interest in small to medium businesses to get their hands on this kind of computing. If you're a 20 person company and you can get access to 5 teraflops of computing inexpensively, well suddenly you can compete with the big guys. And so it changes the whole competitive profile in a variety of segments. It doesn't matter whether it's petroleum or life sciences or animation or digital media or financial services; it's an empowerment to smaller groups of people -- small companies, small departments, small divisions -- that by leveraging these design points and cost efficiency factors, they find themselves in a position of great competitiveness.

We have companies like this today, 20-30 person companies who leverage our technology and our On Demand centers, who are perfect candidates for technology like this.

HPCwire: Let's move on to speed as an issue. Please comment on the importance of the U.S. to have the "fastest" supercomputer. What does it really mean?

DT: Well the irony there is that we are an international company. Like most of our competitors, we are headquartered in the U.S. I think it has an impact of demonstrating the wherewithal of the American industry and economy to be innovative, creative, and to be able to do things like this. That goes a long way toward elevating people's attitudes. America is a "can-do" sort of country. I think this reinforces the whole behavioral DNA of the country, and I think it does energize people quite dramatically.

TA: And just to underscore that point with respect to Blue Gene, I'm not sure the experiment could have been done anywhere else. Our researchers and scientists working with the application folks at DOE and Livermore, and just the expertise across the whole system, demonstrates the fact that you need expertise across the whole stack and be able to bring that expertise together in one place to attack the problem. That's the only way we've been able to make it to number one. You can't become number one by just attacking any single part of the problem.

DT: There's an implicit point in Tilak's comments about the aspects of our industry because as you look around, a lot of our competitors have abandoned microprocessor development activities, a lot have chosen to go to market more on a proposition of assembly of parts as opposed to innovation of parts. I think we've been, in the last four or five years, in this era of computing assembly. What this demonstrates is that such things do not extrapolate well for the future. We have to have research skills applied to these projects to make ourselves competitive.

There's a message here for the U.S. in general, but there's a real message for our industry in particular, in terms of the way we need to start thinking about how to compete. Blue Gene took us five years. You can't turn around tomorrow and say, "I want to do this." You've got to have the will, you've got to have the money, you've got to have the right kind of people. If you scan the industry, there are very few places where those exist together.

TA: There is this remarkable opportunity for innovation, for this practical supercomputer that we have to really make breakthroughs in science and business. This is not just a fancy invention, it's a highly innovative product that is going to drive breakthroughs.

HPCwire: So Blue Gene is not only fast, it's productive. What is the relationship between speed and productivity? Are they mutually inclusive?

DT: The easy answer is that it all depends. The world is sort of trained to think in terms of Moore's Law. That's been the predominant approach to how the PC business has behaved over time. So this notion of doubling of processors every 18 months has reflected itself in the way a lot of people thought about it. If you look at Blue Gene, you see that we're sort of in an era of hyper-productivity. If you compare Blue Gene today with what the top system was 5 years ago, the factor is 70. Moore's Law would have said: expect a factor of 4-8. By next spring, the total will be a factor of 360 over 5 years. This presentation of speed really strikes at the heart of time to solution. Problems persist through time -- it doesn't matter if it is a logistics problem, oil discovery problem or drug design problem. The question is how fast can you get to the end of it. So ten years ago, if it took you three hundred days to do a reservoir simulation or something like that, and today you could do it in ten or fifteen or whatever, that's an important thing.
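
The growth factors in this answer follow from simple compounding, which a short Python sketch can make explicit. The function names are hypothetical and the doubling periods are assumptions chosen for illustration; the interview does not say which period lies behind the "factor of 4-8" figure.

```python
import math


def growth_factor(years: float, doubling_months: float) -> float:
    """Growth expected from repeated doubling over the given span."""
    return 2 ** (years * 12 / doubling_months)


def implied_doubling_months(factor: float, years: float) -> float:
    """Doubling period that would produce `factor` growth in `years`."""
    return years * 12 / math.log2(factor)


if __name__ == "__main__":
    # Steady doubling over five years, for two commonly cited periods.
    for months in (18, 24):
        print(f"doubling every {months} months: "
              f"x{growth_factor(5, months):.1f} in 5 years")

    # The factors of 70 and 360 quoted in the interview.
    for factor in (70, 360):
        print(f"a factor of {factor} in 5 years implies doubling every "
              f"{implied_doubling_months(factor, 5):.1f} months")
```

Doubling every 24 months gives a factor of about 5.7 over five years, inside the quoted range, while doubling every 18 months gives about 10. A factor of 70 corresponds to doubling roughly every 10 months, and 360 to roughly every 7 months.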

TA: When you get to certain levels of performance, and I would say cost performance, you can address problems that you couldn't address before. Dave mentioned drug design. This is one application that we're very focused on. It by and large has to do with how proteins work. You understand how they work by modeling them. And typically the better the modeling, the better the understanding. We are running an application at our booth, on the 512-node Blue Gene, and nobody has ever applied that level of power. It's just too expensive for any average scientist to get hold of that level of power. But even on the 512-node Blue Gene, we are able to provide levels of performance which have never been applied to these problems. Imagine what you could do with 64,000, especially if it was accessible to a large number of people.

DT: Another example would be the field of animation. It doesn't let you finish an animated film in less time; what it does is enable the animators to make each individual frame that much richer. So the fidelity to reality becomes that much greater. So they're still going to have their cycle whenever they finish their animation, but the scenes will be richer, more robust, all sorts of things that we can't do today.

I think that's a phenomenon that cascades through everything we've talked about here, this notion of fidelity. It's not so much a matter of discovering oil, for instance, faster, but maybe what we'll do is refine our understanding of a particular field that we think is pumped out. Through greater fidelity, we can discover that there may be a lot more oil. This notion of fidelity and refinement of modeling I've been talking about cuts across everything.

HPCwire: Do you think that price/performance is a sufficient guideline for procuring high-end systems?

DT: I would amplify that. I said earlier today that I think we're in an era where price/performance is going to have to be tempered by price per watt or price per sq. meter. No one has a shortage of imagination about how to use huge amounts of compute power; everybody has a shortage of imagination about where to put it and pay for it. The path we're on is attacking that second half of the problem. I think what you'll get in the coming 12 months is a deeper understanding of the marketplace at large. They've got to factor in these attributes of systems much more directly when they compare company A against company B.

TA: I would say that cost/performance is not just it. It is being able to provide a certain level of capability for certain types of applications at the right performance level. We can all lash a gazillion PCs together and possibly get pretty low cost/performance. That's not necessarily a capable machine, though. So it's really about capability at the right level of cost/performance.

HPCwire: Gentlemen, congratulations on the Blue Gene news. The whole world will be watching the new number one computer very closely.


Dave Turek is IBM's vice president of Deep Computing and Tilak Agerwala is vice president of Emerging Business at T.J. Watson Research Center. IBM's Blue Gene/L machine broke the record of the fastest supercomputer in the world, formerly held by NEC's Earth Simulator in Japan. Visit the IBM booth at Supercomputing 2004, booth #909.

