
IBM TALKS BLUE GENE: JUST THE FIRST STEP IN A JOURNEY
by Tim Curns, Editor
IBM announced earlier this year that an IBM BlueGene/L supercomputer has
surpassed NEC's Earth Simulator in Japan to become the world's most powerful
supercomputer. In addition, at SC2004 this week, Blue Gene was officially
named the fastest supercomputer in the world. HPCwire sat down with Dave
Turek, IBM's VP of Deep Computing, and Tilak Agerwala, Vice President of Systems, to discuss what this announcement means for IBM, as well as the
world of HPC.
HPCwire: Well, it's shaping up to be a historic year for IBM here at SC2004.
Please tell us a bit about your presence here.
Dave Turek: I think that the notion of what SC is about for us is reflected in
the demonstration of the reach of our broad portfolio. If you look in our
booth, you see Power-based systems, you see Linux clusters, Opteron, Intel,
PowerBlades, you see Blue Gene systems...really a portfolio of technologies
that we think are appropriate to really cover properly the space of
opportunity that exists here. We are firm believers in the notion that the
marketplace is inherently heterogeneous, that diverse applications and diverse
kinds of customer requirements really map to a variety of solutions. The
other major thing here which is not in our booth, is the effort we put into
linking storage issues with classic server kinds of issues. In the StorCloud
area, we've got 150 terabytes of storage set up. There will be benchmarks and
tests run over the next two days. We're going to do very well with respect to
our performance on that technology also.
Tilak Agerwala: And on Blue Gene, I think we're really delighted that we have
won the number one position [of fastest computers in the world]. But I want to
point out that this is just one milestone in a journey. We started three years
ago, and there's a lot more to come. We are going to install 360 teraflops
next year at Lawrence Livermore National Labs (LLNL). We already have plans in
place to extend Blue Gene to the next level. And what I find really exciting
is that our partnership with the Department of Energy has resulted in
something that is extremely powerful, but extremely practical as well. And so,
we are really moving into a new era now, I think. There is a lot of effort
that is going to take place on the application of Blue Gene and understanding
which things it will be really good for and which things it may not be so good
for. There is a big focus really on working with customers at national labs to
really understand the applicability of Blue Gene to a wide range of problems.
The combination of the power, and the fact that we get this power by radically
cutting down on size and cost, is why it is so exciting and why I think a lot
of people will be using this machine.
HPCwire: I spoke with someone this morning who suggested that SC was becoming
more mainstream and that typically, the show always represented the "future"
of computing. Do you feel that Blue Gene embodies the "future" of
supercomputing?
TA: I believe so. If you look at what happens with supercomputing, and the
whole supercomputing community, there is often a change in who is really at
the top. And in a certain year, that's what it's all about. But in the next
year, it's often about the application of these new machines. So I think,
almost naturally, there will be a big focus on these two big machines at the
top [Blue Gene/L, SGI/NASA/Intel's Columbia] -- what was the track record in
terms of usage, etc. I think we've gotten to a point now where there's such a
significant improvement in cost/performance that I feel like we are able to
start seeing a real impact on science and business. I think Blue Gene will be
part of that whole phenomenon.
HPCwire: Speaking of science, San Diego Supercomputer Center just announced
that they will be the first university to purchase Blue Gene, right?
DT: The University of Edinburgh, in Scotland, will also take a system. But I
have a perspective on this issue of mainstream. There is a misconception that
has persisted over some number of years that supercomputing is this esoteric
arena that only appeals to the privileged few in research and university
environments. In our historical experience, going back even to the SP days,
the preponderance of our business was always driven by commercial deployments.
I think what Tilak is talking about, in terms of not only the inherent
efficiency of the Blue Gene design, coupled with the accommodation to the
realities of space and power, is the possibility of dramatically opening the
aperture of access to this kind of technology. Now the prices are coming down,
that's great. That's what has driven Linux clusters over the last couple of
years. But what the Linux cluster community has not dealt with is the issue of
space and power. Those concepts tend to dominate.
And so, by virtue of what our research division and development organization
at IBM have done, we've got an answer for that proposition. And all it does
is simply make the technology accessible to a greater number of people.
HPCwire: Blue Gene seems to be popular for everything from its speed to even
its packaging. What makes it different from other supercomputers?
TA: It's really a whole new approach to designing supercomputers. These days,
we are running into technology limitations where performance is not going
to come just from clock frequency. So we've got to find innovative ways of
packaging things together, but also integrating a whole bunch of different
technologies and Blue Gene does a very innovative job of that.
It starts with very low power processors, it builds a complete
system-on-a-chip, then uses interconnects to connect these chips together,
pushing the
limits of scalability. Then some real innovations in system management allow
you to manage all of this, which I think is extremely important. This whole
focus on low power design, on integration at all levels, leads to a system
that reduces cost, power, and size, but also one that's been balanced, and
will apply to a large number of applications.
DT: The ratio of power from a Linux cluster, which is not even an exotic
architecture like the Earth Simulator, to Blue Gene is 15:1. So even if they
make significant improvements of 30-40%, you've got this 15:1 multiple you've
got to overcome. It hasn't really reached the attention I feel it should at
this point because the fact of the matter is the architectural approaches
haven't, at this point, scaled to significant size where people have said,
"look at the size of my electric bill." If you take the Earth Simulator from
2002 and extrapolate it forward to a petaflop style machine, you'd be looking
at an electric bill of about $150 million a year. They wouldn't do that of
course, and I think the SX-8 has reduced power consumption profile, but the
point illustrates that power and the expense of power is beginning to dominate
a lot of the issues that people have historically dealt with. And then you
couple that with the space issues. The space needed to house Blue Gene at 360
teraflops is on the order of 100 sq. meters. The space needed to house
something like the Earth Simulator is around 10,000 sq.
meters. And you look at that and say, "Ok, where are my football fields? Where
do I put this thing?" And if you begin to look at cost of ownership and
operations, you see you've got to build your own building, have your own
generator, and run electrical bills like the ones we described. That
constrains access to computing.
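As a rough illustration of the ratios Turek cites, the following Python sketch works through the arithmetic. The constants are the figures quoted in the interview, not measured values. The function names are hypothetical, and the sketch assumes that a "30-40% improvement" means the cluster draws 30-40% less power for the same work.

```python
# Figures quoted in the interview (not independently verified).
POWER_RATIO_CLUSTER_TO_BLUE_GENE = 15.0  # Linux cluster : Blue Gene
EARTH_SIMULATOR_AREA_M2 = 10_000.0       # approximate floor space quoted
BLUE_GENE_AREA_M2 = 100.0                # quoted for the 360-teraflop system


def remaining_power_ratio(ratio: float, improvement: float) -> float:
    """Power ratio left after the competitor cuts its power by `improvement`.

    `improvement` is a fraction between 0 and 1 (assumed interpretation).
    """
    if not 0.0 <= improvement <= 1.0:
        raise ValueError("improvement must be between 0 and 1")
    return ratio * (1.0 - improvement)


def footprint_ratio(larger_m2: float, smaller_m2: float) -> float:
    """How many times more floor space the larger system needs."""
    return larger_m2 / smaller_m2


for improvement in (0.30, 0.40):
    left = remaining_power_ratio(POWER_RATIO_CLUSTER_TO_BLUE_GENE, improvement)
    print(f"{improvement:.0%} less cluster power -> about {left:.1f}:1 remains")

space = footprint_ratio(EARTH_SIMULATOR_AREA_M2, BLUE_GENE_AREA_M2)
print(f"Floor space ratio: about {space:.0f}:1")
```

Under these assumptions, even a 40% cut in cluster power leaves a gap of roughly 9:1, and the quoted floor-space figures differ by a factor of about 100.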
I think our strategy opens up access to computing and it gives rise to
interesting possibilities that we haven't even anticipated yet.
HPCwire: Such as? Can you guess?
DT: What we've seen historically as prices come down, is that you get a
blossoming out of different types of businesses who start to engage in efforts
like this. So one of the things we've seen in the last year is this growth of
interest in small to medium businesses to get their hands on this kind of
computing. If you're a 20 person company and you can get access to 5 teraflops
of computing inexpensively, well suddenly you can compete with the big guys.
And so it changes the whole competitive profile in a variety of segments. It
doesn't matter whether it's petroleum or life sciences or animation or digital
media or financial services; it's an empowerment to smaller groups of people
-- small companies, small departments, small divisions -- that by leveraging
these design points and cost efficiency factors, they find themselves in a
position of great competitiveness.
We have companies like this today, 20-30 person companies who leverage our
technology and our On Demand centers, who are perfect candidates for
technology like this.
HPCwire: Let's move on to speed as an issue. Please comment on the importance
of the U.S. to have the "fastest" supercomputer. What does it really mean?
DT: Well the irony there is that we are an international company. Like most of
our competitors, we are headquartered in the U.S. I think it has an impact of
demonstrating the wherewithal of the American industry and economy to be
innovative, creative, and to be able to do things like this. That goes a long
way toward elevating people's attitudes. America is a "can-do" sort of
country. I think this reinforces the whole behavioral DNA of the country, and I
think it does energize people quite dramatically.
TA: And just to underscore that point with respect to Blue Gene, I'm not sure
the experiment could have been done anywhere else. Our researchers and
scientists working with the application folks at DOE and Livermore, and just
the expertise across the whole system, demonstrates the fact that you need
expertise across the whole stack and be able to bring that expertise together
in one place to attack the problem. That's the only way we've been able to
make it to number one. You can't become number one by just attacking any
single part of the problem.
DT: There's an implicit point in Tilak's comments about the aspects of our
industry because as you look around, a lot of our competitors have abandoned
microprocessor development activities, a lot have chosen to go to market more
on a proposition of assembly of parts as opposed to innovation of parts. I
think we've been, in the last four or five years, in this era of computing
assembly. What this demonstrates is that such things do not extrapolate well
for the future. We have to have research skills applied to these projects to
make ourselves competitive.
There's a message here for the U.S. in general, but there's a real message for
our industry in particular, in terms of the way we need to start thinking
about how to compete. Blue Gene took us five years. You can't turn around
tomorrow and say, "I want to do this." You've got to have the will, you've got
to have the money, you've got to have the right kind of people. If you scan
the industry, there are very few places where those exist together.
TA: There is this remarkable opportunity for innovation, for this practical
supercomputer that we have to really make breakthroughs in science and
business. This is not just a fancy invention, it's a highly innovative product
that is going to drive breakthroughs.
HPCwire: So Blue Gene is not only fast, it's productive. What is the
relationship between speed and productivity? Are they mutually inclusive?
DT: The easy answer is that it all depends. The world is sort of trained to
think in terms of Moore's Law. That's been the predominant approach to how the
PC business has behaved over time. So this notion of doubling of processors
every 18 months has reflected itself in the way a lot of people thought about
it. If you look at Blue Gene, you see that we're sort of in an era of
hyper-productivity. If you compare Blue Gene today with what the top system
was 5 years ago, the factor is 70. Moore's Law would have said: expect a
factor of 4-8. By next spring, the total will be a factor of 360 over 5
years. This
presentation of speed really strikes at the heart of time to solution.
Problems persist through time -- it doesn't matter if it is a logistics
problem, oil discovery problem or drug design problem. The question is how
fast can you get to the end of it. So ten years ago, if it took you three
hundred days to do a reservoir simulation or something like that, and today
you could do it in ten or fifteen or whatever, that's an important thing.
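The growth factors in this answer can be checked with a short calculation. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes simple exponential doubling. It computes the factor that a given doubling period yields over five years, and the doubling period implied by the quoted factors of 70 and 360.

```python
import math


def growth_factor(years: float, doubling_months: float) -> float:
    """Growth over `years` if performance doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def implied_doubling_months(factor: float, years: float) -> float:
    """Doubling period (in months) that yields `factor` over `years`."""
    return years * 12.0 / math.log2(factor)


# Doubling periods often associated with Moore's Law (assumed values).
for months in (18, 24):
    print(f"Doubling every {months} months for 5 years: "
          f"factor of about {growth_factor(5, months):.1f}")

# Factors quoted in the interview for Blue Gene over 5 years.
for factor in (70, 360):
    print(f"Factor of {factor} in 5 years implies doubling every "
          f"{implied_doubling_months(factor, 5):.1f} months")
```

With these assumptions, an 18-month doubling period gives a factor of about 10 over five years and a 24-month period gives about 5.7. The quoted range of 4-8 corresponds to doubling every 20 to 30 months. The quoted factors of 70 and 360 imply doubling roughly every 10 and 7 months respectively.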
TA: When you get to certain levels of performance, and I would say cost
performance, you can address problems that you couldn't address before. Dave
mentioned drug design. This is one application that we're very focused on. It by
and large has to do with how proteins work. You understand how they work by
modeling them. And typically the better the modeling, the better the
understanding. We are running an application at our booth, on the 512-node
Blue Gene, and nobody has ever applied that level of power. It's just too
expensive for any average scientist to get hold of that level of power. But
even on a 512-node Blue Gene, we are able to provide levels of performance
that have never been applied to these problems. Imagine what you could do with
64,000 nodes, especially if it was accessible to a large number of people.
DT: Another example would be the field of animation. It doesn't let you finish
an animated film in less time; what it does is enable the
animators to make each individual frame that much richer. So the fidelity to
reality becomes that much greater. So they're still going to have their cycle
whenever they finish their animation, but the scenes will be richer, more
robust, all sorts of things that we can't do today.
I think that's a phenomenon that cascades through everything we've talked
about here, this notion of fidelity. It's not so much a matter of discovering
oil, for instance, faster, but maybe what we'll do is refine our understanding
of a particular field that we think is pumped out. Through greater fidelity,
we can discover that there may be a lot more oil. This notion of fidelity and
refinement of modeling I've been talking about cuts across everything.
HPCwire: Do you think that price/performance is a sufficient guideline for
procuring high-end systems?
DT: I would amplify that. I said earlier today that I think we're in an era
where price/performance is going to have to be tempered by price per watt or
price per sq. meter. No one has a shortage of imagination about how to use huge
amounts of compute power; everybody has a shortage of imagination about where
to put it and how to pay for it. The path we're on is attacking that second half of
the problem. I think what you'll get in the coming 12 months is a deeper
understanding of the marketplace at large. They've got to factor in these
attributes of systems much more directly when they compare company A against
company B.
TA: I would say that cost/performance is not the whole story. It is being able to
provide a certain level of capability for certain types of applications at the
right performance level. We can all lash a gazillion PCs together and possibly
get pretty low cost/performance. That's not necessarily a capable machine,
though. So it's really about capability at the right level of
cost/performance.
HPCwire: Gentlemen, congratulations on the Blue Gene news. The whole world
will be watching the new number one computer very closely.
Dave Turek is IBM's vice president of Deep Computing and Tilak Agerwala is
vice president of Emerging Business at T.J. Watson Research Center. IBM's Blue
Gene/L machine broke the record of the fastest supercomputer in the world,
formerly held by NEC's Earth Simulator in Japan. Visit the IBM booth at
Supercomputing 2004, booth #909.