
Features:
THE INSPIRATION BEHIND COLUMBIA HOPED TO CHANGE THE WORLD
by Tim Curns, Editor
NASA's 10,240-processor Columbia supercomputer, built from 20 SGI Altix
systems, seized the second position in the latest roster of the world's top
supercomputers. Columbia is the world's fastest system to be based on
industry standard Intel Itanium 2 processors and the Linux operating system.
But what makes Columbia really important? HPCwire sat down with NASA's Walter
Brooks and SGI's Jeff Greenwald to find out.
HPCwire: Congratulations on your Columbia work and subsequent recognition
here at SC2004. Walt, could you please just describe your presence here and
your feelings on all this news?
Walter Brooks: It's been a real exciting time at Supercomputing. NASA has
been sort of languishing in the wings for the last ten years in high-end
computing, and you definitely feel the difference in the energy level right
now bringing Columbia into the scene. In our booth, we're displaying a
variety of technologies. Columbia is there, and it's the engine at the middle
of what we call an integrated numerical simulation environment. We really
believe, with the end customer in mind, that it has got to be about getting that
person connected, getting their data situated, getting their codes ported and
scaled, verified and validated, and then getting these gigantic data sets,
literally terabyte data sets, rendered in really powerful visualization
systems.
If you go to the booth, you not only see Columbia highlighted, which we are
quite proud of, but you see a whole collection of scientists and users; it's
mostly scientists in our booth, not computer scientists. I think that's one
of the things that is a focus for us, and I think SGI makes a really good
partner with us -- if you listen to Bob Bishop speak, he's constantly talking
about how they are pretty single-minded as a company. So they relate to
scientists, they have a lot of scientists who relate to us and talk to us.
Their design really reflects an interest in big problems; I mean, they are able
to do a broad class of problems. But Columbia now is the very limit of what's
been done with these technologies...the SGI and Voltaire technologies. We've
taken it to these 10,000 processors and those processors can be used in a
variety of ways, they can be used as a supercluster of 20 powerful 512p
supercomputers in their own right. The biggest one we had in NASA just a few
short months ago was the three teraflop 512 and we thought that was just
wonderful. Now there's 20 of them. We also just coupled four of them together
as a 2,048, with shared memory, and we've got a stack of users who are ready
now to expand their codes and run the 2000 processors, and it won't be too
long until we can maybe couple the system together to do highly coupled
processing. We can already do the embarrassingly parallel problems, we can do
molecular dynamics, we can do comp chemistry problems at the two, three, ten
thousand goal; we haven't done those yet, but we expect to do those soon.
There are already people proposing projects where we would do ensemble type
analysis, where we would launch codes into all ten thousand of those
processors to look at design space problems.
So there's a lot of energy. My team is totally excited. Behind all of this,
as amazing as our system is...number two on the list, 50 teraflops -- we've
been waiting to bust through 50 teraflops on Linpack -- deploying this in 120
days just changes the whole landscape. When you can use this sweet spot
between commodity processors in a very facile environment, where people have
their code running in a week or day or two, and running very big problems,
that's great. There's still a place in the market for a variety of
architectures, some of them customized to address a few problems. But we
feel that if you go to the booth, you'll see earth science, you'll see
nanotechnology, you'll see aeronautic vehicles, you'll see a really broad
class. That's the joy of being at NASA; we really have a diverse mission set.
If you sit in a sweet spot, like we do in computing, you now become a magnet
for all these technologies and scientists from the best all over the world.
NASA already has some right within our own walls, but now we're drawing
people from all over the world, you get to rub shoulders with them, and help
them make discoveries.
At the national level, we have a vision of trying to show the impact of these
machines, and then demonstrating that, getting the nation excited -- as
excited as we are about the Hubble Space Telescope, about supercolliders, about
some of the major facilities we've built -- getting them excited that these
new numerical simulation systems like this are the engines that propel that
kind of science. If we can start doing that at the national level -- Thomas
Zachariah, down at Oak Ridge National Lab, will probably fairly soon
coordinate a call nation-wide to use these systems as leadership systems.
Part of our 2,048 is considered unique in the world, so we want to make that
broadly available. This will create momentum to drive these companies to push
their technology even further.
HPCwire: So do you think this will drive supercomputing into the mainstream?
WB: I hope so!
Jeff Greenwald: Well, the supercomputer of ten years ago is what you can get
on your laptop today. So is this going to become mainstream? The technology
cascades down, but the vision today of taking the power and the abilities of
these large single system images and tying them to the real problems in our
government, in our defense, in our exploration and our research is what
people like Walt and the NASA folks have done. They've tied the vision to
enabling tools to allow them to achieve their objectives.
WB: Every time we make a jump like this, we stop simplifying. We add
complexity and we get closer to the real world. But, I mean, one person's
supercomputer is another person's front end. What we used to call a
supercomputer a couple months ago is kind of what you get in to now to get in
to Columbia. So it's a pretty big spectrum; for guys who have never used 64
processors before, a 64- or 128-processor supercomputer is a challenge. But I
think when you've been there, there will be a few of us. The government still
has a role to play. If you want to maintain this nationally, the government
is the flagship that pushes that upper boundary. We've got the top two
systems on there, so I guess we did something right.
But to me, the most exciting thing, and what people ask about the most, is
how we integrated that technology so fast. The technology itself is a part of
that, but...
JG: I'd like to comment on that. Columbia was 10% technology, 90%
inspiration. The technology was maybe a compelling piece, but the work, the
vision, the communication, the midnight hours, the justification...that was
inspiration. That was driven from start to finish by Walt and his team, with
a lot of partners coming on because of the importance of this.
HPCwire: Well that brings me to ask how this all synched up for both
organizations?
JG: Well, there's a 20 year history with SGI. But forget this computer stuff,
forget the technology, forget the complexity, and all this selling stuff. I
want to talk to you now as an American. This is really about three really
important things that have to do with you and your kids and your grandkids.
Let me tell you what they are.
The first one is if there is life out there in the universe. NASA's mission
is to understand if we're alone in the universe. This computer will enable us
to better understand what's going on out there -- communicating,
understanding light, understanding if there's radiation, how the Earth is
expanding, and how the universe is expanding -- that's one.
The second thing is that it's going to help us understand meteorology and climate
changes on the Earth. Hurricane path projections, weather temperature changes in the
ocean, etc. That affects people's lives and affects whether you do or don't
evacuate people in Florida when there's a hurricane and it's 600 miles away.
This really affects real people's lives.
Finally, it's about today, return to flight and the space shuttle -- how you
safely make the commitment to your President, your God, and to your
astronauts that you can protect them and return them to Earth safely.
So the technology has been a compromise up until today. Finally, there is an
architecture which is bigger than the computational problems required in
order to solve those issues. It's awesome, but it's not unusual that NASA is
in the forefront. Today, the U.S. government, the Department of Energy, NASA,
LLNL, LANL, all of these world class institutions -- and I even extend it to
world class universities and research universities like MIT and Harvard and
NCSA in Illinois -- these organizations see it as such a critical part of
what they intend to deliver over the next 100 years to your kids and my
grandkids, etc. These enabling tools will enable them to do faster discovery
and insight.
That's why we've recently doubled. Last quarter, we doubled the number of
Altix systems we've installed. Around the world, they see this technology as
a way, not to just run Linpack benchmarks, but as a way to really solve what
their institutions are all about.
HPCwire: So how important is a list to say who is fastest, when Columbia is
productively engaging its users on a global scale?
JG: We as a company value the importance of a quantitative metric for
assessing how you're doing as a company. To us, we have a long history of
doing one single thing -- that's providing real science tools to our customer
base. There is a sense of pride, not in the fact that we've sold 1,000 Altix
systems, but that we are making a difference to the world in which we live.
Let's just look at the last year and a half in science. Two discoveries that
have been enabled because of the technology in the world: one, the universe
is expanding at an accelerating rate and two, that there was previously water on
Mars. Those two discoveries are going to affect science, the way we
look at the world, and how we perceive how the Earth was created, how the
universe was created, and how we fit in it.
HPCwire: I've heard some people express their discontent with government
initiatives like the High-End Computing Revitalization Task Force (HECRTF),
citing a certain amount of stalling on the government's part to make HPC a
top priority. Do you feel the government is doing enough to advance science,
then?
JG: I'd like to reference Craig Barrett, the chairman of Intel Corporation.
He spoke at the Gartner conference in Florida two weeks ago. He asked why we,
as a nation, are spending $5 billion investing in science and
technology, when we're spending $25 billion in farm subsidies and
agricultural supports. His point was that we, as a country, need to re-assess
our priorities relative to our investment in these key science enabling
tools. We are not a nation of engineers, mathematicians, astronomers, and DNA
life sciences researchers...and maybe over the next 100 years, we should be.
HPCwire: Does project Columbia really propel us in that direction?
JG: I think Columbia is one of about six really key initiatives that have
been enabled by the vision of the people in the U.S. government to do exactly
that. But they're not doing it for the image and/or political rhetoric of
just investing in it, they're doing it so that they can deliver real value,
to real citizens, and make a real difference in mankind. That's why I say
this is 10% technology and 90% inspiration.
Let me put it on a personal level. You are a resident of the state of
Florida. You've gone through three hurricanes that have destroyed millions of
homes and done damage to hundreds of miles of coastline, and there's another
one coming. The tools in the world that were here three months ago were able
to do hurricane path projections to a 300 kilometer range. Today, the
improvement is able to get it to within a 120 kilometer spread. That means
residents across a 180 kilometer wide swath will not have to be evacuated from
their homes. That's a very real value to real people. That is a monstrous
step ahead in safety, in communication, and also buys more people time to
evacuate.
HPCwire: Is there a productivity crisis in HPC right now? Do you think
computers, in general, are actually accomplishing things or just boasting
record speeds?
JG: What this conference is all about is tying real value to the technology.
You can walk through here and see literally hundreds of research institutes,
universities, life sciences companies, development labs, countries and
collaborative initiatives that involve governments, manufacturers, private
enterprise, etc., and they are NOT taking the technologies and fitting
problems into that technology. They are doing the reverse. They are
understanding the real problems and goals of the world and then architecting
the technology to solve those problems. It would be a crisis if we were
taking today's tools and trying to retro-fit them to selected problems. What
is actually a wonderful opportunity, is the fact that there is a flexibility,
an architecture and now open systems, specifically, that now enable people to
tie the technology to very specific problems.
HPCwire: Thanks, Jeff, I know that you have to run. Thanks to both of you for
speaking with us a bit about project Columbia and the opportunities it
affords the HPC community.
As chief of the NASA Advanced Supercomputing (NAS) Division, Walt Brooks
oversees the entire gamut of high performance computing work done within the
division, and is working to transform the vision, mission, and direction for
NAS.
Jeff Greenwald is the Senior Director of Project Management & Marketing
Server & Platform Engineering for SGI.