HPCwire
 The global publication of record for High Performance Computing - LIVEwire Edition / November 11, 2004: Vol. 13, No. 45C

Features:

THE INSPIRATION BEHIND COLUMBIA HOPED TO CHANGE THE WORLD
by Tim Curns, Editor

NASA's 10,240-processor Columbia supercomputer, built from 20 SGI Altix systems, seized the second position in the latest roster of the world's top supercomputers. Columbia is the world's fastest system to be based on industry standard Intel Itanium 2 processors and the Linux operating system. But what makes Columbia really important? HPCwire sat down with NASA's Walter Brooks and SGI's Jeff Greenwald to find out.


HPCwire: Congratulations on your Columbia work and subsequent recognition here at SC2004. Walt, could you please just describe your presence here and your feelings on all this news?

Walter Brooks: It's been a real exciting time at Supercomputing. NASA has been sort of languishing in the wings for the last ten years in high-end computing, and you definitely feel the difference in the energy level right now bringing Columbia into the scene. In our booth, we're displaying a variety of technologies. Columbia is there, and it's the engine at the middle of what we call an integrated numerical simulation environment. We really believe, with the end customer in mind, that it has got to be about getting that person connected, getting their data situated, getting their codes ported and scaled, verified and validated, and then getting these gigantic data sets, literally terabyte data sets, rendered in really powerful visualization systems.

If you go to the booth, you not only see Columbia highlighted, which we are quite proud of, but you see a whole collection of scientists and users; it's mostly scientists in our booth, not computer scientists. I think that's one of the things that is a focus for us, and I think SGI makes a really good partner with us -- if you listen to Bob Bishop speak, he's constantly talking about how they are pretty single-minded as a company. So they relate to scientists, they have a lot of scientists who relate to us and talk to us. Design is really a reflected interest in big problems, I mean, they are able to do a broad class of problems. But Columbia now is the very limit of what's been done with these technologies...the SGI and Voltaire technologies. We've taken it to these 10,000 processors and those processors can be used in a variety of ways, they can be used as a supercluster of 20 powerful 512p supercomputers in their own right. The biggest one we had in NASA just a few short months ago was the three teraflop 512 and we thought that was just wonderful. Now there's 20 of them. We also just coupled four of them together as a 2,048, with shared memory, and we've got a stack of users who are ready now to expand their codes and run the 2000 processors, and it won't be too long until we can maybe couple the system together to do highly coupled processing. We can already do the embarrassingly parallel problems, we can do molecular dynamics, we can do comp chemistry problems at the two, three, ten thousand goal; we haven't done those yet, but we expect to do those soon. There are already people proposing projects where we would do ensemble type analysis, where we would launch codes into all ten thousand of those processors to look at design space problems.

So there's a lot of energy. My team is totally excited. Behind all of this, as amazing as our system is...number two on the list, 50 teraflops -- we've been waiting to bust through 50 teraflops on Linpack -- deploying this in 120 days just changes the whole landscape. When you can use this sweet spot between commodity processors in a very facile environment, where people have their code running in a week or day or two, and running very big problems, that's great. There's still a place in the market for a variety of architectures, some of them customized to address a few problems. But we feel that if you go to the booth, you'll see earth science, you'll see nanotechnology, you'll see aeronautic vehicles, you'll see a really broad class. That's the joy of being at NASA; we really have a diverse mission set. If you sit in a sweet spot, like we do in computing, you now become a magnet for all these technologies and scientists from the best all over the world. NASA already has some right within our own walls, but now we're drawing people from all over the world, you get to rub shoulders with them, and help them make discoveries.

At the national level, we have a vision of trying to show the impact of these machines, and then demonstrating that, getting the nation excited -- as excited as we are about the Hubble space telescope, about supercolliders, about some of the major facilities we've built -- getting them excited that new numerical simulation systems like this are the engines that propel that kind of science. If we can start doing that at the national level -- Thomas Zachariah, down at Oak Ridge National Lab, will probably fairly soon coordinate a call nation-wide to use these systems as leadership systems. Part of our 2,048 is considered unique in the world, so we want to make that broadly available. This will create momentum to drive these companies to push their technology even further.

HPCwire: So do you think this will drive supercomputing into the mainstream?

WB: I hope so!

Jeff Greenwald: Well, the supercomputer of ten years ago is what you can get on your laptop today. So is this going to become mainstream? The technology cascades down, but the vision today of taking the power and the abilities of these large single system images and tying them to the real problems in our government, in our defense, in our exploration and our research is what people like Walt and the NASA folks have done. They've tied the vision to enabling tools to allow them to achieve their objectives.

WB: Every time we make a jump like this, we stop simplifying. We add complexity and we get closer to the real world. But, I mean, one person's supercomputer is another person's front end. What we used to call a supercomputer a couple months ago is kind of what you get into now to get into Columbia. So it's a pretty big spectrum; for guys who have never used 64 processors before, a 64 or 128 processor supercomputer is a challenge. But I think when you've been there, there will be a few of us. The government still has a role to play. If you want to maintain this nationally, the government is the flagship that pushes that upper boundary. We've got the top two systems on there, so I guess we did something right.

But to me, the most exciting thing, and what people ask about the most, is how we integrated that technology so fast. The technology itself is a part of that, but...

JG: I'd like to comment on that. Columbia was 10% technology, 90% inspiration. The technology was maybe a compelling piece, but the work, the vision, the communication, the midnight hours, the justification...that was inspiration. That was driven from start to finish by Walt and his team, with a lot of partners coming on because of the importance of this.

HPCwire: Well that brings me to ask how this all synched up for both organizations?

JG: Well, there's a 20 year history with SGI. But forget this computer stuff, forget the technology, forget the complexity, and all this selling stuff. I want to talk to you now as an American. This is really about three really important things that have to do with you and your kids and your grandkids. Let me tell you what they are.

The first one is if there is life out there in the universe. NASA's mission is to understand if we're alone in the universe. This computer will enable us to better understand what's going on out there -- communicating, understanding light, understanding if there's radiation, how the Earth is expanding, and how the universe is expanding -- that's one.

The second thing is understanding meteorology and climate changes on the Earth. Hurricane path projections, weather temperature changes in the ocean, etc. That affects people's lives and affects whether you do or don't evacuate people in Florida when there's a hurricane and it's 600 miles away. This really affects real people's lives.

Finally, it's about today, return to flight and the space shuttle -- how you safely make the commitment to your President, your God, and to your astronauts that you can protect them and return them to Earth safely.

So the technology has been a compromise up until today. Finally, there is an architecture which is bigger than the computational problems required in order to solve those issues. It's awesome, but it's not unusual that NASA is in the forefront. Today, the U.S. government, the Department of Energy, NASA, LLNL, LANL, all of these world class institutions -- and I even extend it to world class universities and research universities like MIT and Harvard and NCSA in Illinois -- these organizations see it as such a critical part of what they intend to deliver over the next 100 years to your kids and my grandkids, etc. These enabling tools will enable them to do faster discovery and insight.

That's why we've recently doubled. Last quarter, we doubled the number of Altix systems we've installed. Around the world, they see this technology as a way, not to just run Linpack benchmarks, but as a way to really solve what their institutions are all about.

HPCwire: So how important is a list to say who is fastest, when Columbia is productively engaging its users on a global scale?

JG: We as a company value the importance of a quantitative metric for assessing how you're doing as a company. To us, we have a long history of doing one single thing -- that's providing real science tools to our customer base. There is a sense of pride, not in the fact that we've sold 1,000 Altix systems, but that we are making a difference to the world in which we live.

Let's just look at the last year and a half in science. Two discoveries have been enabled because of the technology in the world: one, that the universe is expanding at an accelerating rate, and two, that there was previously water on Mars. Those two discoveries are going to affect science, the way we look at the world, and how we perceive how the Earth was created, how the universe was created, and how we fit in it.

HPCwire: I've heard some people express their discontent with government initiatives like the High-End Computing Revitalization Task Force (HECRTF), citing a certain amount of stalling on the government's part to make HPC a top priority. Do you feel the government is doing enough to advance science, then?

JG: I'd like to reference Craig Barrett, the chairman of Intel Corporation. He spoke at the Gartner conference in Florida two weeks ago. He asked why we, as a nation, are spending $5 billion investing in science and technology, when we're spending $25 billion in farm subsidies and agricultural supports. His point was that we, as a country, need to re-assess our priorities relative to our investment in these key science enabling tools. We are not a nation of engineers, mathematicians, astronomers, and DNA life sciences researchers...and maybe over the next 100 years, we should be.

HPCwire: Does project Columbia really propel us in that direction?

JG: I think Columbia is one of about six really key initiatives that have been enabled by the vision of the people in the U.S. government to do exactly that. But they're not doing it for the image and/or political rhetoric of just investing in it, they're doing it so that they can deliver real value, to real citizens, and make a real difference in mankind. That's why I say this is 10% technology and 90% inspiration.

Let me put it on a personal level. You are a resident of the state of Florida. You've gone through three hurricanes that have destroyed millions of homes and done damage to hundreds of miles of coastline, and there's another one coming. The tools in the world that were here three months ago were able to do hurricane path projections to a 300 kilometer range. Today, the improvement is able to get it to within a 120 kilometer spread. That means residents across a 180 kilometer wide spread will not have to be evacuated from their homes. That's a very real value to real people. That is a monstrous step ahead in safety, in communication, and also buys more people time to evacuate.

HPCwire: Is there a productivity crisis in HPC right now? Do you think computers, in general, are actually accomplishing things or just boasting record speeds?

JG: What this conference is all about is tying real value to the technology. You can walk through here and see literally hundreds of research institutes, universities, life sciences companies, development labs, countries and collaborative initiatives that involve governments, manufacturers, private enterprise, etc., and they are NOT taking the technologies and fitting problems into that technology. They are doing the reverse. They are understanding the real problems and goals of the world and then architecting the technology to solve those problems. It would be a crisis if we were taking today's tools and trying to retro-fit them to selected problems. What is actually a wonderful opportunity is the fact that there is a flexibility, an architecture and now open systems, specifically, that enable people to tie the technology to very specific problems.

HPCwire: Thanks, Jeff, I know that you have to run. Thanks to both of you for speaking with us a bit about project Columbia and the opportunities it affords the HPC community.


As chief of the NASA Advanced Supercomputing (NAS) Division, Walt Brooks oversees the entire gamut of high performance computing work done within the division, and is working to transform the vision, mission, and direction for NAS.

Jeff Greenwald is the Senior Director of Project Management & Marketing Server & Platform Engineering for SGI.

