HPCwire
 The global publication of record for High Performance Computing / March 19, 2004: Vol. 13, No. 11


Features:

EXPERT OPINION: THE COMING CRISIS IN COMPUTATIONAL SCIENCE
by Douglass Post

In response to our March 12 column by the High End Crusader, HPCwire article #107185, US FUNDING PRIORITIES AND ROADMAPS FOR PETAFLOPS, Dr. D. E. Post has allowed us to publish a relevant paper he presented as part of the proceedings of the IEEE International Conference on High Performance Computer Architecture: Workshop on Productivity and Performance in High-End Computing, Madrid, Spain, February 14, 2004 (Los Alamos Report LA-UR-04-0388; author: D. E. Post, Los Alamos National Laboratory).

Below is part one of Douglass Post's commentary on the future of computational science in the U.S. He works in the Physics Division at Los Alamos National Laboratory. Look for part two of this three-part series in next week's HPCwire.


ABSTRACT

Computational science faces three major challenges: "The Performance Challenge," "The Programming Challenge" and "The Prediction Challenge". The exponential growth in processor speed and the advent of massive parallelization have increased computing power by a factor of 10^13 since 1945. This has enabled scientists and engineers to tackle important problems of unparalleled size and complexity. However, the complicated architectures of these new platforms have made programming more difficult. Furthermore, much of the improved predictive power has been achieved by increasing the complexity of the application models and algorithms. This has raised the level of the challenges associated with developing and using the resulting large, complex computer codes. As a community we are meeting the first challenge, "The Performance Challenge," but are not doing as well with the other two, "The Programming Challenge" and "The Prediction Challenge." Computer capability appears likely to continue to grow exponentially in the near term.

On the other hand, crises loom for programming and prediction. For the "Programming Challenge," even short programs are often difficult to write for massively parallel platforms. The time scale for developing large-scale applications is often longer than the life cycle of a single platform architecture. Porting applications to new platforms is difficult. Existing programming tools are inadequate for rapid code development and optimization. Many, if not most, application codes achieve only a small fraction of the potential peak performance. The High Performance Computing Community must make programming easier, or at least no harder, as it builds ever more powerful, and more complex, computers.

With regard to the "Prediction Challenge," computational science does not have the predictive reliability of traditional methodologies such as theory, experiment and engineering design. The results of many major computer applications are often wrong or are misinterpreted, sometimes with disastrous consequences. Computational science must mature as a field if it is to become a reliable methodology for addressing important problems. History indicates that it takes time, and quite a few major and possibly dramatic mistakes, for new methodologies to mature. Such major mistakes are occurring now in computational science. Just as other disciplines have learned from their mistakes, we, as a community, must analyze our mistakes and successes and adopt the "lessons learned". The Computational Science community must improve the predictive capability of application codes if computational science is to become a useful tool for solving society's problems. A key figure of merit is the "time to solution", the time between the identification of a problem and the delivery of a validated and analyzed computational solution. For the reasons given earlier, the "time to solution" is growing in many cases, not decreasing. Reducing it requires that we address all three challenges.

1. Introduction

Computational science, the use of large-scale computers to address and solve important technical problems, is becoming an everyday tool for design and analysis of complex technical issues. Applications include scientific research, engineering design, policy analysis, training, emergency response and environmental analyses. Computational science has the potential to address complex issues with a degree of realism that has heretofore only been imagined. This exciting and very important, indeed revolutionary, potential is due to the enormous growth in computer power (speed and data storage) over the last 50 years. This growth shows no sign of slowing in the near term. Yet computational science is a very new and immature discipline. It has not achieved the level of maturity of traditional methodologies such as experiment, theory, engineering design and conventional policy analysis for solving problems.

At least three distinct challenges face the computational science community. First, exponential growth in computer power gives us greater ability to tackle difficult and more important problems. But as computers have become more powerful, they have also become more complex. Simple computer architectures have evolved into massively parallel structures with very complex designs and connections. This expanding computer power (larger memory and data storage, and faster processing speed) is enabling very large application programs to treat many very complex and strongly interacting effects. Climate models now include models of dozens of effects where before they included only a few. These developments lead to three distinct challenges:

1. "The Performance Challenge": Designing and building high performance computers.
2. "The Programming Challenge": Programming for complex computers.
3. "The Prediction Challenge": Developing codes with complex physical models that are truly predictive.

The exponential growth in microchip processing power described by "Moore's Law," together with the concomitant increase in memory and disk speed and size and the advent of massively parallel platform architectures, has resulted in a factor of 10^13 improvement in computer processing power since 1945. This expansion of raw computing power is enabling computational science to address many important problems with a degree of realism and fidelity that was only dreamt of ten or twenty years ago.
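To give a feel for the growth factor quoted above (10^13 since 1945), the short sketch below converts it into an average doubling time. The end year of 2004 is an assumption taken from the publication date of this article, and the function name is purely illustrative.

```python
import math


def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Average doubling time implied by a total growth factor over a span."""
    doublings = math.log2(growth_factor)  # number of doublings in the span
    return span_years / doublings


# Assumption: the span runs from 1945 to 2004, the year of this article.
span = 2004 - 1945
months = 12 * doubling_time_years(1e13, span)
print(f"about {months:.1f} months per doubling")
```

Under these assumptions the quoted factor corresponds to computing power doubling roughly every 16 months, averaged over the whole period.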

However, the increased complexity of computers has made programming for them more difficult and time-consuming. Optimizing code performance is becoming more difficult, but the performance analysis and debugging tools for massively parallel platforms are still in their infancy. Programming models (MPI, OpenMP, HPC, etc.) have evolved slowly and vary among platform vendors and architectures. Developing ways to reduce the difficulty of programming high performance computers is a key requirement for computational science to advance.

The main topic of this paper is "The Prediction Challenge," the challenge of successfully developing codes with complex scientific and engineering models that can make accurate predictions or analyze data correctly. Based on surveys of many technical code projects and on case studies, I can identify three major "lessons learned" (and many more slightly less important ones) that need much more emphasis by the computational science community if scientific and technical computational results are to have credibility: Verification, Validation, and Code Project Management and Quality. Every code consists of models of real effects in nature and mathematics. First, the code must solve the models correctly. The solution algorithms must be applied correctly. The code needs to be free of bugs that significantly affect the results. If the code is not mathematically correct, then any conclusions derived from the code are likely wrong. Second, the models in the code must represent the real world with sufficient accuracy that the code predictions and analysis provide a valid basis for decisions. The models must be checked with experimental data for the regimes of interest. Third, the development of complex, large codes is a complex undertaking itself.

Now, the development process often involves teams that can be as large as 20 or 30 staff, or even larger. The teams need many different skills: science, programming, computer science and computational mathematics. Twenty years ago, most scientific code development teams were much smaller and the range of required skills much less broad. The project has to be organized so that the team members developing the code know what they are trying to accomplish, how to work together productively, what programming and physical models are appropriate, how they will go about developing the code, how long it should take, what resources they will need, what constitutes success and who the customer is. Unfortunately, many computational science projects are seriously deficient in one or more of the three areas highlighted above, and the results of those codes are therefore often of little value.

Even in cases where the code is credible, it may have been applied inappropriately. Because it is difficult for an outsider to judge the validity of scientific computational predictions and analyses, computational science does not have the credibility of the more mature problem-solving methodologies of theory, experiments and engineering design.
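As an illustration of the first lesson above, verification, one common check is to run a code on a problem with a known exact solution and confirm that the error shrinks at the rate the algorithm is supposed to deliver. The sketch below is not taken from the paper; the test problem (dy/dt = -y with y(0) = 1), the forward Euler method and all function names are assumptions chosen only to keep the example small.

```python
import math


def euler_decay(n_steps: int, t_end: float = 1.0) -> float:
    """Integrate dy/dt = -y, y(0) = 1 with forward Euler; return y(t_end)."""
    h = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += h * (-y)  # one forward Euler step
    return y


def observed_order(n_coarse: int, t_end: float = 1.0) -> float:
    """Estimate the order of accuracy by halving the step size once."""
    exact = math.exp(-t_end)  # known analytic solution at t_end
    err_coarse = abs(euler_decay(n_coarse, t_end) - exact)
    err_fine = abs(euler_decay(2 * n_coarse, t_end) - exact)
    return math.log2(err_coarse / err_fine)


order = observed_order(100)
print(f"observed order of accuracy: {order:.2f}")
```

Forward Euler is first-order accurate, so the observed order should come out close to 1. A value far from that would suggest a bug in the solver rather than a flaw in the physical model, which is a separate question addressed by validation against experimental data.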

To summarize, there are three major challenges for Computational Science. The High Performance Computing community is successfully meeting the first, "The Performance Challenge," which is developing high performance computers. The second, "The Programming Challenge," involves both the High Performance Computer development community and the computer science community. Both communities must find ways to reduce the difficulty of developing application codes for today's and tomorrow's High Performance Computers. The third, "The Prediction Challenge," primarily involves the computational science community. We must improve the predictive capability of these increasingly complicated programs.

A key measure of the effectiveness of computational science is the "time to solution." This is the time required to conceive and develop a validated computational solution to a problem. The time includes the calendar time required to develop and deploy a computer platform, develop the application, obtain validated solutions and analyze the solutions. From the perspective of the sponsors who need accurate answers to important problems, this is the critical issue. A faster computer platform may run problems more quickly, but if it takes a lot longer to develop a code for the faster platform, the time to solution may be longer, not shorter. Reducing the time to solution requires addressing all three Challenges: Performance, Programming and Prediction. This is the goal of the Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) project. The HPCS project has focused initiatives in all three challenge areas involving major high performance computing vendors (IBM, Cray and Sun), the computer science community and the technical code development community, and has the stated goal of reducing the time to solution.
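The trade-off described above can be made concrete with a small sketch. It treats the four components of the time to solution as strictly sequential, which is a simplifying assumption, and every number in it is hypothetical, chosen only to show how a faster platform can still lengthen the total.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Calendar time, in years, for each stage of a project (hypothetical)."""
    platform_years: float  # develop and deploy the computer platform
    code_dev_years: float  # develop the application
    run_years: float       # obtain validated solutions
    analysis_years: float  # analyze the solutions

    def time_to_solution(self) -> float:
        # Assumption: stages run one after another with no overlap.
        return (self.platform_years + self.code_dev_years
                + self.run_years + self.analysis_years)


# Hypothetical figures: the faster platform runs problems four times as
# quickly, but the code for it takes two more years to develop.
older = Scenario(platform_years=0.0, code_dev_years=3.0,
                 run_years=2.0, analysis_years=0.5)
faster = Scenario(platform_years=0.0, code_dev_years=5.0,
                  run_years=0.5, analysis_years=0.5)

for name, s in [("older platform", older), ("faster platform", faster)]:
    print(f"{name}: {s.time_to_solution():.1f} years to solution")
```

With these made-up figures the faster platform delivers its answer later (6.0 years against 5.5), because the extra development time outweighs the shorter runs.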

...PART II TO BE PUBLISHED IN NEXT WEEK'S HPCWIRE


Douglass Post is a physicist in the Physics Division at the Los Alamos National Laboratory. He was the Deputy Division Leader for Simulation in the Applied Physics Division at Los Alamos from 2001 to 2002. From 1998 to 2001, he was the Associate Division Leader for Computational Physics for "A" and "X" Divisions at the Lawrence Livermore National Laboratory. He graduated from Stanford University with a Ph.D. in Physics in 1975. He has 30 years of experience with the development of technical software and computational physics and project management in magnetic fusion, atomic and molecular physics, transport phenomena and nuclear weapons at Los Alamos, the Lawrence Livermore National Laboratory, and the Princeton University Plasma Physics Laboratory. Doug was leader of the tokamak modeling group at the Plasma Physics Laboratory from 1975 to 1993. He was head of the Physics Project Unit for the International Thermonuclear Experimental Reactor Conceptual Design Phase (1988-1990) and head of the In-Vessel Physics Group during the Engineering Design Phase (1993-1998). He is the Associate Editor-in-Chief of the IEEE/AIP publication "Computing in Science and Engineering", and a fellow of the American Physical Society and of the American Nuclear Society. His current professional interests include the development of software engineering methodologies for scientific computing as Team Leader for Analysis of Existing Codes for the DARPA High Productivity Computing Systems Project.

Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the University of California for the U.S. Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher's right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness.

