
Features:
EXPERT OPINION: THE COMING CRISIS IN COMPUTATIONAL SCIENCE
by Douglass Post
In response to our March 12 column by the High End Crusader, HPCwire article
#107185, US FUNDING PRIORITIES AND ROADMAPS FOR PETAFLOPS, Dr. D. E. Post has
allowed us to publish a relevant paper he presented as part of the proceedings
of the IEEE International Conference on High Performance Computer
Architecture: Workshop on Productivity and Performance in High-End Computing,
Madrid, Spain, February 14, 2004 (Los Alamos Report LA-UR-04-0388; author: D.
E. Post, Los Alamos National Laboratory).
Below is part one of Douglass Post's commentary on the future of computational
science in the U.S. He works in the Physics Division at Los Alamos National
Laboratory. Look for part two of this three-part series in next week's HPCwire.
ABSTRACT
Computational science faces three major challenges: "The Performance
Challenge," "The Programming Challenge" and "The Prediction Challenge".
The exponential growth in processor speed and the advent of massive
parallelization have increased computing power by a factor of 10^13 since 1945.
This has enabled scientists and engineers to tackle important problems of
unparalleled size and complexity. However, the complicated architectures of
these new platforms have made programming more difficult. Furthermore, much of
the improved predictive power has been achieved by increasing the complexity
of the application models and algorithms. This has raised the level of the
challenges associated with developing and using the resulting large, complex
computer codes. As a community we are meeting the first challenge, "The
Performance Challenge," but are not doing as well with the other two,
"The Programming Challenge" and "The Prediction Challenge."
Computer capability appears likely to continue to grow exponentially in the near
term.
On the other hand, crises loom for programming and prediction. For the
"Programming Challenge," even short programs are often difficult to write for
massively parallel platforms. The time scale for developing large-scale
applications is often longer than the life cycle of a single platform
architecture. Porting applications to new platforms is difficult and
challenging. Existing programming tools are inadequate for rapid code
development and optimization. Many, if not most, application codes achieve
only a small fraction of the potential peak performance. The High Performance
Computing Community must make programming easier, or at least no harder, as it
builds ever more powerful-and complex-computers.
With regard to the "Prediction Challenge," computational science does not have
the predictive reliability of traditional methodologies such as theory,
experiment and engineering design. The results of many major computer
applications are often wrong or are misinterpreted, sometimes with disastrous
consequences. Computational science must mature as a field if it is to become
a reliable methodology for addressing important problems. History indicates
that it takes time-and quite a few major and possibly dramatic mistakes-for
new methodologies to mature. Such major mistakes are occurring now in
computational science. Just as other disciplines have learned from their
mistakes, we, as a community, must analyze our mistakes and successes and
adopt the "lessons learned". The Computational Science community must improve
the predictive capability of application codes if computational science is to
become a useful tool for solving society's problems. A key figure of merit is
the "time to solution", the time between the identification of a problem and
the delivery of a validated and analyzed computational solution. For the reasons
cited above, the "time to solution" is growing in many cases, not
decreasing. Reducing it requires that we address all three challenges.
1. Introduction
Computational science-the use of large-scale computers to address and solve
important technical problems-is becoming an everyday tool for design and
analysis of complex technical issues. Applications include scientific
research, engineering design, policy analysis, training and emergency response
and environmental analyses. Computational science has the potential to address
complex issues with a degree of realism that has heretofore only been
imagined. This exciting and very important (indeed revolutionary) potential is
due to the enormous growth in computer power (speed and data storage) over the
last 50 years. This growth shows no sign of slowing in the near term. Yet
computational science is a very new and immature discipline. It has not
achieved the level of maturity of traditional methodologies such as
experiment, theory, engineering design and conventional policy analysis for
solving problems.
At least three distinct challenges face the computational science community.
First, exponential growth in computer power gives us greater ability to tackle
difficult and more important problems. But as computers have become more
powerful, they have also become more complex. Simple computer architectures
have evolved into massively parallel structures with very complex designs and
connections. This expanding computer power-larger memory and data storage, and
faster processing speed-is enabling very large application programs to treat
many very complex and strongly interacting effects. Climate models now include
models of dozens of effects where before they included only a few. These
developments lead to three distinct challenges:
1. "The Performance Challenge": Designing and building high performance computers.
2. "The Programming Challenge": Programming for complex computers.
3. "The Prediction Challenge": Developing codes with complex physical models that are truly predictive.
The exponential growth in microchip processing power
described by "Moore's Law" together with the concomitant increase in memory
and disk speed and size and the advent of massively parallel platform
architectures have resulted in a factor of 10^13 improvement in computer
processing power since 1945. This expansion of raw computing power is enabling
computational science to address many important problems with a degree of
realism and fidelity that was only dreamt of ten or twenty years ago.
However, the increased complexity of computers has made programming for them
more difficult and time consuming. Optimizing code performance is becoming
more difficult but the performance analysis and debugging tools for massively
parallel platforms are still in their infancy. Programming models (MPI,
OpenMP, HPF, etc.) have evolved slowly and vary among platform vendors and
architectures. Developing ways to reduce the difficulty of programming high
performance computers is a key requirement for computational science to
advance.
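As a rough illustration of the growth figure quoted above, the factor of 10^13 since 1945 can be converted into an implied average doubling time. The sketch below is a back-of-the-envelope calculation and not part of the original paper. It assumes the interval ends in 2004, the date of the workshop, and that growth was uniform over the whole period, which it was not in detail.

```python
import math

GROWTH_FACTOR = 1e13   # improvement in computing power quoted in the text
START_YEAR = 1945      # start of the interval quoted in the text
END_YEAR = 2004        # assumption: the year the paper was presented


def implied_doubling_time(growth_factor: float, years: float) -> float:
    """Average doubling time, assuming uniform exponential growth."""
    doublings = math.log2(growth_factor)  # number of doublings needed
    return years / doublings


doubling_time_years = implied_doubling_time(GROWTH_FACTOR, END_YEAR - START_YEAR)
print(f"Implied average doubling time: {doubling_time_years:.2f} years")
```

Under these assumptions the implied doubling time is roughly 1.4 years, which is in line with the usual statements of "Moore's Law" once the gains from parallelism are included.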
The main topic of this paper is "The Prediction Challenge," the challenge of
successfully developing codes with complex scientific and engineering models
that can make accurate predictions or analyze data correctly. Based on surveys
of many technical code projects and on case studies, I can identify three major
"lessons learned" (and many more slightly less important ones) that need much
more emphasis by the computational science community if scientific and
technical computational results are to have credibility: verification,
validation, and code project management and quality.
Every code consists of models of real effects in nature and mathematics.
First, the code must solve the models correctly. The solution algorithms must
be applied correctly. The code needs to be free of bugs that significantly
affect the results. If the code is not mathematically correct, then any
conclusions derived from the code are likely wrong. Second, the models in the
code must represent the real world with sufficient accuracy that the code
predictions and analysis provide a valid basis for decisions. The models must
be checked with experimental data for the regimes of interest. Third, the
development of complex, large codes is a complex undertaking itself.
Now the development process often involves teams that can be as large as 20 or 30
staff, or even larger. The teams need many different skills: science,
programming, computer science and computational mathematics. Twenty years ago,
most scientific code development teams were much smaller and the range of
required skills much less broad. The project has to be organized so that the
team members developing the code know what they are trying to accomplish, how
to work together productively, what programming and physical models are
appropriate, how they will go about developing the code, how long it should
take, what resources they will need, what constitutes success and who the
customer is. Unfortunately, many computational science projects are seriously
deficient in one or more of the three areas highlighted above, and the results
of those codes are therefore often of little value.
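The first requirement above, that the code solve its models correctly, can be checked in simple cases by comparing against a known analytic solution and confirming that the error shrinks at the rate the algorithm predicts. The following is a minimal sketch of such a verification test, not a method taken from the paper. The model problem (dy/dt = -y), the forward Euler integrator and the function names are all assumptions chosen for illustration.

```python
import math


def euler_decay(n_steps: int, t_end: float = 1.0) -> float:
    """Integrate dy/dt = -y, y(0) = 1, to t_end with forward Euler."""
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-y)
    return y


def observed_order(n_coarse: int = 100) -> float:
    """Estimate the convergence order by halving the step size."""
    exact = math.exp(-1.0)  # analytic solution at t = 1
    err_coarse = abs(euler_decay(n_coarse) - exact)
    err_fine = abs(euler_decay(2 * n_coarse) - exact)
    return math.log2(err_coarse / err_fine)


order = observed_order()
print(f"Observed order of convergence: {order:.3f}")
```

Forward Euler is first-order accurate, so the observed order should be close to 1. A value far from the theoretical order would point to a bug or a misapplied algorithm. Note that this checks only mathematical correctness (verification); it says nothing about whether the model represents the real world (validation), which requires experimental data.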
Even in the cases when the code is credible, it may have been applied
inappropriately. Because it is difficult for an outsider to judge the validity
of scientific computational predictions and analyses, computational science
does not have the credibility of the more mature problem solving methodologies
of theory, experiments and engineering design.
To summarize, there are three major challenges for Computational Science. The
High Performance Computing community is successfully meeting the first,
"The Performance Challenge": developing high performance computers. The second,
"The Programming Challenge," involves both the High Performance Computer
development community and the computer science community. Both communities
must find ways to reduce the difficulty of developing application codes for
today's and tomorrow's High Performance Computers. The third, "The Prediction
Challenge," primarily involves the computational science community. We must
improve the predictive capability of these increasingly complicated programs.
A key measure of the effectiveness of computational science is the "time to
solution." This is the time required to conceive and develop a validated
computational solution to a problem. The time includes the calendar time
required to develop and deploy a computer platform, develop the application,
obtain validated solutions and analyze the solutions. From the perspective of
the sponsors who need accurate answers to important problems, this is the
critical issue. A faster computer platform may run problems more quickly, but
if it takes a lot longer to develop a code for the faster platform, the time
to solution may be longer, not shorter. Reducing the time to solution requires
addressing all three Challenges: Performance, Programming and Prediction. This
is the goal of the Defense Advanced Research Projects Agency (DARPA) High
Productivity Computing Systems (HPCS) project. The HPCS project has focused
initiatives in all three challenge areas involving major high performance
computing vendors (IBM, Cray and Sun), the computer science community and the
technical code development community and has the stated goal of reducing the
time to solution.
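The "time to solution" argument above can be made concrete with a small calculation that sums the four components named in the text: platform deployment, application development, obtaining validated solutions, and analysis. This is only an illustrative sketch. The class name and every number in it are hypothetical and are not taken from the paper or from any real project.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Calendar time, in months, for each stage of a computational project."""
    platform_months: float      # develop and deploy the computer platform
    development_months: float   # develop the application code
    validation_months: float    # obtain validated solutions (includes run time)
    analysis_months: float      # analyze the solutions

    def time_to_solution(self) -> float:
        return (self.platform_months + self.development_months
                + self.validation_months + self.analysis_months)


# Hypothetical numbers: the faster platform halves the run time spent on
# validation, but the code takes longer to write and the platform must
# first be deployed.
existing = Project(platform_months=0, development_months=24,
                   validation_months=6, analysis_months=3)
faster = Project(platform_months=12, development_months=36,
                 validation_months=3, analysis_months=3)

print("Existing platform:", existing.time_to_solution(), "months")
print("Faster platform:  ", faster.time_to_solution(), "months")
```

With these made-up figures the faster platform yields the longer time to solution, because the gain in run time is outweighed by the added platform and development time.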
...PART II TO BE PUBLISHED IN NEXT WEEK'S HPCWIRE
Douglass Post is a physicist in the Physics Division at the Los Alamos
National Laboratory. He was the Deputy Division Leader for Simulation in the
Applied Physics Division at Los Alamos from 2001 to 2002. From 1998 to 2001,
he was the Associate Division Leader for Computational Physics for "A" and "X"
Divisions at the Lawrence Livermore National Laboratory. He graduated from
Stanford University with a Ph.D. in Physics in 1975. He has 30 years of
experience with the development of technical software and computational
physics and project management in magnetic fusion, atomic and molecular
physics, transport phenomena and nuclear weapons at Los Alamos, the Lawrence
Livermore National Laboratory, and the Princeton University Plasma Physics
Laboratory. Doug was leader of the tokamak modeling group at the Plasma
Physics Laboratory from 1975 to 1993. He was head of the Physics Project Unit
for the International Thermonuclear Experimental Reactor Conceptual Design
Phase (1988-1990) and head of the In-Vessel Physics Group during the
Engineering Design Phase (1993-1998). He is the Associate Editor-in-Chief of
the IEEE/AIP publication "Computing in Science and Engineering", and a fellow
of the American Physical Society and of the American Nuclear Society. His
current professional interests include the development of software engineering
methodologies for scientific computing as Team Leader for Analysis of Existing
Codes for the DARPA High Productivity Computing Systems Project.
Los Alamos National Laboratory, an affirmative action/equal opportunity
employer, is operated by the University of California for the U.S. Department
of Energy under contract W-7405-ENG-36. By acceptance of this article, the
publisher recognizes that the U.S. Government retains a nonexclusive, royalty-
free license to publish or reproduce the published form of this contribution,
or to allow others to do so, for U.S. Government purposes. Los Alamos National
Laboratory requests that the publisher identify this article as work performed
under the auspices of the U.S. Department of Energy. Los Alamos National
Laboratory strongly supports academic freedom and a researcher's right to
publish; as an institution, however, the Laboratory does not endorse the
viewpoint of a publication or guarantee its technical correctness.