
Features:
EXPERT OPINION: THE COMING CRISIS IN COMPUTATIONAL SCI (III)
by Douglass Post
In response to our March 12 column by the High End Crusader, HPCwire article
#107185, US FUNDING PRIORITIES AND ROADMAPS FOR PETAFLOPS, Dr. D. E. Post has
allowed us to publish a relevant paper he presented as part of the proceedings
of the IEEE International Conference on High Performance Computer
Architecture: Workshop on Productivity and Performance in High-End Computing,
Madrid, Spain, February 14, 2004, Los Alamos Report LA-UR-04-0388.
Below is the final installment of Douglass Post's commentary on the future of
computational science in the U.S. For Part I and II, see articles #107234 and
#107294 [http://www.tgc.com/hpcwire/backissues/]. Dr. Post works with the
Physics Division at Los Alamos National Labs. The rest of his workshop papers
can be found at http://www.research.ibm.com/arl/pphec/.
6. Quantitative Estimation
These "lessons learned" were based on a qualitative and a quantitative
analysis of the history of the different ASCI code projects and comparison
with the Information Technology industry and conventional project management
and scientific research. The quantitative analysis was a key element in
establishing that the ASCI code projects had not been given a consistent set
of requirements, resources and schedules. While our analysis (Post and
Kendall, 2003) was relatively simple compared to the methods often employed in
the Information Technology (IT) community (Capers-Jones, 1998), the conclusions
are very clear. We found that the key predictor of success was the age of the
code project and the amount of time allocated to complete the project and meet
milestones. Our analysis of the historical data indicated that it takes about
8 years to develop an ASCI weapons code. The projects that had 8 years of
development often succeeded, and all those that did not have 8 years of
development time failed to meet their initial milestones. This result
emphasized the crucial need to get the requirements right and then to allocate
sufficient resources and time (i.e. schedule) to meet those requirements.
The case studies included metrics (code size, team size, age, etc.). To see if
the ASCI experience was consistent with the Information Technology (IT)
community experience, we analyzed the case studies using a generic "function
point" model (Capers-Jones, 1998) widely used by the IT industry. We calibrated
this model for scientific code projects using the ASCI case study data.
Function points are a weighted total of inputs, outputs, inquiries, logical
files and interfaces (Symons, 1988; Capers-Jones, 1998). Function points
specifically developed for technical software (computational science software)
do not yet exist. IT function point measures do exist, and we used them to
make the present argument.
Equation 1 FP = C++ SLOC/53 + C SLOC/128 + F77 SLOC/107
Equation 2 Schedule (months) = FP^x where 0.4 < x < 0.5: use 0.47
Equation 3 Team size = FP / 150
Equation 4 Schedule = Contingency x Function Point Schedule + Delays
Equation 5 Team Size = 3 + 0.6 * FP/150
We first converted the single lines of code to Function Points (FP) (eq. 1).
T. Capers Jones lists the equivalent single lines of code (SLOC) per
function point (FP) for the common computer languages (Capers-Jones, 1998)
since computer languages have different information densities.
In this model, the required schedule and average team size are determined by
the Function Point (FP) count (eqs. 2,3). We calibrated and modified these
general scalings to account for the added complexity and viscosity associated
with developing scientific codes specifically for the nuclear weapons complex.
We increased the schedule by 1.5 years to account for the additional time it
takes to recruit, hire, train and get security clearances for code development
staff. Using a methodology developed by the Lawrence Livermore National
Laboratory Engineering Department (Remer, 2000), we calculated a contingency
factor of 1.6 to account for the additional risks, uncertainties,
complexities, etc. for the LANL and LLNL computing environments (eq. 4). We
modified the standard FP scaling for the size of the code team (eq. 5)
(Capers-Jones, 1998) to match the ASCI data. This included a correction for
small code teams.
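The scalings in eqs. 1-5 can be written out as a short Python sketch. The
function and variable names below are illustrative and do not come from the
original analysis. The sketch assumes that the schedule from eq. 2 is converted
from months to years before the contingency factor (1.6) and the 1.5-year
delay are applied in eq. 4.

```python
# Sketch of the function point scalings in eqs. 1-5. Names are hypothetical.

# Source lines of code per function point for each language (eq. 1).
SLOC_PER_FP = {"C++": 53, "C": 128, "F77": 107}


def function_points(sloc_by_language):
    """Eq. 1: convert lines of code per language into function points."""
    return sum(sloc / SLOC_PER_FP[lang] for lang, sloc in sloc_by_language.items())


def fp_schedule_years(fp, exponent=0.47):
    """Eq. 2: generic schedule FP**x in months, converted here to years."""
    return fp ** exponent / 12.0


def generic_team_size(fp):
    """Eq. 3: generic team size."""
    return fp / 150.0


def adjusted_schedule_years(fp, contingency=1.6, delay_years=1.5):
    """Eq. 4: contingency factor times the FP schedule, plus fixed delays."""
    return contingency * fp_schedule_years(fp) + delay_years


def adjusted_team_size(fp):
    """Eq. 5: team size scaling modified to match the ASCI data."""
    return 3.0 + 0.6 * fp / 150.0


if __name__ == "__main__":
    # Hypothetical code with a mix of C++ and Fortran 77 source.
    fp = function_points({"C++": 106000, "F77": 107000})
    print(f"function points: {fp:.0f}")
    print(f"schedule: {adjusted_schedule_years(fp):.1f} years")
    print(f"team size: {adjusted_team_size(fp):.0f}")
```

For 4800 function points, these functions give a schedule of roughly 8.7 years
and a team of about 22, in line with the ASCI A column of Table 2.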
We analyzed seven code projects, three at LLNL and four at LANL (Table 2). For
national security classification reasons, we have identified the LLNL codes
with the letters A, B and C. Table 2 lists the size of the code in function
points, the time estimated by equation 4 to develop the initial capability of
the code project, the actual age of the code at the point it was expected to
accomplish its first milestone, whether or not the project succeeded, the
optimal code team size estimated from equation 3 and the actual size of the
team. The sizes of the codes (e.g. lines of code, loc) were approximate
estimates by the code teams. Establishing the size of the code teams was
challenging. In general, good records were not available. Thus the code team
sizes were generally estimated by the code team leaders. Because good records
were not kept, it was also difficult to account for staff who worked on the
code project but were part of other organizations. More than one-half of the
Blanca code project team, for instance, was part of other organizations. Where
this was an issue, we used conservative estimates. For example, the Blanca
code project staff probably had a staffing level of about 50 people for the
first 4 or 5 years of its life instead of the 35 we assumed. We used a smaller
number based on the actual number of people we could definitely identify as
having worked on the project.
The case histories and the estimation procedures indicate that it generally
takes a minimum of 8 years for a code team to develop an initial capability
for a weapons code project. The requirements for a weapons code are determined
by the physics necessary to simulate a nuclear weapon. LANL and LLNL have over
50 years of experience in this area, and know these requirements in detail.
Weapons code projects require between 3000 and 6000 function points (Table 2).
Some of the ASCI codes were started before ASCI began in 1996 (ASCI B, Legacy
A for LLNL, and the LANL Crestone code project). ASCI B was started roughly in
1992 and had a working prototype in 1994. The Crestone code project was
started before 1992. ASCI A and the Shavano and Antero code projects were
started around early 1997. Legacy A was started over 30 years ago and was
included for comparison and normalization. Since we are able to match the
history of weapons codes with scalings derived from the experience of the
commercial software industry, we conclude that the constraints, computer
science practices and management issues that generally apply to the IT
industry generally apply to the development of weapons codes as well (i.e.
there is no "Silver Bullet" that can radically reduce the development
time (Brooks, 1987)).
Table 2
Software Resource Estimates for the LLNL and LANL Code Projects

                                 ----------LLNL----------   --------------LANL---------------
                                 ASCI A   ASCI B Legacy A   Antero  Shavano   Blanca Crestone
Single Lines of Code             184000   490000   410550   300000   500000   200000   314000
Function Points (Eq. 1)            4800     4000     5400     2900     4800     3800     2900
Est. schedule, years (Eq. 4)        8.7      7.6      6.9      6.6      8.1      7.4      6.7
Age at 1st milestone, years           3        9      N/A        4      3.5        8        8
Successful in achieving
  initial ASCI milestone             No      Yes      N/A       No       No       No      Yes
Est. staff reqts. (Eq. 3)            22       27       24       14       22       18       14
Actual team size                     20       22        8       17        8       35       12

We found that the dominant factor for success is the age of the code project.
The code projects that did not have sufficient time (8 years) to complete
their projects failed to meet their milestones. All but one of the code
projects that had 8 years succeeded in meeting their milestones. This is clear
evidence that schedules and requirements must be consistent. The schedule
cannot be fixed independently of the requirements, a fact long appreciated by
the IT industry (DeMarco, 1997; Capers-Jones, 1998) but not adequately taken
into account in the early planning for ASCI. The ASCI program set the
milestone for demonstrating the capability of each code project to be three
and a half years (December 1999) after the beginning of ASCI (~mid 1996) and
three years after the date that many of the code projects were launched
(~January 1997).
Adequate development time is necessary, but not sufficient, for success.
Several code projects failed in spite of having adequate time. Poor practices
and inadequate support (implicitly included in the contingency factor) hurt
many of the projects as well. The Blanca code project failed to meet its
milestones even with adequate time and ample resources.
Another point is that it is clear from the function point scaling relations
(eqs. 1-5) that the code requirements determine both the schedule and
resources needed for success. This estimating analysis indicates the
importance of a realistic set of requirements, schedule and resources. Without
them, projects will fail and the needed applications will not be developed.
These case studies helped persuade the ASCI senior management that the
"younger" code teams (those started less than 8 years before the milestone)
were not necessarily incompetent, but were just unable to do 8 years of work
in less than 4 years. The management was then able to recognize that several
(but not all) of these "younger" projects were actually making very good
progress compared to "normal" code development rates and had very high
potential for producing successful codes that would give the ASCI program
substantially improved tools. Partly motivated by the case studies, the ASCI
management then developed a more realistic schedule for code development,
placed more emphasis on the needs of the users and provided better support for
the code teams.
Three issues identified as "lessons learned" are expanded on in the following
two sections: verification and validation, and software quality. These areas
are crucial for the success of technical software projects, and have special
(and often not well understood) requirements.
7. Verification and Validation
An application code typically solves a model problem that is only an
abstraction of reality. Many things can limit the validity of a code
calculation. The models and solution algorithms may be implemented
incorrectly. The models may not accurately reflect the phenomena of interest
(Roache, 1998; Oberkampf and Trucano, 2002). Verification is the determination
that the code solves the model correctly. Validation is the determination that
the models in the code capture, with adequate fidelity, the phenomena of
interest. Both are essential elements of a program to develop and apply
application codes to problems of interest (Roache, 1998). Without adequate
verification and validation, there is no reason to believe any part of a code
result. Unfortunately, for much of computational science, verification and
validation efforts fall far short of what is needed.
Both verification and validation become more difficult as codes become more
complicated and their applications more important. A typical application might
have many different components. A sophisticated climate modeling code might
include models for ocean evaporation, ocean currents, ocean salinity,
atmospheric flow, clouds, precipitation, CO2 sequestration, radiation
transport, atmospheric chemistry, ground water flow, vegetation growth, ice
formation, etc. The code might predict many observables, such as average
surface temperature, precipitation levels, etc. The accuracy of these
observables depends on the accuracy of each component model, the completeness
of the set of all the models (i.e. does the code treat all of the important
phenomena), the accuracy of the solution method for the model including its
interaction with the other models, the physical data used in the models, the
adequacy of the problem generation and the ability of the user to correctly
set up the problem, run it and interpret the results. Verifying and validating
all of these is a major challenge.
The accuracy of the multi-model code depends first on the accuracy of each
component, as well as the accuracy of their interactions. In practice, first
one has to verify each component, then validate each component for the
relevant regimes, then verify and validate progressively larger collections of
interacting components, until the entire integrated code has been "verified"
and "validated" for the problem regimes of interest.
There are at least four common verification techniques, all with serious
shortcomings:
1. Comparison of the code results with the analytic results for a problem with
an exact answer,
2. Establishing that the convergence rate of the truncation error is
consistent with the expected convergence rate,
3. Comparison of the observed results with the expected results for a problem
specially manufactured to test the model (or models) (Boehm, 2002), and
4. Computing and monitoring "conserved" quantities and parameters that
should be constant or are predictable.
The first method is worthwhile, but extremely limited in practice. There are
usually few (if any) relevant problems with exact answers, especially with
realistic boundary conditions, realistic geometries, realistic data,
non-linear conditions, or multiple-component systems. The computational fluid
dynamics community widely uses the convergence rate of the truncation error to
verify programs (Roache, 1998; Hallquist, 2003). This technique, too, is
limited in applicability. It works best when the expected truncation rate can
be determined from the basic difference equations and boundary conditions.
That is often not possible. Convergence rates are often not useful for
checking two or more interacting modules. The third technique, the Method of
Manufactured Solutions, is, in principle, very powerful (Roache, 1998; Pautz,
2001; Roache, 2002). It works for almost arbitrarily complicated and strongly
coupled models, and almost arbitrarily complicated boundary conditions.
However, problems with moving or adaptive meshes, non-analytic (and
non-differential) terms and real physical data are difficult to treat. These
challenges, as well as the complexity of implementing the manufactured
solutions, seem to prevent its widespread use. The fourth technique is
monitoring behavior the developer knows has to be correct, such as "conserved"
quantities (e.g. total energy, momentum, mass, etc.), quantities whose
evolution can be estimated (e.g. entropy) to check the accuracy of individual
components and of the whole code, or procedures that can be predicted (e.g.
procedural behaviors designed into the code). Yet, in spite of all these
limitations, verification must be done as thoroughly as possible. If a code
isn't solving the models correctly, then the answers are worthless. Any
correspondence of the answers with reality is completely fortuitous.
Verification needs to be performed every time the code or operating system
(compilers, etc.) changes. A code has to be verified before it can be
validated. Validating an unverified code is generally a waste of time. Given
the deficiencies of existing practices, better verification techniques are
desperately needed. Comparing the results of a problem for two different codes
(benchmarking) can increase the likelihood of catching errors, but only to a
limited degree. Both codes could be wrong. Two codes usually have different
ways of solving a problem, and sorting out the effects of those differences
can be time-consuming and potentially impossible. Benchmarking is worthwhile
because it can catch
errors, but it isn't a substitute for a mathematically rigorous verification
procedure.
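The second and third techniques above are often combined. The Python sketch
below is a minimal illustration under stated assumptions, not a method taken
from the cited references: the model problem -u'' = f on [0, 1], the
manufactured solution u(x) = sin(pi x) + x^3, and all function names are
chosen here for illustration. The source term f is derived from the chosen
solution, the problem is solved on two grids with second-order central
differences, and the observed convergence rate is compared with the expected
rate of 2.

```python
import math


def u_exact(x):
    """Manufactured solution, chosen arbitrarily for this illustration."""
    return math.sin(math.pi * x) + x ** 3


def source(x):
    """Source term f = -u'' derived analytically from u_exact."""
    return math.pi ** 2 * math.sin(math.pi * x) - 6.0 * x


def solve_tridiagonal(rhs):
    """Thomas algorithm for the matrix with 2 on the diagonal, -1 beside it."""
    n = len(rhs)
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def max_error(n):
    """Solve -u'' = f with n interior points; return the max nodal error."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    rhs = [h * h * source(xi) for xi in x]
    rhs[0] += u_exact(0.0)   # Dirichlet boundary value at x = 0
    rhs[-1] += u_exact(1.0)  # Dirichlet boundary value at x = 1
    u = solve_tridiagonal(rhs)
    return max(abs(ui - u_exact(xi)) for ui, xi in zip(u, x))


def observed_order(n_coarse, n_fine):
    """Estimate the convergence rate from the errors on two grids."""
    h_ratio = (n_fine + 1) / (n_coarse + 1)
    return math.log(max_error(n_coarse) / max_error(n_fine)) / math.log(h_ratio)


if __name__ == "__main__":
    print(f"observed order: {observed_order(15, 31):.2f} (expected 2)")
```

An observed order well below the expected one would point to an error in the
discretization or in the treatment of the boundary conditions. As the text
notes, this kind of check becomes much harder to construct for coupled models
and real physical data.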
As a practical matter, diligent code developers do as much verification as
they judge feasible, and then keep their eyes open for suspicious behavior by
the code. However, this is far from a guarantee that the code is free of
errors. Also, not all code developers (and users) are sufficiently diligent or
knowledgeable.
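Part of keeping an eye out for suspicious behavior can be automated by
monitoring quantities that should be conserved, the fourth technique listed
earlier. The sketch below is a hypothetical example: a harmonic oscillator
integrated with the velocity Verlet scheme, with the relative drift in total
energy checked at every step. The model, the tolerance and the names are
assumptions made for illustration only.

```python
def total_energy(x, v, k=1.0, m=1.0):
    """Kinetic plus potential energy of a harmonic oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def integrate(steps, dt, x=1.0, v=0.0, k=1.0, m=1.0, tol=1e-3):
    """Advance with velocity Verlet and monitor the energy at every step.

    Returns the largest relative energy drift seen. Raises RuntimeError as
    soon as the drift exceeds tol, which flags a suspicious calculation.
    """
    e0 = total_energy(x, v, k, m)
    a = -k * x / m
    max_drift = 0.0
    for step in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -k * x / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
        drift = abs(total_energy(x, v, k, m) - e0) / abs(e0)
        max_drift = max(max_drift, drift)
        if drift > tol:
            raise RuntimeError(
                f"energy drift {drift:.3e} exceeds {tol:.1e} at step {step}"
            )
    return max_drift


if __name__ == "__main__":
    print(f"max relative energy drift: {integrate(10000, 0.01):.2e}")
```

A check like this cannot show that a code is correct. It can only flag runs in
which something has clearly gone wrong, such as a time step that is too large.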
Once a code has been verified as much as possible, the code must be validated
for the problem regimes of interest. A code is never a valid tool for all
conceivable problems. It can only be validated for specific regimes, and the
validity in adjacent regimes estimated. The entire calculational system
including the user, computer system, problem set-up, problem running and
results analysis for each user and computer system must be validated because
all elements are important. An inexperienced or non-expert user can easily get
wrong answers using a good code in a validated regime.
Validation has a number of challenges. Each individual component and all
important combinations of the components must be validated. Validation data
and experiments have a variety of forms (Table 3).
Table 3
Four types of experiments used to validate codes:
1. Passive observations of physical events (e.g. supernova explosions or the
weather),
2. Experiments designed to certify a physical component or physical system
(tests of an engineering component such as a scaled airplane wing, car crash,
etc.),
3. Experiments designed to elucidate a general physics or engineering
principle or law (e.g. wind tunnel studies of turbulent eddies around
airfoils), and
4. Experiments specifically designed to validate a code application (e.g. wind
tunnel tests designed to provide data to validate a code calculation).
Each type of experiment can be done before or after the code prediction has
been completed and can address single-effect issues or integrated phenomena.
The best validation consists of the comparison of predictions made before an
experiment with data from experiments designed specifically for validation.
Successful prediction of experimental results is a better test than successful
reproduction of existing experiments. Since few codes have no uncertainties,
"tuning" a code for an application is usually necessary to get reasonable
answers. The experienced user has learned how to set up an appropriately zoned
mesh, how to vary the physical data within the known uncertainties to get
reasonable answers, which effects are essential for the application and which
are inappropriate, how to interpret the results, when the code is outside
the region of validity, etc. With this freedom, it is thus often feasible to
tune a code to match many of the salient points of an existing experiment. It
is a much more rigorous test of the code application to predict experimental
results before the experiment has been conducted. This is also prototypical of
the purpose of the code and computer system, i.e. to make accurate predictions
of unknown events using known data before the events occur. An additional
benefit of the validation process is that it trains the users how to use the
code to get reasonable results. The entire calculational system needs to be
validated (code, user, computer system). As we have seen, an inexperienced
user can get the wrong result.
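One simple way to record such a comparison is to fix the predictions before
the experiment and then express each difference from the measured value in
units of the experimental uncertainty. The Python sketch below is only an
illustration. The quantity names, the numbers and the two-sigma acceptance
criterion are hypothetical and are not taken from any of the code projects
discussed here.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    name: str
    value: float
    sigma: float  # one-standard-deviation experimental uncertainty


def normalized_residuals(predictions, measurements):
    """Return (prediction - measurement) / sigma for each measured quantity."""
    return {
        m.name: (predictions[m.name] - m.value) / m.sigma for m in measurements
    }


def within_tolerance(residuals, n_sigma=2.0):
    """Flag which predictions fall within n_sigma of the measured value."""
    return {name: abs(r) <= n_sigma for name, r in residuals.items()}


if __name__ == "__main__":
    # Hypothetical predictions, recorded before the experiment was run.
    predictions = {"lift_coefficient": 1.05, "drag_coefficient": 0.30}
    # Hypothetical measurements from a dedicated validation experiment.
    measurements = [
        Measurement("lift_coefficient", 1.00, 0.04),
        Measurement("drag_coefficient", 0.25, 0.02),
    ]
    residuals = normalized_residuals(predictions, measurements)
    for name, ok in within_tolerance(residuals).items():
        print(f"{name}: residual {residuals[name]:+.2f} sigma, within 2 sigma: {ok}")
```

Because the predictions are recorded before the data exist, a table of
residuals like this cannot be improved by tuning after the fact.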
For many applications, controlled experiments are not feasible or are
impractical. For them validation is especially challenging. Models of
astrophysical and large-scale geophysical phenomena (weather, climate,
volcanoes, asteroid impact, watersheds, etc.) and large scale economic and
political systems, must rely on historical data and current observations. We
will not be able to conduct controlled supernova explosions in the near
future or schedule earthquakes, volcanoes or asteroid impacts. For these
phenomena, the best that can be done is to collect as extensive sets of data
as possible, especially data that is fundamental to the correctness of the
code. For these systems it is often not possible to get data for all
conditions, a complete time history, adequately resolved data, and data for
many of the quantities of interest.
However, many, if not most, applications can be validated with data from
controlled experiments. Key issues include adequate coverage in space and time
of the appropriate experimental initial conditions and the behavior of the
important variables. An accurate description of the initial and boundary
conditions is essential.
The types of experiments used for validation listed in Table 3 are also listed
in order of their utility for validation. Aeronautical Computational Fluid
Dynamics (CFD) codes were first validated using wind tunnel tests of scaled
aircraft parts (Experiment type 2, Table 3). The object of the experiment was
to test the aircraft part. The use of the data for validation was largely
incidental and occurred after the experiment. Most experiments of type 2
were integral, in that they gave data that reflected the behavior of the
trade-offs of a number of competing effects. Code developers recognized that
data for specific effects was needed to validate each component in their
codes. They therefore used data from single effect experiments designed to
study a single, isolated phenomenon. Such data might be yield strength data
for metal components, thermal conductivity measurements, etc. Again,
validation of a code was usually not the primary purpose of the experiment,
although such experiments were often cheap enough that they could have been
used for explicit validation experiments. The fourth type of experiment is
one designed explicitly for code validation. The purpose of those
experiments is to test the models in the code. The code is often used to
design the experiment. Some of these points are illustrated in Figure 4. An
airfoil moving through the air sees a plane front of air rushing toward it.
Fifty years ago, wind tunnels were used to faithfully reproduce the plane air
front conditions to test aircraft components. Achievement of a plane front
required a large wind tunnel to minimize the effects of drag by the wall. Now,
much smaller wind tunnels are used to validate the codes that are used to
design airfoils. Once the requirement for a planar air front was removed, a
much smaller and cheaper wind tunnel could be used. The validation wind tunnel
facility can also have shorter set-up and experimental turnaround times and be
more easily and thoroughly diagnosed. The idea is to test the code, not the
component. A final test of the component may be advisable, but a CFD code
validated for the appropriate conditions can be used for many of the design
studies, especially if the final results of a computational optimization study
are checked experimentally.
In fact, data from experiments not designed for validation can sometimes be
misleading or inaccurate for validation. The experiment may have been designed
to measure a particular effect. The data for other effects may not have been
checked sufficiently and may be inaccurate, misleading or wrong. As noted in
the sonoluminescence example earlier, codes can be, and have been, forced to
match incorrect experimental data.
A paradigm shift with regard to the value and importance of validation
experiments is needed in the experimental community. Experimentalists and
funding agencies understand the value of experiments designed to explore new
scientific phenomena, test theories or certify and test the performance of a
design component. Few appreciate the value of experiments explicitly conducted
solely for the purpose of code validation. There generally exist no mechanisms
to get validation experiments funded even if experimentalists are interested.
Finally, since the value of verification and validation is to ensure that the
code can give accurate predictions for the phenomena of interest, a written
record of the verification and validation of the code is extremely important.
That record is necessary to establish the credibility of the code predictions
with the code project sponsors and customers. In fact, validation needs to be
organized like a project, with goals and requirements, a plan, resources, a
schedule, and deliverables including a documented record of the validation
project.
Few existing computational science projects practice systematic verification
or validation. Almost none have dedicated experimental validation programs
with dedicated validation experiments. Yet, without such programs,
computational science will never achieve credibility.
8. Software Quality and Software Project Management
Software quality and software project management are very important issues.
Improvements in quality offer the promise of greater longevity and easier
maintenance. Attention to quality will likely improve the code. Inattention to
quality will almost certainly contribute to poor quality (high defect rates,
and code that is hard to maintain and upgrade). It can also leave the code
project vulnerable to the Software Quality Assurance (SQA) mafia. If poor
quality becomes an issue, the sponsors and customers will take action. The DoD
and other sponsors have developed fairly rigid processes for code development
and software quality assurance in response to disasters caused by buggy
aircraft and satellite control software. Bugs in aircraft control software can
cause airplane crashes. To reduce the defect rate, the Air Force established a
very rigorous procedure for vendors to follow to develop such software (Paulk,
1994).
Similarly, sound software project management can do a lot to speed code
development, increase the likelihood of a successful product and minimize the
defect rate.
Quality was an issue for the US automobile industry in the 1970s and 1980s
(Halberstam, 1986). The American automobile industry produced poor
quality cars that people didn't buy. The Japanese built high quality cars that
people did buy. A basic difference was that the US automobile industry did not
emphasize quality on the assembly line and in the externally supplied
components. They mostly tested the cars after they came off the assembly line
and tried to fix the worst ones. The Japanese, on the other hand, emphasized
quality at every step of the assembly process and for components. They tested
the cars at many points along the assembly line and tested components before
installation. The result was that Japanese cars had much higher quality, and
the American automobile industry lost many customers.
Similarly, software quality engineering is most effective when it is applied
at each step of the software development process. This is much better than the
all too common practice of waiting until the code is nearly complete to begin
testing the code. However, just as on the assembly line, different development
processes require different methods. No one size fits all. Also, just as the
Japanese auto makers emphasized input from the assembly line workers, the code
developers themselves are often the best judges of how to implement quality. A
process rigidly imposed by senior management will likely get the same type of
token compliance observed in the US auto industry.
Quality assurance for technical software has an important sociological
dimension. Technical software is developed by teams of scientists and
engineers. Scientists and engineers are trained to question everything, and
accept nothing purely on the basis of authority. After all, even though he
might want to, your boss can't change the laws of nature, and that's what you
are trying to model. In fact, that's why we hire scientists to develop
scientific software. The models in the codes have to be right. If the models
don't reflect reality, the code results are worthless. We will then make
decisions that will be wrong, often with tragic consequences. Giving
scientists a "bible" that describes an elaborate, rigid process for developing
software, but which provides little in the way of justification, is
counterproductive. It seems to be more successful to work with each team to identify
the "practices" that add value to the scientific code development process, and
encourage the teams to implement the practices they helped to identify
(Phillips, 1997). It's also necessary to provide support, especially to carry
out some of the more routine practices. For large projects, it's better to
hire a "code librarian" to implement and maintain the configuration management
system and a dedicated "tester" to design, implement and run regression test
suites than to just tell the team to do it. Without additional resources, the
team will have to drop other tasks to complete newly assigned software quality
jobs. The practices that technical software development groups have found
useful include configuration management, requirements definition, sound
software project management, regression testing, adequate documentation,
design and code reviews, etc.
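As an illustration of one of these practices, a regression test for a
scientific code often amounts to comparing a few key output quantities with a
stored baseline, within a tolerance that allows for round-off differences
between compilers and machines. The Python sketch below is hypothetical. The
quantity names, values and tolerances are assumptions, and in practice the
baseline would be kept under configuration management alongside the code.

```python
import math


def compare_to_baseline(current, baseline, rel_tol=1e-8, abs_tol=1e-12):
    """Return a list of (name, expected, actual) for quantities that differ.

    A quantity missing from the current results is reported with actual=None.
    """
    failures = []
    for name, expected in baseline.items():
        if name not in current:
            failures.append((name, expected, None))
        elif not math.isclose(
            current[name], expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            failures.append((name, expected, current[name]))
    return failures


if __name__ == "__main__":
    # Hypothetical baseline from an earlier, accepted version of the code.
    baseline = {"total_energy": 1.0, "peak_temperature": 300.0}
    # Hypothetical results from the version under test.
    current = {"total_energy": 1.0, "peak_temperature": 300.1}
    for name, expected, actual in compare_to_baseline(current, baseline):
        print(f"REGRESSION in {name}: expected {expected}, got {actual}")
```

A difference from the baseline is not necessarily a bug, since an intended
physics improvement also changes the answers. It does mean that someone has to
look at the change and decide whether the baseline should be updated.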
A lot of technical software is developed for various government agencies. The
contracting officers for these agencies often aren't very knowledgeable about
the challenges of developing large, technical software projects. They are,
however, accountable for delivery of the programs and projects they sponsor.
Large technical software projects have substantial risks. The record indicates
that they are often behind schedule, over-budget, don't deliver exactly what
was promised, and even fail entirely. To succeed, sponsors have to hold the
code development organizations accountable. It is therefore very tempting for
the government agencies to require that the organizations they sponsor follow
a "process" model like the Capability Maturity Model (CMM) developed by the
Software Engineering Institute at Carnegie-Mellon University for the Air
Force (Paulk, 1994). After all, there is a lot of data that indicates that code
development organizations that follow the CMM processes produce "better" code,
meet milestones, etc., and, in the end, who can be against quality? This kind
of quality for scientific software, however, comes with a severe price. A
detailed analysis of the CMM processes indicates that they work well for
software that must have no bugs (e.g. the airplane control software mentioned
above). Implementing the CMM process, however, takes a lot of time. History
shows that several years are required for each step to move from one CMM level
to the next. There are five distinct CMM Levels. It also requires a lot
of additional resources. The major problem with applying the full CMM to the
development of scientific software is that the strong emphasis on avoiding and
reducing bugs and defects adds a lot of viscosity to the development process.
Computational science has different goals and requirements from aircraft
control. It is much more important that the physics or chemistry be right and
that the solution algorithms be right than that every last bug be eliminated.
Developing the right physics or chemistry package usually takes a lot of
experimentation and creativity. It is impossible to plan every detailed facet
of a large complex code with scientific and mathematical challenges. The code
development team must be very creative. It must develop and test many new
algorithms and models to find ones that work. A rigid code development process
impedes the flexibility and creativity needed to develop new codes. This is
not only the case for scientific codes, but also for most really innovative
software development. There is a running debate on this issue in the software
literature between the "rigid process" community and the "agile software"
community. The "agile software" community stresses the importance of
innovation and the difficulty of being innovative if one is constrained by
rigid processes (Highsmith and Cockburn, 2001; Boehm, 2002; DeMarco and Boehm,
2002). The "rigid process" community stresses the importance of reduced
defects and efficient code development (Herbsleb, Zubrow et al., 1997). Both
positions have valid points, but the reality is that there is no "one size
fits all" answer. Just as there is no "one way" to do laboratory experiments
in physics, chemistry or biology, theoretical work in chemistry, physics or
biology, or engineering design and analysis, there is no "one way" to develop
technical software. There is no "foolproof" way to develop codes, or as
Frederick Brooks states: "There is no silver bullet for software development"
(Brooks, 1995). Just as in other scientific methodologies, one has to do the
intellectually hard work of examining and testing candidate practices and then
use the ones that work for the problem at hand. But this does not mean that
"any old method" is acceptable and will work. It only means that not every
development problem has the same answer. We can't be lazy. While we can't
blindly accept what people hand us, we do have to find something that works
well.
A constant theme that seems to always emerge from case studies is that good
software project management is essential. It is usually more important than
any set of externally imposed processes. It is noteworthy that the Software
Engineering Institute has recently recognized the importance of software
project management. It has developed the "Team Software Process" (Humphrey,
2001), which appears to be very similar to the software project management
methods long advocated in the general IT industry, especially in the
non-government IT industry (e.g., Brooks, 1995; DeMarco, 1997; Remer, 2000;
Thomsett, 2002). The SEI data shows that introducing sound software project
management achieves a greater proportional reduction in the defect rate than
moving many levels up in CMM process level.
The burden of identifying code development methods that work well falls on
every code team. As noted before, if the team doesn't find methods that work,
the sponsor will attempt to force the team to use processes and methods that
he selects on the basis of what he has been told by others. The processes he
picks likely won't be the ones that the team would pick. Developing a good set
of practices and implementing them is the beginning of a good defense against
being forced to follow externally imposed practices. The team also has to be
able to articulate their practices and be able to demonstrate to management
and, in some cases, to auditors from DoD, DOE, NASA, etc. that the team's
practices work. There is no one solution to this problem either. The team has
to work to establish credibility with its management so management will trust
the team to do things right.
While the technical software community has many unique issues, it nonetheless
can learn much from the general IT industry. The IT community has had to
address the problem of how to plan and coordinate the activities of large
numbers of programmers writing fairly complex software. I have found that few
of even the simplest well-known and proven methods for organizing and managing
code development teams and projects are being employed by the technical
software community. The most common approach seems to be to independently
rediscover the IT industry "lessons". This unfortunately leads to wasted
effort and all too often results in failure.
Several of these "lessons" are worth highlighting. We need to learn how to
develop specifications and requirements for technical projects. Most technical
projects start out with a vision of what the code team leaders want to
accomplish. Unfortunately, the leaders don't develop requirements and
specifications at the level of detail that the other members of a large team
can follow to produce an integrated code at the end. There is relatively
little planning and almost no estimation of resources and schedule. This often
leads to overly ambitious goals and unrealistic schedules, missed milestones,
and sometimes to project failure. While good estimation is hard, one commonly
recommended technique is to develop a prototype that requires 5 to 10% of the
full project resources, and use it for estimation (McConnell, 1997). Another
technique is to look at similar projects and scale from them. In fact, most
technical code projects don't appear to follow very many of the "lessons
learned" from the ASCI code projects listed in Table 1.
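The arithmetic behind the two estimation techniques mentioned above can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function names, the use of person-months as the unit, and the
purely linear scaling with code size are all hypothetical choices made here,
not methods taken from McConnell or from the ASCI analysis.

```python
def estimate_from_prototype(prototype_cost, low_fraction=0.05, high_fraction=0.10):
    """Return a (low, high) range for the full project cost.

    prototype_cost is what the prototype actually consumed (for example,
    person-months). The fractions express the assumption that the prototype
    represents 5 to 10% of the full project resources.
    """
    if prototype_cost <= 0:
        raise ValueError("prototype_cost must be positive")
    if not 0 < low_fraction <= high_fraction <= 1:
        raise ValueError("fractions must satisfy 0 < low <= high <= 1")
    # A larger assumed fraction implies a smaller full project, and vice versa.
    return prototype_cost / high_fraction, prototype_cost / low_fraction


def estimate_by_scaling(reference_cost, reference_size, new_size):
    """Scale the cost of a similar, completed project to a new project size.

    Linear scaling is an assumption made for illustration; real projects
    often grow faster than linearly with size.
    """
    if reference_size <= 0 or new_size <= 0:
        raise ValueError("sizes must be positive")
    return reference_cost * new_size / reference_size


if __name__ == "__main__":
    low, high = estimate_from_prototype(12.0)
    print(f"full project: roughly {low:.0f} to {high:.0f} person-months")
    scaled = estimate_by_scaling(96.0, 200_000, 300_000)
    print(f"scaled from a similar project: roughly {scaled:.0f} person-months")
```

Even a crude range like this gives the team and the sponsor something concrete
to compare against the proposed schedule before milestones are fixed.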
A final issue is the need to balance improving the computer science
techniques and methodologies used for code development against using
conservative and reliable practices for the development of essential
applications. A good example is the effort to develop the Common Component
Architecture as a way to standardize component development. In principle, this
is a great idea. If one develops a module, it would be wonderful if it could
be used in many applications. The core of the idea is to develop interface
specifications for modules. Where this is possible, it should greatly help
code development. Unfortunately, every module has a different purpose, and
usually requires different interfaces for each technical problem. The hard
part is to define the specific interfaces, and it's not clear that this can be
done in a general way. It's difficult to see how a computer scientist will be
able to anticipate what interfaces are needed so that a module that calculates
the thrust from the rotor of a bacterium can be successfully integrated into a
unified calculation of a swimming bacterium. Clearly, new code development
methodologies must be developed and tested on real problems. Identifying ways
to develop these new methodologies and test them without unduly impeding
application code development and greatly increasing application code
development risk will continue to be a major challenge for the computational
science community. Another challenge will be to develop appropriate metrics
for the development of technical software. Clearly conventional function
points are inadequate. Technical software has additional complexity and
challenges beyond those faced by the IT industry. Developing those metrics
should be a key goal of any "lessons learned" activity. Gathering data on code
projects will be essential. Without real data on the code development process
and the codes themselves, it will be difficult to identify what works and what
lessons can be learned and applied to other projects.
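To make the notion of an interface specification for modules concrete, here is
a minimal Python sketch. Everything in it is hypothetical: the ThrustModule
interface, the toy rotor model and the net_force function are invented for
illustration and are not part of any component architecture. The sketch also
shows the difficulty raised above, since it fixes a single scalar input (the
rotation rate) that a different technical problem might not accept.

```python
from typing import Protocol, Sequence, Tuple


class ThrustModule(Protocol):
    """Hypothetical interface: any module that can report a thrust vector."""

    def thrust(self, rotation_rate: float) -> Sequence[float]:
        ...


class ConstantCoefficientRotor:
    """Toy module: thrust along x, proportional to the rotation rate."""

    def __init__(self, coefficient: float) -> None:
        self.coefficient = coefficient

    def thrust(self, rotation_rate: float) -> Sequence[float]:
        return (self.coefficient * rotation_rate, 0.0, 0.0)


def net_force(module: ThrustModule, rotation_rate: float,
              drag: Sequence[float]) -> Tuple[float, ...]:
    """Integrating code depends only on the interface, not on the module."""
    thrust = module.thrust(rotation_rate)
    return tuple(t + d for t, d in zip(thrust, drag))


if __name__ == "__main__":
    rotor = ConstantCoefficientRotor(coefficient=2.0)
    print(net_force(rotor, rotation_rate=3.0, drag=(-1.0, 0.0, 0.0)))
```

Any module that provides a matching thrust method can be swapped in without
changing net_force, which is the benefit a shared specification promises. A
module that needs the local fluid velocity as well would already break this
particular interface.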
There is also a tendency toward the formation of "virtual teams", teams of
non-collocated software developers at geographically separated sites. Such
teams have the advantage of bringing varied skill sets to the project without
the need to relocate and the potential for tapping the expertise of a number
of institutions and generating the political as well as technical support of
many institutions. Even collocated development teams face communication and
coordination challenges. Those problems are more severe for virtual teams, and
success will require addressing them.
9. Conclusions and Path Forward
Computational science has an important role to play in society. The High
Performance Computing community is meeting "The Performance Challenge" to
provide us unprecedented power to tackle important problems. However, two
additional challenges must be met before that potential can be realized.
First, the community must be able to efficiently develop programs for the ever
more powerful and ever more complicated computer platforms ("The Programming
Challenge"). Second, the application models must become sufficiently accurate
that they can be used for prediction with confidence ("The Prediction
Challenge"). To meet "The Programming Challenge," the High Performance Computer
and operations and development software community (industry, government and
academia) must develop the tools and methods to facilitate the development and
running of codes so that application codes can be developed quickly and
reliably and can be run efficiently on the High Performance Platforms. To meet
"The Prediction Challenge," the computational science community (industry,
government and academia) will need to become as mature as the theoretical and
experimental scientific and engineering design communities. The computational
science community must develop methods to ensure that the equations and models
in the codes accurately reflect the real world, that the equations and models
are solved correctly, that the applications are set up and run correctly by
knowledgeable and careful people, and that the results are interpreted
correctly. Accurate equations and correctly implemented models, as well as
efficient and economic development, require attention to the code development
process. The process must be consistent with the general "lessons learned"
discussed in the paper. One of the most important "lessons learned" is that an
intensive verification and validation program is an essential element of
ensuring that computational results are accurate. Unfortunately, not only is
the level of verification and validation usually insufficient, but there is
also inadequate effort devoted to developing new methodologies and concepts for
verification and validation. Much is needed and little is being done. Finally,
those developing the code and those using the code must have a deep
appreciation of the limits of the code and a deep-rooted appreciation that the
results may not be correct.
As in other methodologies, retrospective case studies of past practices are an
essential part of the path toward maturity. It is imperative that we as a
discipline continuously examine and assess our mistakes and our successes.
Without such a continuous re-assessment, we will continue to make the same
mistakes. Our field will never be able to fulfill the tremendous promise that
powerful computers give us.
Another way to look at it is as an issue of professional integrity. Unless our
field has the same level of professional integrity as other methodologies
(experiment, theory and engineering design), we will never be as credible as
the other methodologies. We will continue to hear the refrain: "Who can
believe that? It's just a code result and we know they can get anything they
want if they play with the code enough." Scientists who conduct experiments
irresponsibly find that their professional reputations are discredited quickly
and thoroughly. Who knows where Pons or Fleischmann (the "discoverers" of cold
fusion) are today (Huizenga, Harris et al., 1989)? It is rare that anyone in
computational science gets even the slightest rebuke for a misleading or
incorrect result.
It is not enough for 95% of the work in computational science to be reliable,
and 5% to be wrong. Unless the outside world can tell which 5% is bogus, none
of our work will have the impact it deserves.
The DARPA High Productivity Computing Systems (HPCS) project is focusing on
reducing the time to solution for important problems by meeting the
Performance, the Programming and the Prediction challenges. It is working with
three vendors, IBM, Cray and Sun, to design and build petaflop platforms. Part
of the HPCS project is the development of benchmarks for the platforms that
are prototypical of real applications. Attention is being paid to the
development of programming models and development tools for optimizing
parallel performance. The HPCS project is sponsoring case studies of
representative computational science projects in the DoD, DOE, NASA, NOAA,
industry and academia to identify the lessons learned and document and publish
them for the benefit of the computational science community. I have outlined a
number of "lessons learned" that have already been developed from the ASCI
code projects. As we assess a wider range of projects, we will refine those
lessons and identify new ones. Adoption of these "lessons learned" by the
computational science community will help the field to mature just as the
development of "lessons learned" and their adoption has helped other fields to
mature.
10. Acknowledgements:
The author is grateful for discussions with Marv Alme, Don Burton, Bill
Carlson, Gary Carlson, John Cerutti, Linnea Cook, Larry Cox, Tom DeMarco, Tom
Gorman, Dale Henderson, Leo Kadanoff, Richard Kendall, Jeremy Kepner, Joseph
Kindel, William Krauser, Ken Koch, Steve Libby, Bob Lucas, Tom McAbee, Doug
Miller, Pat Miller, Jim Rathkopf, Don Remer, Rob Thomsett, Tim Trucano, David
Tubbs, Larry Votta, Robert Webster and Mike Zika and to the Los Alamos
National Laboratory and Department of Energy for support. The author is
especially grateful to Tim Trucano for many discussions and careful
proofreading of the paper.
Los Alamos National Laboratory, an affirmative action/equal opportunity
employer, is operated by the University of California for the U.S. Department
of Energy under contract W-7405-ENG-36. By acceptance of this article, the
publisher recognizes that the U.S. Government retains a nonexclusive, royalty-
free license to publish or reproduce the published form of this contribution,
or to allow others to do so, for U.S. Government purposes. Los Alamos National
Laboratory requests that the publisher identify this article as work performed
under the auspices of the U.S. Department of Energy. Los Alamos National
Laboratory strongly supports academic freedom and a researcher's right to
publish; as an institution, however, the Laboratory does not endorse the
viewpoint of a publication or guarantee its technical correctness.
References
1. Frank, M.P., The Physical Limits of Computing. Computing in Science and
Engineering, 2002. 4(3): p. 16-26.
2. Laughlin, R., The Physical Basis of Computability. Computing in Science and
Engineering, 2002. 4(3): p. 27-30.
3. Petroski, H., Design Paradigms: Case Histories of Error and Judgement in
Engineering. 1994, New York: Cambridge University Press. 221.
4. Gehman, H.W., et al., Report of the Columbia Accident Investigation Board.
2003, National Aeronautics and Space Administration: Washington, DC. p. 248.
5. Hallquist, J.O. Current and Future Developments of LS-DYNA-1. in 4th
European LS-DYNA Conference. 2003. Ulm, Germany: Livermore Software Technology
Corporation.
6. Taleyarkhan, R.P., et al., Evidence for Nuclear Emissions During Acoustic
Cavitation. Science, 2002. 295(1): p. 1868-1873.
7. Shapira, D. and M. Saltmarsh, Nuclear Fusion in Collapsing Bubbles-Is It
There? An Attempt to Repeat the Observation of Nuclear Emissions from
Sonoluminescence. Physical Review Letters, 2002. 89(10): p. 104302-104305.
8. Post, D. and R. Kendall. Lessons Learned From ASCI. in DOE Software Quality
Forum 2003. 2003. Washington, DC: Los Alamos National Laboratory.
9. Thomsett, R., Radical Project Management. 2002, Upper Saddle River, NJ:
Prentice Hall.
10. DeMarco, T., The Deadline. 1997, New York, New York: Dorset House
Publishing. 310.
11. Beck, K., Extreme Programming Explained. 2000, Boston: Addison Wesley.
12. Remer, D. Managing Software Projects. in UCLA Technical Management
Institute. 2000. Los Angeles, CA: UCLA Extension Courses.
13. Vliet, H.v., Software Engineering, Principles and Practice. 2000,
Chichester: John Wiley and Sons, Ltd. 726.
14. Brooks, F., The Mythical Man-Month: Essays on Software Engineering,
Anniversary Edition. 1995, Menlo Park: Addison-Wesley Publishing Co. 322.
15. Verzuh, E., The Fast Forward MBA in Project Management. 1999: John Wiley.
16. Ruskin, A.M. and W.E. Estes, What Every Engineer Should Know About Project
Management. 2 ed. What Every Engineer Should Know, ed. W.H. Middendorf. Vol.
33. 1995, New York: Marcel Dekker, Inc. 274.
17. DeMarco, T. and T. Lister, Waltzing with Bears, Managing Risk on Software
Projects. 2003, New York, New York: Dorset House Publishing. 196.
18. Glass, R.L., Software Runaways: Monumental Software Disasters. 1998, New
York: Prentice Hall PTR. 288.
19. DeMarco, T. and T. Lister, Risk Management for Software. 2002, The
Cutter
20. Capers-Jones, T., Estimating Software Costs. 1998, New York: McGraw-Hill.
21. Yourdon, E., Death March. 1997, Upper Saddle River, NJ: Prentice Hall PTR.