HPCwire
 The global publication of record for High Performance Computing / April 2, 2004: Vol. 13, No. 13


Features:

EXPERT OPINION: THE COMING CRISIS IN COMPUTATIONAL SCI (III)
by Douglass Post

In response to our March 12 column by the High End Crusader, HPCwire article #107185, US FUNDING PRIORITIES AND ROADMAPS FOR PETAFLOPS, Dr. D. E. Post has allowed us to publish a relevant paper he presented as part of the proceedings of the IEEE International Conference on High Performance Computer Architecture: Workshop on Productivity and Performance in High-End Computing, Madrid, Spain, February 14, 2004 (Los Alamos Report LA-UR-04-0388).

Below is the final installment of Douglass Post's commentary on the future of computational science in the U.S. For Parts I and II, see articles #107234 and #107294 [http://www.tgc.com/hpcwire/backissues/]. Dr. Post works with the Physics Division at Los Alamos National Laboratory. The rest of his workshop papers can be found at http://www.research.ibm.com/arl/pphec/.


6. Quantitative Estimation

These "lessons learned" were based on a qualitative and a quantitative analysis of the history of the different ASCI code projects, and on comparison with the Information Technology industry, conventional project management and scientific research. The quantitative analysis was a key element in establishing that the ASCI code projects had not been given a consistent set of requirements, resources and schedules. While our analysis (Post and Kendall, 2003) was relatively simple compared to the methods often employed in the Information Technology (IT) community (Capers-Jones, 1998), the conclusions are very clear. We found that the key predictor of success was the age of the code project and the amount of time allocated to complete the project and meet milestones. Our analysis of the historical data indicated that it takes about 8 years to develop an ASCI weapons code. The projects that had 8 years of development often succeeded, and all those that did not have 8 years of development time failed to meet their initial milestones. This result emphasized the crucial need to get the requirements right, and then to allocate sufficient resources and time (i.e. schedule) to meet those requirements.

The case studies included metrics (code size, team size, age, etc.). To see if the ASCI experience was consistent with the Information Technology (IT) community experience, we analyzed the case studies using a generic "function point" model (Capers-Jones, 1998) widely used by the IT industry. We calibrated this model for scientific code projects using the ASCI case study data. Function points are a weighted total of inputs, outputs, inquiries, logical files and interfaces (Symons, 1988; Capers-Jones, 1998). Function points specifically developed for technical software (computational science software) do not yet exist. IT function point measures do exist, and we could use them to make the present argument.

Equation 1 FP = C++ SLOC/53 + C SLOC/128 + F77 SLOC/107

Equation 2 Schedule (months) = FP^x where 0.4 < x < 0.5: use 0.47

Equation 3 Team size = FP / 150

Equation 4 Schedule = Contingency x Function Point Schedule + Delays

Equation 5 Team Size = 3 + 0.6 * FP/150

We first converted the single lines of code (SLOC) to Function Points (FP) (e.g. eq. 1). Because computer languages have different information densities, T. Capers Jones lists the equivalent SLOC per function point for the common computer languages (Capers-Jones, 1998).

In this model, the required schedule and average team size are determined by the Function Point (FP) count (eqs. 2, 3). We calibrated and modified these general scalings to account for the added complexity and viscosity associated with developing scientific codes specifically for the nuclear weapons complex. We increased the schedule by 1.5 years to account for the additional time it takes to recruit, hire, train and get security clearances for code development staff. Using a methodology developed by the Lawrence Livermore National Laboratory Engineering Department (Remer, 2000), we calculated a contingency factor of 1.6 to account for the additional risks, uncertainties, complexities, etc. for the LANL and LLNL computing environments (eq. 4). We modified the standard FP scaling for the size of the code team (eq. 5) (Capers-Jones, 1998) to match the ASCI data. This included a correction for small code teams.
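As a rough illustration, equations 1 through 5 can be written out as a few Python functions. This is only a sketch: the function and argument names are invented here, and it assumes that the 1.5-year staffing allowance is the "Delays" term of equation 4 and that equation 2 yields months, as stated.

```python
# Sketch of equations 1-5. Coefficients are those quoted in the text;
# the function and argument names are illustrative assumptions.

def function_points(cpp_sloc=0, c_sloc=0, f77_sloc=0):
    """Eq. 1: convert lines of code per language to function points."""
    return cpp_sloc / 53 + c_sloc / 128 + f77_sloc / 107


def fp_schedule_years(fp, exponent=0.47):
    """Eq. 2: generic IT schedule, FP**x months, converted to years."""
    return fp ** exponent / 12.0


def adjusted_schedule_years(fp, contingency=1.6, delay_years=1.5):
    """Eq. 4: contingency-scaled schedule plus an assumed fixed delay."""
    return contingency * fp_schedule_years(fp) + delay_years


def generic_team_size(fp):
    """Eq. 3: generic IT team size."""
    return fp / 150


def adjusted_team_size(fp):
    """Eq. 5: team size recalibrated to the ASCI data."""
    return 3 + 0.6 * fp / 150


if __name__ == "__main__":
    fp = 4800
    print(f"schedule: {adjusted_schedule_years(fp):.1f} years")
    print(f"team size (Eq. 3): {generic_team_size(fp):.0f}")
    print(f"team size (Eq. 5): {adjusted_team_size(fp):.1f}")
```

Under these assumptions a 4800 function point code comes out at roughly 8.7 years, with a team of about 22 from equation 5 (32 from the uncalibrated equation 3).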

We analyzed seven code projects, three at LLNL and four at LANL (Table 2). For national security classification reasons, we have identified the LLNL codes with the letters A, B and C. Table 2 lists the size of the code in function points, the time estimated by equation 4 to develop the initial capability of the code project, the actual age of the code at the point it was expected to accomplish its first milestone, whether or not the project succeeded, the optimal code team size estimated from equation 3 and the actual size of the team. The sizes of the codes (e.g. lines of code, loc) were approximate estimates by the code teams. Establishing the size of the code teams was challenging. In general, good records were not available. Thus the code team sizes were generally estimated by the code team leaders. Because good records were not kept, it was also difficult to account for staff who worked on the code project but were part of other organizations. More than one-half of the Blanca code project team, for instance, was part of other organizations. Where this was an issue, we used conservative estimates. For example, the Blanca code project staff probably had a staffing level of about 50 people for the first 4 or 5 years of its life instead of the 35 we assumed. We used a smaller number based on the actual number of people we could definitely identify as having worked on the project.

The case histories and the estimation procedures indicate that it generally takes a minimum of 8 years for a code team to develop an initial capability for a weapons code project. The requirements for a weapons code are determined by the physics necessary to simulate a nuclear weapon. LANL and LLNL have over 50 years of experience in this area, and know these requirements in detail. Weapons code projects require between 3000 and 6000 function points (Table 2).

Some of the ASCI codes were started before ASCI began in 1996 (ASCI B, Legacy A for LLNL, and the LANL Crestone code project). ASCI B was started roughly in 1992 and had a working prototype in 1994. The Crestone code project was started before 1992. ASCI A and the Shavano and Antero code projects were started around early 1997. Legacy A was started over 30 years ago and was included for comparison and normalization. Since we are able to match the history of weapons codes with scalings derived from the experience of the commercial software industry, we conclude that the constraints, computer science practices and management issues that generally apply to the IT industry apply to the development of weapons codes as well (i.e. there is no "Silver Bullet" that can radically reduce the development time (Brooks, 1987)).

Table 2

Software Resource Estimates for the LLNL and LANL Code Projects

                                      ----------LLNL-----------  ---------------LANL---------------
                                       ASCI A   ASCI B Legacy A   Antero  Shavano   Blanca Crestone
Single lines of code (SLOC)            184000   490000   410550   300000   500000   200000   314000
Function points (Eq. 1)                  4800     4000     5400     2900     4800     3800     2900
Estimated schedule, Eq. 4 (years)         8.7      7.6      6.9      6.6      8.1      7.4      6.7
Project age at 1st milestone (years)        3        9      N/A        4      3.5        8        8
Met initial ASCI milestone                 No      Yes      N/A       No       No       No      Yes
Estimated staff required (Eq. 3)           22       27       24       14       22       18       14
Actual team size                           20       22        8       17        8       35       12

We found that the dominant factor for success is the age of the code project. The code projects that did not have sufficient time (8 years) to complete their projects failed to meet their milestones. All but one of the code projects that had 8 years succeeded in meeting their milestones. This is clear evidence that schedules and requirements must be consistent. The schedule cannot be fixed independently of the requirements, a fact long appreciated by the IT industry (DeMarco, 1997; Capers-Jones, 1998) but not adequately taken into account in the early planning for ASCI. The ASCI program set the milestone for demonstrating the capability of each code project to be three and a half years (December 1999) after the beginning of ASCI (~mid 1996) and three years after the date that many of the code projects were launched (~January 1997).

Adequate development time is necessary, but not sufficient, for success. Several code projects failed in spite of having adequate time. Poor practices and inadequate support, implicitly included in the contingency factor, hurt many of the projects as well. The Blanca code project failed to meet its milestones even with adequate time and ample resources.
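The pattern described above can be checked directly against the ages and outcomes listed in Table 2. The short Python sketch below simply groups the projects by whether they had 8 years before their first milestone; the variable and function names are illustrative, and Legacy A is left out because its entries are N/A.

```python
# Ages (years at first milestone) and outcomes transcribed from Table 2.
# Legacy A is omitted because its entries are N/A.
projects = {
    "ASCI A": (3.0, False),
    "ASCI B": (9.0, True),
    "Antero": (4.0, False),
    "Shavano": (3.5, False),
    "Blanca": (8.0, False),
    "Crestone": (8.0, True),
}

REQUIRED_YEARS = 8.0  # development time the text identifies as necessary


def split_by_age(data, threshold=REQUIRED_YEARS):
    """Group project outcomes by whether the project had `threshold` years."""
    enough = {n: ok for n, (age, ok) in data.items() if age >= threshold}
    short = {n: ok for n, (age, ok) in data.items() if age < threshold}
    return enough, short


enough, short = split_by_age(projects)
print("had 8+ years:", enough)
print("had less:    ", short)
```

None of the projects with less than 8 years met its milestone, and two of the three with 8 years or more did, with Blanca as the exception.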

Another point is that it is clear from the function point scaling relations (eqs. 1-5) that the code requirements determine both the schedule and resources needed for success. This estimating analysis indicates the importance of a realistic set of requirements, schedule and resources. Without them, projects will fail and the needed applications will not be developed.

These case studies helped persuade the ASCI senior management that the "younger" code teams (those started less than 8 years before the milestone) were not necessarily incompetent, but were just unable to do 8 years of work in less than 4 years. The management was then able to recognize that several (but not all) of these "younger" projects were actually making very good progress compared to "normal" code development rates and had very high potential for producing successful codes that would give the ASCI program substantially improved tools. Partly motivated by the case studies, the ASCI management then developed a more realistic schedule for code development, placed more emphasis on the needs of the users and provided better support for the code teams.

Three issues identified as "lessons learned" are expanded on in the following two sections: verification and validation, and software quality. These areas are crucial for the success of technical software projects, and have special, and often not well understood, requirements.

7. Verification and Validation

An application code typically solves a model problem that is only an abstraction of reality. Many things can limit the validity of a code calculation. The models and solution algorithms may be implemented incorrectly. The models may not accurately reflect the phenomena of interest (Roache, 1998; Oberkampf and Trucano, 2002). Verification is the determination that the code solves the model correctly. Validation is the determination that the models in the code capture, with adequate fidelity, the phenomena of interest. Both are essential elements of a program to develop and apply application codes to problems of interest (Roache, 1998). Without adequate verification and validation, there is no reason to believe any part of a code result. Unfortunately, for much of computational science, verification and validation efforts fall far short of what is needed.

Both verification and validation become more difficult as codes become more complicated and their applications more important. A typical application might have many different components. A sophisticated climate modeling code might include models for ocean evaporation, ocean currents, ocean salinity, atmospheric flow, clouds, precipitation, CO2 sequestration, radiation transport, atmospheric chemistry, ground water flow, vegetation growth, ice formation, etc. The code might predict many observables, such as average surface temperature, precipitation levels, etc. The accuracy of these observables depends on the accuracy of each component model, the completeness of the set of all the models (i.e. does the code treat all of the important phenomena), the accuracy of the solution method for the model including its interaction with the other models, the physical data used in the models, the adequacy of the problem generation and the ability of the user to correctly set up the problem, run it and interpret the results. Verifying and validating all of these is a major challenge.

The accuracy of the multi-model code depends first on the accuracy of each component, as well as the accuracy of their interactions. In practice, first one has to verify each component, then validate each component for the relevant regimes, then verify and validate progressively larger collections of interacting components, until the entire integrated code has been "verified" and "validated" for the problem regimes of interest.

There are at least four common verification techniques, all with serious shortcomings:

1. Comparison of the code results with the analytic results for a problem with an exact answer,

2. Establishing that the convergence rate of the truncation error is consistent with the expected convergence rate,

3. Comparison of the observed results with the expected results for a problem specially manufactured to test the model (or models) (Boehm, 2002), and

4. Computing and monitoring "conserved" quantities and parameters that should be constant or are predictable.

The first method is worthwhile, but extremely limited in practice. There are usually few (if any) relevant problems with exact answers, especially with realistic boundary conditions, realistic geometries, realistic data, non-linear conditions, or multiple-component systems.

The computational fluid dynamics community widely uses the convergence rate of the truncation error to verify programs (Roache, 1998; Hallquist, 2003). This technique, too, is limited in applicability. It works best when the expected convergence rate can be determined from the basic difference equations and boundary conditions. That is often not possible. Convergence rates often aren't useful to check two or more interacting modules.

The third technique, the Method of Manufactured Solutions, is, in principle, very powerful (Roache, 1998; Pautz, 2001; Roache, 2002). It works for almost arbitrarily complicated and strongly coupled models, and almost arbitrarily complicated boundary conditions. However, problems with real data, moving or adaptive meshes, non-analytic (and non-differential) terms and real physical data are difficult to treat. These challenges, as well as the complexity of implementing the manufactured solutions, seem to prevent its widespread use.

The fourth technique is monitoring behavior the developer knows has to be correct, such as "conserved" quantities (e.g. total energy, momentum, mass, etc.), quantities whose evolution can be estimated (e.g. entropy) to check the accuracy of individual components and of the whole code, or procedures that can be predicted (e.g. procedural behaviors designed into the code).

Yet, in spite of all these limitations, verification must be done as thoroughly as possible. If a code isn't solving the models correctly, then the answers are worthless. Any correspondence of the answers with reality is completely fortuitous. Verification needs to be performed every time the code or operating system (compilers, etc.) changes.

A code has to be verified before it can be validated. Validating an unverified code is generally a waste of time. Given the deficiencies of existing practices, better verification techniques are desperately needed. Comparing the results of a problem for two different codes (Benchmarking) can increase the likelihood of catching errors, but only to a limited degree. Both codes could be wrong. Two codes usually have different ways of solving a problem, and sorting out those effects can be time-consuming and potentially impossible. Benchmarking is worthwhile because it can catch errors, but it isn't a substitute for a mathematically rigorous verification procedure.
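To make the second and third techniques concrete, here is a deliberately small Python sketch that combines them on a toy problem. A solution u(x) = sin(pi x) is "manufactured", the source term that makes it satisfy -u'' = f is derived by hand, and the observed convergence order of a central-difference solver is compared with the expected value of 2. The problem, the solver and all names are illustrative assumptions, far simpler than a production code.

```python
import math


def solve_poisson(n, source):
    """Solve -u'' = source(x) on (0, 1) with u(0) = u(1) = 0 on n interior
    points, using second-order central differences and the Thomas algorithm."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # Tridiagonal system: -u[i-1] + 2 u[i] - u[i+1] = h^2 f[i]
    diag = [2.0] * n
    rhs = [h * h * source(x) for x in xs]
    # Forward elimination (sub- and super-diagonal entries are both -1).
    for i in range(1, n):
        m = -1.0 / diag[i - 1]
        diag[i] -= m * -1.0
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return xs, u


def manufactured_solution(x):
    """Chosen solution; it satisfies the boundary conditions by design."""
    return math.sin(math.pi * x)


def manufactured_source(x):
    """Source term derived analytically from -u'' for the chosen solution."""
    return math.pi ** 2 * math.sin(math.pi * x)


def max_error(n):
    xs, u = solve_poisson(n, manufactured_source)
    return max(abs(ui - manufactured_solution(x)) for x, ui in zip(xs, u))


def observed_order(n_coarse):
    """Observed convergence order when the mesh spacing is halved."""
    e_coarse = max_error(n_coarse)
    e_fine = max_error(2 * n_coarse + 1)  # h = 1/(n+1) is halved
    return math.log(e_coarse / e_fine, 2)


if __name__ == "__main__":
    print(f"observed order: {observed_order(15):.2f} (expected 2)")
```

An observed order well below 2 would point to an implementation error somewhere in the discretization or the solver. As the text notes, this check becomes much harder for coupled, non-linear models on adaptive meshes.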

As a practical matter, diligent code developers do as much verification as they judge feasible, and then keep their eyes open for suspicious behavior by the code. However, this is far from a guarantee that the code is free of errors. Also, not all code developers (and users) are sufficiently diligent or knowledgeable.
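One inexpensive way to "keep eyes open" automatically is the fourth verification technique listed above: track a quantity that should be conserved and flag the run when it drifts. The Python sketch below does this for a frictionless harmonic oscillator integrated with the velocity-Verlet scheme. The test problem, the tolerance and the function names are assumptions chosen for illustration only.

```python
def total_energy(x, v, k=1.0, m=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def run_oscillator(steps, dt, k=1.0, m=1.0, tol=1e-3):
    """Integrate with velocity Verlet and monitor total energy.

    Returns the largest relative energy drift seen, and whether it stayed
    within `tol` (a hypothetical acceptance threshold)."""
    x, v = 1.0, 0.0
    e0 = total_energy(x, v, k, m)
    worst = 0.0
    for _ in range(steps):
        a = -k * x / m
        x += v * dt + 0.5 * a * dt * dt
        a_new = -k * x / m
        v += 0.5 * (a + a_new) * dt
        drift = abs(total_energy(x, v, k, m) - e0) / e0
        worst = max(worst, drift)
    return worst, worst <= tol


if __name__ == "__main__":
    for dt in (0.01, 0.5):
        worst, ok = run_oscillator(steps=1000, dt=dt)
        print(f"dt={dt}: worst relative drift {worst:.2e}, within tolerance: {ok}")
```

With the small time step the energy drift stays far below the tolerance, while the large time step is flagged. A passing check does not prove the code correct; it only removes one class of errors.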

Once a code has been verified as much as possible, the code must be validated for the problem regimes of interest. A code is never a valid tool for all conceivable problems. It can only be validated for specific regimes, and the validity in adjacent regimes estimated. The entire calculational system including the user, computer system, problem set-up, problem running and results analysis for each user and computer system must be validated because all elements are important. An inexperienced or non-expert user can easily get wrong answers using a good code in a validated regime.
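One modest way to act on this is for a code to record the parameter ranges for which validation evidence exists and to warn the user when a problem set-up falls outside them. The Python sketch below shows the idea; the class, the parameter names and the numerical ranges are all hypothetical and are not taken from any particular code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedRegime:
    """One parameter range for which validation evidence exists.
    Parameter names and ranges used below are hypothetical."""
    parameter: str
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def outside_validated_regimes(inputs, regimes):
    """Return names of inputs that lie outside, or lack, a validated range."""
    by_name = {r.parameter: r for r in regimes}
    flagged = []
    for name, value in inputs.items():
        regime = by_name.get(name)
        if regime is None or not regime.contains(value):
            flagged.append(name)
    return flagged


# Hypothetical validated ranges for an aerodynamics code.
regimes = [
    ValidatedRegime("mach_number", 0.1, 0.8),
    ValidatedRegime("reynolds_number", 1e5, 5e6),
]

problem = {"mach_number": 0.5, "reynolds_number": 2e7}
print(outside_validated_regimes(problem, regimes))
```

Such a guard covers only the code's inputs. It says nothing about the user, the problem set-up or the analysis of results, which the text identifies as parts of the system that also need validation.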

Validation has a number of challenges. Each individual component and all important combinations of the components must be validated. Validation data and experiments have a variety of forms (Table 3).

Table 3

Four types of experiments used to validate codes:

1. Passive observations of physical events (e.g. supernovae explosions or the weather),

2. Experiments designed to certify a physical component or physical system (tests of an engineering component such as a scaled airplane wing, car crash, etc.),

3. Experiments designed to elucidate a general physics or engineering principle or law (e.g. wind tunnel studies of turbulent eddies around airfoils), and

4. Experiments specifically designed to validate a code application (e.g. wind tunnel tests designed to provide data to validate a code calculation).

Each type of experiment can be done before or after the code prediction has been completed and can address single-effect issues or integrated phenomena.

The best validation consists of the comparison of predictions made before an experiment with data from experiments designed specifically for validation. Successful prediction of experimental results is a better test than successful reproduction of existing experiments. Since few codes have no uncertainties, "tuning" a code for an application is usually necessary to get reasonable answers. The experienced user has learned how to set up an appropriately zoned mesh, how to vary the physical data within the known uncertainties to get reasonable answers, which effects are essential for the application and which are inappropriate, how to interpret the results, when the code is outside the region of validity, etc. With this freedom, it is often feasible to tune a code to match many of the salient points of an existing experiment. It is a much more rigorous test of the code application to predict experimental results before the experiment has been conducted. This is also prototypical of the purpose of the code and computer system, i.e. to make accurate predictions of unknown events using known data before the events occur. An additional benefit of the validation process is that it trains the users how to use the code to get reasonable results. The entire calculational system needs to be validated (code, user, computer system). As we have seen, an inexperienced user can get the wrong result.

For many applications, controlled experiments are not feasible or are impractical. For them validation is especially challenging. Models of astrophysical and large-scale geophysical phenomena (weather, climate, volcanoes, asteroid impact, watersheds, etc.) and large scale economic and political systems, must rely on historical data and current observations. We will not be able to conduct controlled supernovae explosions in the near future or schedule earthquakes, volcanoes or asteroid impacts. For these phenomena, the best that can be done is to collect as extensive sets of data as possible, especially data that is fundamental to the correctness of the code. For these systems it is often not possible to get data for all conditions, a complete time history, adequately resolved data, and data for many of the quantities of interest.

However, many, if not most, applications can be validated with data from controlled experiments. Key issues include adequate coverage in space and time of the appropriate experimental initial conditions and the behavior of the important variables. An accurate description of the initial and boundary conditions is essential.

The types of experiments used for validation listed in Table 3 are also listed in order of their utility for validation. Aeronautical Computational Fluid Dynamics (CFD) codes were first validated using wind tunnel tests of scaled aircraft parts (Experiment type 2, Table 3). The object of the experiment was to test the aircraft part. The use of the data for validation was largely incidental and occurred after the experiment. Most experiments of type 2 were integral, in that they gave data that reflected the behavior of the trade-offs of a number of competing effects. Code developers recognized that data for specific effects was needed to validate each component in their codes. They therefore used data from single effect experiments designed to study a single, isolated phenomenon. Such data might be yield strength data for metal components, thermal conductivity measurements, etc. Again, validation of a code was usually not the primary purpose of the experiment, although such experiments were often cheap enough that they could have been used for explicit validation experiments. The fourth type of experiments are those designed explicitly for code validation. The purpose of those experiments is to test the models in the code. The code is often used to design the experiment.

Some of these points are illustrated in Figure 4. An airfoil moving through the air sees a plane front of air rushing toward it. Fifty years ago, wind tunnels were used to faithfully reproduce the plane air front conditions to test aircraft components. Achievement of a plane front required a large wind tunnel to minimize the effects of drag by the wall. Now, much smaller wind tunnels are used to validate the codes that are used to design airfoils. Once the requirement for a planar air front was removed, a much smaller and cheaper wind tunnel could be used. The validation wind tunnel facility can also have shorter set-up and experimental turnaround times and be more easily and thoroughly diagnosed. The idea is to test the code, not the component. A final test of the component may be advisable, but a CFD code validated for the appropriate conditions can be used for many of the design studies, especially if the final results of a computational optimization study are checked experimentally.

In fact, data from experiments not designed for validation can sometimes be misleading or inaccurate for validation. The experiment may have been designed to measure a particular effect. The data for other effects may not have been checked sufficiently and may be inaccurate, misleading or wrong. As noted in the sonoluminescence example earlier, codes can be, and have been, forced to match incorrect experimental data.

A paradigm shift with regard to the value and importance of validation experiments is needed in the experimental community. Experimentalists and funding agencies understand the value of experiments designed to explore new scientific phenomena, test theories or certify and test the performance of a design component. Few appreciate the value of experiments explicitly conducted solely for the purpose of code validation. There generally exist no mechanisms to get validation experiments funded even if experimentalists are interested.

Finally, since the value of verification and validation is to ensure that the code can give accurate predictions for the phenomena of interest, a written record of the verification and validation of the code is extremely important. That record is necessary to establish the credibility of the code predictions with the code project sponsors and customers. In fact, validation needs to be organized like a project, with goals and requirements, a plan, resources, a schedule, and deliverables including a documented record of the validation project.

Few existing computational science projects practice systematic verification or validation. Almost none have dedicated experimental validation programs with dedicated validation experiments. Yet, without such programs, computational science will never achieve credibility.

8. Software Quality and Software Project Management

Software quality and software project management are very important issues. Improvements in quality offer the promise of greater longevity and easier maintenance. Attention to quality will likely improve the code. Inattention to quality will almost certainly contribute to poor quality (high defect rates, and code that is hard to maintain and upgrade). It can also leave the code project vulnerable to the Software Quality Assurance (SQA) mafia. If poor quality becomes an issue, the sponsors and customers will take action. The DoD and other sponsors have developed fairly rigid processes for code development and software quality assurance in response to disasters caused by buggy aircraft and satellite control software. Bugs in aircraft control software can cause airplane crashes. To reduce the defect rate, the Air Force established a very rigorous procedure for vendors to follow to develop such software (Paulk, 1994).

Similarly, sound software project management can do a lot to speed code development, increase the likelihood of a successful product and minimize the defect rate.

Quality was an issue for the US automobile industry in the 1970s and 1980s (Halberstam, 1986). The American automobile industry produced poor quality cars that people didn't buy. The Japanese built high quality cars that people did buy. A basic difference was that the US automobile industry did not emphasize quality on the assembly line and in the externally supplied components. They mostly tested the cars after they came off the assembly line and tried to fix the worst ones. The Japanese, on the other hand, emphasized quality at every step of the assembly process and for components. They tested the cars at many points along the assembly line and tested components before installation. The result was that Japanese cars had much higher quality, and the American automobile industry lost many customers.

Similarly, software quality engineering is most effective when it is applied at each step of the software development process. This is much better than the all too common practice of waiting until the code is nearly complete to begin testing the code. However, just as on the assembly line, different development processes require different methods. No one size fits all. Also, just as the Japanese auto makers emphasized input from the assembly line workers, the code developers themselves are often the best judges of how to implement quality. A process rigidly imposed by senior management will likely get the same type of token compliance observed in the US auto industry.

Quality assurance for technical software has an important sociological dimension. Technical software is developed by teams of scientists and engineers. Scientists and engineers are trained to question everything, and accept nothing purely on the basis of authority. After all, even though he might want to, your boss can't change the laws of nature, and that's what you are trying to model. In fact, that's why we hire scientists to develop scientific software. The models in the codes have to be right. If the models don't reflect reality, the code results are worthless. We will then make decisions that will be wrong, often with tragic consequences. Giving scientists a "bible" that describes an elaborate, rigid process for developing software, but which provides little in the way of justification, is counterproductive. It seems to be more successful to work with each team to identify the "practices" that add value to the scientific code development process, and encourage the teams to implement the practices they helped to identify (Phillips, 1997). It's also necessary to provide support, especially to carry out some of the more routine practices. For large projects, it's better to hire a "code librarian" to implement and maintain the configuration management system and a dedicated "tester" to design, implement and run regression test suites than just telling the team to do it. Without additional resources, the team will have to drop other tasks to complete newly assigned software quality jobs. The practices that technical software development groups have found useful include configuration management, requirements definition, sound software project management, regression testing, adequate documentation, design and code reviews, etc.
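Of the practices listed, regression testing is perhaps the easiest to illustrate. In its simplest form, key results from the current version of a code are compared with a stored baseline from an earlier, accepted version. The Python sketch below assumes results are available as named scalar quantities; the quantity names, values and tolerance are hypothetical.

```python
import math


def compare_to_baseline(results, baseline, rel_tol=1e-8):
    """Return the names of quantities that are missing, or that differ from
    the stored baseline by more than `rel_tol` (relative)."""
    failures = []
    for name, expected in baseline.items():
        actual = results.get(name)
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            failures.append(name)
    return failures


# Hypothetical baseline recorded from an earlier, accepted version of a code.
baseline = {"peak_temperature": 1523.4, "total_mass": 2.5}

# Hypothetical results from the current version of the same code.
results = {"peak_temperature": 1523.4, "total_mass": 2.5001}

print(compare_to_baseline(results, baseline))
```

A regression test of this kind only detects that an answer has changed. Whether the old or the new answer is correct remains a verification question.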

A lot of technical software is developed for various government agencies. The contracting officers for these agencies often aren't very knowledgeable about the challenges of developing large, technical software projects. They are, however, accountable for delivery of the programs and projects they sponsor. Large technical software projects have substantial risks. The record indicates that they are often behind schedule, over-budget, don't deliver exactly what was promised, and even fail entirely. To succeed, sponsors have to hold the code development organizations accountable. It is therefore very tempting for the government agencies to require that the organizations they sponsor follow a "process" model like the Capability Maturity Model (CMM) developed by the Software Engineering Institute at Carnegie Mellon University for the Air Force (Paulk, 1994). After all, there is a lot of data that indicates that code development organizations that follow the CMM processes produce "better" code, meet milestones, etc., and, in the end, who can be against quality? This kind of quality for scientific software, however, comes with a severe price. A detailed analysis of the CMM processes indicates that they work well for software that must have no bugs (e.g. the airplane control software mentioned above). Implementing the CMM process, however, takes a lot of time. History shows that several years are required for each step to move from one CMM level to the next. There are five distinct CMM levels. In addition, it requires a lot of additional resources. The major problem with applying the full CMM to the development of scientific software is that the strong emphasis on avoiding and reducing bugs and defects adds a lot of viscosity to the development process.

Computational science has different goals and requirements from aircraft control. It is much more important that the physics or chemistry be right and that the solution algorithms be right than that every last bug be eliminated. Developing the right physics or chemistry package usually takes a lot of experimentation and creativity. It is impossible to plan every detailed facet of a large complex code with scientific and mathematical challenges. The code development team must be very creative. It must develop and test many new algorithms and models to find ones that work. A rigid code development process impedes the flexibility and creativity needed to develop new codes. This is not only the case for scientific codes, but also for most really innovative software development.

There is a running debate on this issue in the software literature between the "rigid process" community and the "agile software" community. The "agile software" community stresses the importance of innovation and the difficulty of being innovative if one is constrained by rigid processes (Highsmith and Cockburn, 2001; Boehm, 2002; DeMarco and Boehm, 2002). The "rigid process" community stresses the importance of reduced defects and efficient code development (Herbsleb, Zubrow et al., 1997). Both positions have valid points, but the reality is that there is no "one size fits all" answer. Just as there is no "one way" to do laboratory experiments in physics, chemistry or biology, theoretical work in chemistry, physics or biology, or engineering design and analysis, there is no "one way" to develop technical software. There is no "foolproof" way to develop codes, or as Frederick Brooks states: "There is no silver bullet for software development" (Brooks, 1995). Just as in other scientific methodologies, one has to do the intellectually hard work of examining and testing candidate practices and then use the ones that work for the problem at hand.

But this does not mean that "any old method" is acceptable and will work. It only means that not every development problem has the same answer. We can't be lazy. While we can't blindly accept what people hand us, we do have to find something that works well.

A constant theme that seems to always emerge from case studies is that good software project management is essential. It is usually more important than any set of externally imposed processes. It is noteworthy that the Software Engineering Institute has recently recognized the importance of software project management. It has developed the "Team Software Process" (Humphrey, 2001), which appears to be very similar to the software project management methods long advocated in the general IT industry, especially in the non-government IT industry (e.g. Brooks, 1995; DeMarco, 1997; Remer, 2000; Thomsett, 2002). The SEI data shows that introducing sound software project management achieves a greater proportional reduction in the defect rate than moving many levels up in CMM process level.

The burden of identifying code development methods that work well falls on every code team. As noted before, if the team doesn't find methods that work, the sponsor will attempt to force the team to use processes and methods that he selects on the basis of what he has been told by others. The processes he picks likely won't be the ones that the team would pick. Developing a good set of practices and implementing them is the beginning of a good defense against being forced to follow externally imposed practices. The team also has to be able to articulate their practices and be able to demonstrate to management and, in some cases, to auditors from DoD, DOE, NASA, etc. that the team's practices work. There is no one solution to this problem either. The team has to work to establish credibility with its management so management will trust the team to do things right.

While the technical software community has many unique issues, it nonetheless can learn much from the general IT industry. The IT community has had to address the problem of how to plan and coordinate the activities of large numbers of programmers writing fairly complex software. I have found that few of even the simplest well-known and proven methods for organizing and managing code development teams and projects are being employed by the technical software community. The most common approach seems to be to independently rediscover the IT industry "lessons". This unfortunately leads to wasted effort and all too often results in failure.

Several of these "lessons" are worth highlighting. We need to learn how to develop specifications and requirements for technical projects. Most technical projects start out with a vision of what the code team leaders want to accomplish. Unfortunately, the leaders often don't develop requirements and specifications at the level of detail that the other members of a large team can follow to produce an integrated code at the end. There is relatively little planning and almost no estimation of resources and schedule. This often leads to overly ambitious goals and unrealistic schedules, missed milestones, and sometimes to project failure. While good estimation is hard, one commonly recommended technique is to develop a prototype that requires 5 to 10% of the full project resources, and use it for estimation (McConnell, 1997). Another technique is to look at similar projects and scale from them. In fact, most technical code projects don't appear to follow very many of the "lessons learned" from the ASCI code projects listed in Table 1.
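The two estimation techniques mentioned above can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names, the staff-month unit, and the linear scaling rule are assumptions made for the example, not a method taken from McConnell or from the ASCI analysis. Only the 5 to 10% prototype fraction comes from the text.

```python
def estimate_from_prototype(prototype_effort, low_fraction=0.05, high_fraction=0.10):
    """Bracket the full-project effort implied by a prototype's measured effort.

    Assumes the prototype consumed between low_fraction and high_fraction
    of the full project's resources (5 to 10% in the text above).
    """
    if prototype_effort <= 0:
        raise ValueError("prototype_effort must be positive")
    if not 0 < low_fraction <= high_fraction <= 1:
        raise ValueError("need 0 < low_fraction <= high_fraction <= 1")
    # A prototype that was a larger share of the project implies a smaller total.
    optimistic = prototype_effort / high_fraction
    pessimistic = prototype_effort / low_fraction
    return optimistic, pessimistic


def scale_from_similar(reference_effort, reference_size, new_size):
    """Scale effort from a comparable completed project.

    Linear scaling in "size" is a hypothetical simplification; real projects
    rarely scale linearly, and the size measure itself is an assumption.
    """
    if reference_size <= 0 or new_size <= 0:
        raise ValueError("sizes must be positive")
    return reference_effort * new_size / reference_size


if __name__ == "__main__":
    low, high = estimate_from_prototype(6.0)  # prototype took 6 staff-months
    print(f"full project: roughly {low:.0f} to {high:.0f} staff-months")
```

A prototype that took 6 staff-months would, under these assumptions, bracket the full project at roughly 60 to 120 staff-months. The width of that range is itself useful information when negotiating schedule and resources with a sponsor.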

A final issue is the need to balance improving the computer science techniques and methodologies used for code development against using conservative and reliable practices for the development of essential applications. A good example is the effort to develop the Common Component Architecture as a way to standardize component development. In principle, this is a great idea. If one develops a module, it would be wonderful if it could be used in many applications. The core of the idea is to develop interface specifications for modules. Where this is possible, it should greatly help code development. Unfortunately, every module has a different purpose, and usually requires different interfaces for each technical problem. The hard part is to define the specific interfaces, and it's not clear that this can be done in a general way. It's difficult to see how a computer scientist will be able to anticipate what interfaces are needed so that a module that calculates the thrust from the rotor on a bacterium can be successfully integrated into a unified calculation of a swimming bacterium. Clearly, new code development methodologies must be developed and tested on real problems. Identifying ways to develop these new methodologies and test them without unduly impeding application code development and greatly increasing application code development risk will continue to be a major challenge for the computational science community. Another challenge will be to develop appropriate metrics for the development of technical software. Clearly, conventional function points are inadequate. Technical software has additional complexity and challenges beyond those faced by the IT industry. Developing those metrics should be a key goal of any "lessons learned" activity. Gathering data on code projects will be essential. Without real data on the code development process and the codes themselves, it will be difficult to identify what works and what lessons can be learned and applied to other projects.
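To make the notion of a module "interface specification" from the component discussion above concrete, here is a minimal sketch. It is not the Common Component Architecture's actual API; the names ThrustModule, LinearRotor and swim_speed, and the toy force balance (thrust equals a drag coefficient times speed), are all hypothetical and chosen only to echo the swimming-bacterium example.

```python
from typing import Protocol


class ThrustModule(Protocol):
    """Hypothetical interface: any module that can report thrust for a rotation rate."""

    def thrust(self, rotation_rate: float) -> float:
        ...


class LinearRotor:
    """Toy module satisfying the interface: thrust proportional to rotation rate."""

    def __init__(self, coefficient: float) -> None:
        self.coefficient = coefficient

    def thrust(self, rotation_rate: float) -> float:
        return self.coefficient * rotation_rate


def swim_speed(module: ThrustModule, rotation_rate: float, drag_coefficient: float) -> float:
    """Toy integration step: steady speed at which drag (drag_coefficient * speed) balances thrust."""
    if drag_coefficient <= 0:
        raise ValueError("drag_coefficient must be positive")
    return module.thrust(rotation_rate) / drag_coefficient
```

The integrating code depends only on the single thrust method, so any module providing it can be swapped in. The difficulty raised in the text shows up as soon as the unified calculation needs something this interface does not expose, such as torque or time dependence: the interface then has to change, and every module written against it changes with it.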

There is also a tendency toward the formation of "virtual teams", teams of non-collocated software developers at geographically separated sites. Such teams have the advantage of bringing varied skill sets to the project without the need to relocate, and the potential for tapping the expertise of a number of institutions and generating the political as well as technical support of many institutions. Even collocated development teams face communication and coordination challenges. Those problems are more severe for virtual teams, and success will require addressing these challenges.

9. Conclusions and Path Forward

Computational science has an important role to play in society. The High Performance Computing community is meeting "The Performance Challenge" to provide us unprecedented power to tackle important problems. However, two additional challenges must be met before that potential can be realized. First, the community must be able to efficiently develop programs for the ever more powerful and ever more complicated computer platforms ("The Programming Challenge"). Second, the application models must become sufficiently accurate that they can be used for prediction with confidence ("The Prediction Challenge"). To meet "The Programming Challenge," the High Performance Computer and operations and development software community (industry, government and academia) must develop the tools and methods to facilitate the development and running of codes so that application codes can be developed quickly and reliably and can be run efficiently on the High Performance Platforms. To meet "The Prediction Challenge," the computational science community (industry, government and academia) will need to become as mature as the theoretical and experimental scientific and engineering design communities. The computational science community must develop methods to ensure that the equations and models in the codes accurately reflect the real world, that the equations and models are solved correctly, that the applications are set up and run correctly by knowledgeable and careful people, and that the results are interpreted correctly. Accurate equations and correctly implemented models, as well as efficient and economic development, require attention to the code development process. The process must be consistent with the general "lessons learned" discussed in the paper. One of the most important "lessons learned" is that an intensive verification and validation program is an essential element of ensuring that computational results are accurate. Unfortunately, not only is the level of verification and validation usually insufficient, but there is also inadequate effort devoted to developing new methodologies and concepts for verification and validation. Much is needed and little is being done. Finally, those developing the code and those using the code must have a deep appreciation of the limits of the code and a deep-rooted appreciation that the results may not be correct.

As in other methodologies, retrospective case studies of past practices are an essential part of the path toward maturity. It is imperative that we as a discipline continuously examine and assess our mistakes and our successes. Without such a continuous re-assessment, we will continue to make the same mistakes. Our field will never be able to fulfill the tremendous promise that powerful computers give us.

Another way to look at it is as an issue of professional integrity. Unless our field has the same level of professional integrity as other methodologies (experiment, theory and engineering design), we will never be as credible as the other methodologies. We will continue to hear the refrain: "Who can believe that? It's just a code result and we know they can get anything they want if they play with the code enough." Scientists who conduct experiments irresponsibly find that their professional reputations are discredited quickly and thoroughly. Who knows where Pons or Fleischmann (the "discoverers" of cold fusion) are today (Huizenga, Harris et al., 1989)? It is rare that anyone in computational science gets even the slightest rebuke for a misleading or incorrect result.

It is not enough for 95% of the work in computational science to be reliable, and 5% to be wrong. Unless the outside world can tell which 5% is bogus, none of our work will have the impact it deserves.

The DARPA High Productivity Computing Systems (HPCS) project is focusing on reducing the time to solution for important problems by meeting the Performance, the Programming and the Prediction challenges. It is working with three vendors, IBM, Cray and Sun, to design and build petaflop platforms. Part of the HPCS project is the development of benchmarks for the platforms that are prototypical of real applications. Attention is being paid to the development of programming models and development tools for optimizing parallel performance. The HPCS project is sponsoring case studies of representative computational science projects in the DoD, DOE, NASA, NOAA, industry and academia to identify the lessons learned and to document and publish them for the benefit of the computational science community. I have outlined a number of "lessons learned" that have already been developed from the ASCI code projects. As we assess a wider range of projects, we will refine those lessons and identify new ones. Adoption of these "lessons learned" by the computational science community will help the field to mature just as the development of "lessons learned" and their adoption has helped other fields to mature.

10. Acknowledgements

The author is grateful for discussions with Marv Alme, Don Burton, Bill Carlson, Gary Carlson, John Cerutti, Linnea Cook, Larry Cox, Tom DeMarco, Tom Gorman, Dale Henderson, Leo Kadanoff, Richard Kendall, Jeremy Kepner, Joseph Kindel, William Krauser, Ken Koch, Steve Libby, Bob Lucas, Tom McAbee, Doug Miller, Pat Miller, Jim Rathkopf, Don Remer, Rob Thomsett, Tim Trucano, David Tubbs, Larry Votta, Robert Webster and Mike Zika and to the Los Alamos National Laboratory and Department of Energy for support. The author is especially grateful to Tim Trucano for many discussions and careful proofreading of the paper.

Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the University of California for the U.S. Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher's right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness.

References

1. Frank, M.P., The Physical Limits of Computing. Computing in Science and Engineering, 2002. 4(3): p. 16-26.

2. Laughlin, R., The Physical Basis of Computability. Computing in Science and Engineering, 2002. 4(3): p. 27-30.

3. Petroski, H., Design Paradigms: Case Histories of Error and Judgement in Engineering. 1994, New York: Cambridge University Press. 221.

4. Gehman, H.W., et al., Report of the Columbia Accident Investigation Board. 2003, National Aeronautics and Space Administration: Washington, DC. p. 248.

5. Hallquist, J.O. Current and Future Developments of LS-DYNA-1. in 4th European LS-DYNA Conference. 2003. Ulm, Germany: Livermore Software Technology Corporation.

6. Taleyarkhan, R.P., et al., Evidence for Nuclear Emissions During Acoustic Cavitation. Science, 2002. 295(1): p. 1868-1873.

7. Shapira, D. and M. Saltmarsh, Nuclear Fusion in Collapsing Bubbles-Is It There? An Attempt to Repeat the Observation of Nuclear Emissions from Sonoluminescence. Physical Review Letters, 2002. 89(10): p. 104302-104305.

8. Post, D. and R. Kendall. Lessons Learned From ASCI. in DOE Software Quality Forum 2003. 2003. Washington, DC: Los Alamos National Laboratory.

9. Thomsett, R., Radical Project Management. 2002, Upper Saddle River, NJ: Prentice Hall.

10. DeMarco, T., The Deadline. 1997, New York, New York: Dorset House Publishing. 310.

11. Beck, K., Extreme Programming Explained. 2000, Boston: Addison Wesley.

12. Remer, D. Managing Software Projects. in UCLA Technical Management Institute. 2000. Los Angeles, CA: UCLA Extension Courses.

13. Vliet, H.v., Software Engineering, Principles and Practice. 2000, Chichester: John Wiley and Sons, Ltd. 726.

14. Brooks, F., The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition. 1995, Menlo Park: Addison-Wesley Publishing Co. 322.

15. Verzuh, E., The Fast Forward MBA in Project Management. 1999: John Wiley.

16. Ruskin, A.M. and W.E. Estes, What Every Engineer Should Know About Project Management. 2 ed. What Every Engineer Should Know, ed. W.H. Middendorf. Vol. 33. 1995, New York: Marcel Dekker, Inc. 274.

17. DeMarco, T. and T. Lister, Waltzing with Bears, Managing Risk on Software Projects. 2003, New York, New York: Dorset House Publishing. 196.

18. Glass, R.L., Software Runaways: Monumental Software Disasters. 1998, New York: Prentice Hall PTR. 288.

19. DeMarco, T. and T. Lister, Risk Management for Software. 2002, The Cutter

20. Capers-Jones, T., Estimating Software Costs. 1998, New York: McGraw-Hill.

21. Yourdon, E., Death March. 1997, Upper Saddle River, NJ: Prentice Hall PTR.

