HPCwire
 The global publication of record for High Performance Computing - LIVEwire Edition / November 19, 2003: Vol. 10, No. 2


Features:

THE TRUTH ABOUT BENCHMARKS: AN INTERVIEW WITH LARRY DAVIS
By Alan Beck, Editor-in-Chief, HPCwire

HPCwire: Because HPC applications differ so widely, benchmarks have always been a controversial topic. How useful really are the two most generally utilized benchmarks, i.e., the Linpack and the IDC Balanced Rating?

LARRY DAVIS: Let me preface my answer by saying that I have a strong personal aversion to ranking efforts (except, perhaps, in college football); I think our time is better spent developing a deeper understanding of how and why important applications perform as they do on different HPC architectures. Thus, my evaluation of the usefulness of particular benchmarks is heavily based on their contribution to that goal. The program for which I work, the DoD High Performance Computing Modernization Program, has recently established an activity to consider benchmarks from two perspectives. First, we want to measure the HPC system performance against our specific HPC workload and, second, we'd like to better understand the performance characteristics of that workload. Thus, we use application and synthetic benchmarks to help us make acquisition decisions, and we also are beginning to collect application code profiling information that will allow application code designers and users to write and use application codes more efficiently on our HPC systems.

Linpack, as used to populate the Top500, has been with us for a while, and it has served to form the basis for a useful historical trend of HPC performance. I think that at this point most HPC experts agree that a one-dimensional measure such as Linpack is not a very good predictor of performance for most current HPC applications, and thus is no longer terribly useful for decisions such as what systems to buy and how to partition computational workload among disparate HPC resources. Linpack basically measures maximum CPU performance, and unfortunately many people have used it in a search for a single metric to compare HPC system performance. The IDC Balanced Rating is an attempt to add additional important dimensions, such as memory bandwidth, to categorize a system's performance. This is a good idea, but, again, reducing HPC system performance to a single metric, even one that attempts to incorporate additional important dimensions, may be misleading. After an initial look, I have not attempted to assess its technical merits since our program has opted for a different approach to the use of benchmarking.

HPC: Do we need more HPC benchmarks or better ones -- or neither? Please elaborate on your rationale.

LD: I think we basically need more careful studies to determine a minimal spanning set of low-level HPC benchmarks that can accurately predict performance of key applications on existing and proposed HPC systems. This is exactly what we're exploring in our HPC System Performance Modeling Panel on Nov. 20 at SC2003. We proposed this panel because it was becoming obvious that several federal agencies, including DoD, NASA and DOE, had begun substantial programs to investigate this question, and key players from each of those agencies are represented on the panel. We hope to use the panel as a springboard to tighten coordination among these federal agencies and other interested organizations in the benchmarking and performance modeling area. I invite any additional organizations with an interest in benchmarking and performance modeling of HPC systems to join us in these discussions.

We all share a common goal -- the ability to predict specific application performance on current and future HPC systems. We see better benchmarks, in some form, as essential to achieving that goal. If we reach the point where we have confidence that a set of low-level benchmarks can be used to accurately predict the performance of our key applications, we will be able to greatly simplify the set of benchmarks we use for our HPC acquisitions. In addition, the application code profiling necessary to use low-level benchmarks to make accurate performance predictions also can be used to provide guidance to developers and users of that code to improve the code's performance and target the use of that code to the most productive architectures for that code.
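Editor's illustration: the idea of combining low-level benchmark results with an application code profile can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the DoD program's method. It assumes a simple additive model in which the time spent on each resource is the application's demand divided by the rate the system sustains on the matching low-level benchmark, with no overlap between resources. The names `AppProfile`, `SystemRates`, `predict_breakdown`, and `predict_runtime`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    """Work done by one application run, as code profiling might report it (hypothetical)."""
    flops: float           # floating-point operations executed
    memory_bytes: float    # bytes moved to and from main memory
    messages: float        # number of inter-processor messages
    network_bytes: float   # bytes sent over the interconnect


@dataclass(frozen=True)
class SystemRates:
    """Rates a system sustains on low-level benchmarks (hypothetical values)."""
    flops_per_s: float
    memory_bytes_per_s: float
    message_latency_s: float
    network_bytes_per_s: float


def predict_breakdown(app: AppProfile, system: SystemRates) -> dict:
    """Estimate seconds spent per resource: demand divided by sustained rate."""
    return {
        "compute": app.flops / system.flops_per_s,
        "memory": app.memory_bytes / system.memory_bytes_per_s,
        "latency": app.messages * system.message_latency_s,
        "network": app.network_bytes / system.network_bytes_per_s,
    }


def predict_runtime(app: AppProfile, system: SystemRates) -> float:
    """Total predicted runtime, assuming the per-resource times do not overlap."""
    return sum(predict_breakdown(app, system).values())


# Hypothetical application profile and two hypothetical systems.
app = AppProfile(flops=2e12, memory_bytes=4e12, messages=1e6, network_bytes=1e11)

# System B has twice the floating-point rate of system A but half the memory bandwidth.
system_a = SystemRates(flops_per_s=1e10, memory_bytes_per_s=2e10,
                       message_latency_s=1e-5, network_bytes_per_s=1e9)
system_b = SystemRates(flops_per_s=2e10, memory_bytes_per_s=1e10,
                       message_latency_s=2e-5, network_bytes_per_s=1e9)

if __name__ == "__main__":
    for name, system in (("A", system_a), ("B", system_b)):
        breakdown = predict_breakdown(app, system)
        bottleneck = max(breakdown, key=breakdown.get)
        total = predict_runtime(app, system)
        print(f"system {name}: about {total:.0f} s predicted, largest term: {bottleneck}")
```

With these made-up numbers, the system with the higher floating-point rate is predicted to run this memory-bound application more slowly, which is the kind of result a single CPU-centric metric would hide. The per-resource breakdown also hints at where a developer might look first when tuning the code for a given architecture. A real model would need measured profiles and would have to account for overlap between computation, memory traffic, and communication.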

HPC: How has the increasing popularity of Grid-based HPC affected the benchmarking picture?

LD: At this point, I don't see that benchmarking Grid-based applications has a lot of importance to the traditional HPC community. We operate a relatively small number of consolidated HPC centers, so there is little performance incentive to run applications across centers. On the other hand, we are very interested in the possibility of scheduling and queuing jobs across centers as a method to better utilize our total pool of program resources. Thus, benchmarking of cross-center schedulers and related system software (e.g. file systems) is clearly of interest. If, in the future, more applications become efficiently scalable to thousands of processors and they are latency tolerant, Grid-based performance measurements will become more important.

HPC: How should vendors tackle the issue of benchmarking?

LD: We'd like for the vendors, of course, to run anything and everything that we throw at them accurately and precisely on exactly the system configuration that they are offering us as buyers of their systems. This is really tough, however, for most vendors to accomplish, primarily because most don't have large systems to use for benchmarking, particularly of newer architectures that may not be assembled until just before delivery. I do believe that it might be worth the vendors' time to establish fairly robust benchmarking centers with not only reasonably sized equipment, but also available benchmarking experts. We will continue to work with vendors to find the correct balance between a very extensive benchmarking suite which fully covers all possible applications to be run and a minimal benchmarking suite that is easy for them to run. Part of our motivation for investigating low-level benchmarks as predictors of performance for key applications is to make it easier for vendors to understand our needs, measure their systems' performance, and participate in our acquisitions.

HPC: Is there anything else about this issue that our readers should understand?

LD: I think I've about covered it at a general level. I'd like to emphasize, in closing, that HPC system and application performance modeling -- including benchmarking -- is needed for any organization to predict performance of an HPC system relative to its specific workload. In addition, detailed application code profiling of the workload not only identifies the most relevant benchmarks, it also provides insight for distributing workload across multiple HPC resources as well as guidance to application designers and users on how to improve efficiency of an application on specific HPC architectures. Both goals are very important to our program.

