
Features:
THE TRUTH ABOUT BENCHMARKS: AN INTERVIEW WITH LARRY DAVIS
By Alan Beck, Editor-in-Chief, HPCwire
HPCwire: Because HPC applications differ so widely, benchmarks have always
been a controversial topic. How useful really are the two most generally-
utilized benchmarks, i.e. the Linpack and the IDC Balanced Rating?
LARRY DAVIS: Let me preface my answer by saying that I have a strong personal
aversion to ranking efforts (except, perhaps, in college football); I think
our time is better spent developing a deeper understanding of how and why
important applications perform as they do on different HPC architectures.
Thus, my evaluation of the usefulness of particular benchmarks is heavily
based on their contribution to that goal. The program for which I work, the
DoD High Performance Computing Modernization Program, has recently established
an activity to consider benchmarks from two perspectives. First, we want to
measure the HPC system performance against our specific HPC workload and,
second, we'd like to better understand the performance characteristics of that
workload. Thus, we use application and synthetic benchmarks to help us make
acquisition decisions, and we also are beginning to collect application code
profiling information that will allow application code designers and users to
write and use application codes more efficiently on our HPC systems.
Linpack, as used to populate the Top500, has been with us for a while, and it
has served to form the basis for a useful historical trend of HPC performance.
I think that at this point most HPC experts agree that a one-dimensional
measure such as Linpack is not a very good predictor of performance for most
current HPC applications, and thus is no longer terribly useful for decisions
such as what systems to buy and how to partition computational workload among
disparate HPC resources. Linpack basically measures maximum CPU performance,
and unfortunately many people have used it in a search for a single metric
to compare HPC system performance. The IDC Balanced Rating is an attempt to
add additional important dimensions, such as memory bandwidth, to categorize a
system's performance. This is a good idea, but again reducing HPC system
performance to a single metric, even one that attempts to incorporate
additional important dimensions, may be misleading. After an initial look, I
have not attempted to assess its technical merits since our program has opted
for a different approach to the use of benchmarking.
HPC: Do we need more HPC benchmarks or better ones -- or neither? Please
elaborate on your rationale.
LD: I think we basically need more careful studies to determine a minimal
spanning set of low-level HPC benchmarks that can accurately predict
performance of key applications on existing and proposed HPC systems. This is
exactly what we're exploring in our HPC System Performance Modeling Panel on
Nov. 20 at SC2003. We proposed this panel because it was becoming obvious that
several federal agencies, including DoD, NASA and DOE, had begun substantial
programs to investigate this question, and key players from each of those
agencies are represented on the panel. We hope to use the panel as a
springboard to tighten coordination among these federal agencies and other
interested organizations in the benchmarking and performance modeling area. I
invite any additional organizations with an interest in benchmarking and
performance modeling of HPC systems to join us in these discussions.
We all share a common goal -- the ability to predict specific application
performance on current and future HPC systems. We see better benchmarks, in
some form, as essential to achieving that goal. If we reach the point where we
have confidence that a set of low-level benchmarks can be used to accurately
predict the performance of our key applications, we will be able to greatly
simplify the set of benchmarks we use for our HPC acquisitions. In addition,
the application code profiling necessary to use low-level benchmarks to make
accurate performance predictions also can be used to provide guidance to
developers and users of that code to improve the code's performance and target
the use of that code to the most productive architectures for that code.
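
Editor's illustration: the idea of predicting application performance from a small set of low-level benchmarks plus application profiling can be sketched roughly as follows. This is not the DoD program's model. It is a minimal additive sketch that assumes compute, memory traffic, and messaging do not overlap, and every name and number in it (MachineRates, AppProfile, system_a, system_b, the rates and work counts) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MachineRates:
    """Rates as low-level benchmarks might report them (hypothetical values)."""
    flops_per_s: float      # sustained floating-point rate
    mem_bytes_per_s: float  # sustained memory bandwidth
    net_latency_s: float    # per-message network latency


@dataclass
class AppProfile:
    """Work counts as application profiling might report them (hypothetical)."""
    flops: float
    mem_bytes: float
    messages: float


def predict_seconds(app: AppProfile, machine: MachineRates) -> float:
    """Additive estimate: assumes compute, memory, and network never overlap."""
    return (app.flops / machine.flops_per_s
            + app.mem_bytes / machine.mem_bytes_per_s
            + app.messages * machine.net_latency_s)


# Two made-up systems: system_a has the higher floating-point rate,
# system_b has the higher memory bandwidth.
machines = {
    "system_a": MachineRates(flops_per_s=4e9, mem_bytes_per_s=1e9, net_latency_s=1e-5),
    "system_b": MachineRates(flops_per_s=2e9, mem_bytes_per_s=4e9, net_latency_s=1e-5),
}

# Two made-up application profiles with opposite resource demands.
apps = {
    "compute_bound": AppProfile(flops=8e12, mem_bytes=1e12, messages=1e6),
    "memory_bound": AppProfile(flops=1e12, mem_bytes=8e12, messages=1e6),
}


def best_machine(app_name: str) -> str:
    """Return the system with the lowest predicted run time for one application."""
    app = apps[app_name]
    return min(machines, key=lambda name: predict_seconds(app, machines[name]))


if __name__ == "__main__":
    for app_name in apps:
        print(app_name, "->", best_machine(app_name))
```

With these made-up inputs, the system with the higher floating-point rate is preferred for the compute-bound profile but not for the memory-bound one, which is the kind of difference a one-dimensional ranking cannot express.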
HPC: How has the increasing popularity of Grid-based HPC affected the
benchmarking picture?
LD: At this point, I don't see that benchmarking Grid-based applications has a
lot of importance to the traditional HPC community. We operate a relatively
small number of consolidated HPC centers, so there is little performance
incentive to run applications across centers. On the other hand, we are very
interested in the possibility of scheduling and queuing jobs across centers as
a method to better utilize our total pool of program resources. Thus,
benchmarking of cross-center schedulers and related system software (e.g. file
systems) is clearly of interest. If, in the future, more applications become
efficiently scalable to thousands of processors and they are latency tolerant,
Grid-based performance measurements will become more important.
HPC: How should vendors tackle the issue of benchmarking?
LD: We'd like for the vendors, of course, to run anything and everything that
we throw at them accurately and precisely on exactly the system configuration
that they are offering us as buyers of their systems. This is really tough,
however, for most vendors to accomplish, primarily because most don't have
large systems to use for benchmarking, particularly of newer architectures
that may not be assembled until just before delivery. I do believe that it
might be worth the vendors' time to establish fairly robust benchmarking
centers with not only reasonably sized equipment, but also available
benchmarking experts. We will continue to work with vendors to find the
correct balance between a very extensive benchmarking suite which fully covers
all possible applications to be run and a minimal benchmarking suite that is
easy for them to run. Part of our motivation for investigating low-level
benchmarks as predictors of performance for key applications is to make it
easier for vendors to understand our needs, measure their systems'
performance, and participate in our acquisitions.
HPC: Is there anything else about this issue that our readers should
understand?
LD: I think I've about covered it at a general level. I'd like to emphasize,
in closing, that HPC system and application performance modeling activities --
benchmarking -- is needed for any organization to predict performance of an
HPC system relative to its specific workload. In addition, detailed
application code profiling of the workload not only identifies the most
relevant benchmarks, it also provides insight for distributing workload across
multiple HPC resources as well as guidance to application designers and users
on how to improve efficiency of an application on specific HPC architectures.
Both goals are very important to our program.