
Features:
HEC RESPONDS TO US-HPC FUNDING, LINPACK, CLUSTER-RUT, ETC
LETTER TO THE EDITOR
A reader writes to express his admiration for the High-End Crusader's
arguments, recently presented in HPCwire. The reader also expands on many of
HEC's ideas and urges HEC to write again. The High-End Crusader then briefly
comments on the letter.
Dear High-End Crusader:
Thank you for the series of thought-provoking articles published in the last
few months by HPCwire [HPCwire article numbers 107185, 107292, 107455, 107765,
107896, 108052, 108384, 108466]. Please contribute more. Consider a review of
SC2004, tips on emerging technologies, or comment on the following thoughts
and questions, especially correcting any misperceptions.
(Because HEC's breadth and depth of knowledge exceed what's possible for a
single person, this correspondent believes that HEC is a collective effort,
genderless and plural, so "HEC" will be used throughout.)
Why HEC Urges U.S. Funding for Supercomputing Research
HEC perspicuously argues that current supercomputing technology ill serves
important national missions, without revealing national capabilities or,
especially, limitations to potential adversaries. Accordingly, HEC delineates
a large class of computations that perform poorly on clusters, explains the
reasons for the poor performance, and gives numerous examples of big
computations people would do if they could. Although reasonable people can
disagree about the size of the commercial market for Grand Challenge-scale
supercomputing, HEC argues that national interests in supercomputing alone
justify revitalizing the supercomputing R&D community with significant,
long-term funding.
A letter to the editor explained that vendors sell what customers want to buy:
machines that run large-grained MPI code well. Of course, customers looking to
upgrade their physical capital use only large-grained MPI programs, because
those are the programs that run well on their installed base.
Therein lies the cluster/MPI market rut. HEC argues that sponsoring
supercomputing research is a proper role for governments, because market
forces alone will not overcome the cluster/MPI rut. HEC concludes that national
interests in effective supercomputing are so vital as to justify sufficient
funding to create a vibrant, alternative supercomputing R&D community above
and beyond that envisioned by the HECRTF report (http://www.itrd.gov/hecrtf-outreach/).
HEC's long, cogent argument ranges from incontestable technological facts,
such as the performance impact of architectural imbalance and the need to
exploit temporal and spatial locality effectively, to more abstract value
judgments, such as that vital national interests justify significant funding
for alternative supercomputing R&D, without being too specific about those
national interests.
HEC's secondary argument for funding asserts that general-purpose
supercomputing has commercial potential: many people would run many new
applications whose data and computation cannot now be partitioned together
into large-grained MPI code. HEC cited such applications from diverse fields,
many unfamiliar to this correspondent, that are not being performed now but
would be if the government-sponsored supercomputing community produces the
revolutionary breakthroughs that HEC claims are needed.
Notwithstanding these limitations, this correspondent believes HPCwire readers
will agree that HEC's exposition has been brilliant, even if they disagree
with HEC's conclusion.
Granted, whether national interests are vital, whether commercial demand for
computations that cannot be cost-effectively performed now will sustain a
nascent supercomputing market, whether revolutionary breakthroughs are even
possible, and whether government sponsorship of an alternative supercomputing
R&D community is the best, or only, way to make those revolutionary
breakthroughs possible, are all debatable.
Despite these acknowledged weaknesses, this correspondent is persuaded by
HEC's argument on all counts.
Benchmark Kerfuffle
Recently, HEC dared to criticize a vendor's reliance on an easily scaled
benchmark, Linpack, to proclaim its machine "fastest" [108466]. Prof.
Schonauer's characterization of benchmarks as placebos seems ever more
tautological [107985]: benchmarks are important to the extent we believe
they're important, and as long as raw machoflops are revered, Linpack compares
machines fairly. Therefore, computer architects looking to score high on the
TOP500 devote only a small fraction of the hardware to the global
interconnection network.
HEC argues that the HPC Challenge set of benchmarks, which includes Linpack as
one benchmark among many, will be a more accurate measure of performance than
Linpack alone. By characterizing both applications and architectures along
three dimensions, one can predict performance for particular programs on
particular machines.
Sometimes a customer's workload allows computation and data to be partitioned
together into large grains that communicate little and can easily be coded as
an SPMD program using MPI. Such lucky customers may find BlueGene/L to be a
cost-effective platform. For unlucky customers whose computation is
necessarily data-dependent and must access huge data structures by pointer
chasing, BlueGene/L will stand still.
Devoting a significant fraction of the system hardware to the global
interconnect, to achieve good global Gup/s with adroit synchronization, will
not improve the machine's Linpack/TOP500 ranking one iota. However, such a
machine might be a cost-effective platform for those unlucky customers whose
applications cannot be partitioned, a priori, into large-grained MPI code.
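To make "global Gup/s" concrete, the following minimal, single-process Python
sketch mimics the kind of update loop the HPC Challenge RandomAccess benchmark
times: a pseudo-random 64-bit stream is XORed into pseudo-random slots of a
large table. The feedback constant POLY, the default seed, and the function
names are assumptions made for illustration, not the official benchmark code.
On a distributed machine the table would be spread across all nodes, so nearly
every update would land in another node's memory; because each address depends
on the data, no a priori partitioning removes that traffic.

```python
POLY = 0x7                  # assumed feedback constant for the 64-bit stream
MASK64 = (1 << 64) - 1      # keep Python ints within 64 bits

def next_random(ran: int) -> int:
    """Advance the stream: shift left, XOR in POLY if the top bit fell off."""
    top_bit_set = (ran >> 63) & 1
    ran = (ran << 1) & MASK64
    return ran ^ POLY if top_bit_set else ran

def random_updates(table: list, n_updates: int,
                   seed: int = 0x2545F4914F6CDD1D) -> None:
    """XOR the stream into pseudo-random slots of `table`, in place.

    len(table) must be a power of two so `ran & (size - 1)` is a valid index.
    The seed is a hypothetical nonzero starting value.
    """
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    ran = seed & MASK64
    if ran == 0:
        raise ValueError("seed must be nonzero, or the stream stays at zero")
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran   # address depends on data: no locality

table = list(range(1 << 10))
original = list(table)
random_updates(table, 4000)
random_updates(table, 4000)   # replaying the same stream undoes every XOR
print(table == original)      # True
```

Because each update is an XOR, replaying the same stream restores the table,
which gives a cheap correctness check for a sketch like this.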
The HPC Challenge benchmarks are an important step forward in performance
estimation, empowering supercomputer experts to make the case to their
superiors that proposed acquisitions will meet needs at predictable life-cycle
costs. However, as long as Linpack machoflops remain the measure of computer
power, supercomputing will remain mired in the cluster/MPI market rut.
HPC Challenge Should Include Paper-and-Pencil Option
Even the HPC Challenge benchmarks are tainted! All of the benchmarks are
supplied as MPI source code that must be executed without modification,
although vendors may post additional results after "tweaking."
Consider a supercomputer-friendly processor with 64-bit physical addressing
that recognizes global memory accesses during address translation and
immediately initiates a global network transaction. Measuring global network
bandwidth and latency with clumsy, high-overhead MPI messages on such a
machine would miss the dramatic improvement a high-bandwidth, low-latency
global network brings to applications that necessarily do lots of pointer
chasing.
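The address-translation idea can be sketched as a decode step. In this
hypothetical layout, the high-order NODE_BITS of a 64-bit physical address
name the node that owns the memory and the remaining bits give the offset
within that node; the 16/48 split, the Access record, and the translate
function are illustrative assumptions, not a description of any real
processor. Hardware would start the network transaction as soon as the decode
yields "remote", with no MPI library call or message packing in between.

```python
from dataclasses import dataclass

NODE_BITS = 16                      # hypothetical: top 16 bits select the owning node
OFFSET_BITS = 64 - NODE_BITS        # remaining 48 bits address memory within that node
OFFSET_MASK = (1 << OFFSET_BITS) - 1

@dataclass(frozen=True)
class Access:
    kind: str      # "local" or "remote"
    node: int      # node that owns the address
    offset: int    # byte offset within that node's memory

def translate(phys_addr: int, my_node: int) -> Access:
    """Decode a 64-bit physical address and classify the access."""
    if not 0 <= phys_addr < (1 << 64):
        raise ValueError("not a 64-bit physical address")
    node = phys_addr >> OFFSET_BITS
    offset = phys_addr & OFFSET_MASK
    # On the hypothetical processor, "remote" would launch a network transaction here.
    return Access("local" if node == my_node else "remote", node, offset)

print(translate((3 << OFFSET_BITS) | 0x10, my_node=3))
# Access(kind='local', node=3, offset=16)
print(translate((5 << OFFSET_BITS) | 0x20, my_node=3))
# Access(kind='remote', node=5, offset=32)
```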
HPC Challenge benchmarks should include paper-and-pencil versions, complete
with special review to ensure conformance, allowing novel
models/architectures/languages to be accurately compared with the installed
base running MPI.
Alternative Ranking(s) to Linpack Needed
As an alternative to ranking by Linpack alone, the HPC Challenge benchmark
results could be multiplied together, representing the volume of a
multidimensional prism, to determine a hierarchy of rankings. The highest rank
would include all HPC Challenge benchmarks, scored by multiplying the
aggregate benchmark results (or dividing, in the case of Latency) for the
categories of results: HPL (=Linpack), PTRANS, STREAM, RandomAccess (Gup/s),
Latency, and Bandwidth. The machine with the highest product of all benchmarks
would be deemed the "fastest" supercomputer in the world.
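The proposed scoring is simple arithmetic: multiply the higher-is-better
results, divide by Latency, and rank by the resulting "prism volume." The
Python sketch below assumes each category is reported in one consistent unit
for every machine, and the numbers in its usage example are invented. Because
rescaling a category's units multiplies every machine's score by the same
constant, the ordering does not depend on which units are chosen. Passing a
subset of categories yields a subsidiary ranking.

```python
from math import prod
from typing import Dict, Iterable, List, Optional

# Categories named in the letter; Latency is the only lower-is-better one.
CATEGORIES = ["HPL", "PTRANS", "STREAM", "RandomAccess", "Latency", "Bandwidth"]
LOWER_IS_BETTER = {"Latency"}

def prism_volume(results: Dict[str, float],
                 categories: Optional[Iterable[str]] = None) -> float:
    """Multiply the chosen results together, dividing by lower-is-better ones."""
    chosen = list(categories) if categories is not None else CATEGORIES
    factors = []
    for name in chosen:
        value = results[name]
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
        factors.append(1.0 / value if name in LOWER_IS_BETTER else value)
    return prod(factors)

def rank(machines: Dict[str, Dict[str, float]],
         categories: Optional[Iterable[str]] = None) -> List[str]:
    """Return machine names ordered from largest to smallest prism volume."""
    chosen = list(categories) if categories is not None else CATEGORIES
    return sorted(machines, key=lambda m: prism_volume(machines[m], chosen),
                  reverse=True)

# Invented numbers: "big" wins on HPL alone, "small" wins once RandomAccess counts.
machines = {
    "big":   {"HPL": 100.0, "RandomAccess": 0.01},
    "small": {"HPL": 10.0,  "RandomAccess": 1.0},
}
print(rank(machines, ["HPL"]))                  # ['big', 'small']
print(rank(machines, ["HPL", "RandomAccess"]))  # ['small', 'big']
```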
There would also be awards for many subsidiary categories covering subsets of
the benchmarks, allowing multiple bragging rights. Not all customers need all
the features measured by the HPC Challenge benchmarks. For them, getting the
largest-volume prism in the dimensions of concern for the cost can provide the
empirical evidence needed to justify an acquisition. While many different
vendors may have meaningful bragging rights for fastest or most cost-effective
on some subset of the benchmarks, the overall fastest must include all the
benchmarks, particularly RandomAccess. It's conceivable that a machine built
around the supercomputer-friendly processor described above might outperform
clusters that incur MPI message overhead for global accesses by a factor of a
thousand or more! A "small" high-bandwidth machine might claim the "fastest"
supercomputer title over much larger, low-bandwidth MPI clusters.
Sponsor Time-to-Solution Contest
A time-to-solution contest would focus attention on programmability and
scalability together:
a paper-and-pencil description of the problem would be released on the weekend
before a major supercomputing conference; the problem's difficulty would be
chosen so that at least some crack teams could finish coding and computing by
the end of the conference; contestant teams would walk through their code with
judges during their "final" run; teams would keep data to measure software
productivity in addition to hardware speed; and scalability would be measured
by requiring the same program, on the same hardware, to run successively
larger data sets.
The first data set will be small enough to fit in PC main memory. Sample
results for the small data set will be provided to validate program
correctness. The "medium" data set will be about 100 times larger than the
small one. The "large" data set will be 100 times larger than the medium one,
and so on without bound, with teams computing ever larger data sets until the
contest's end.
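The growth schedule is easy to tabulate. The sketch below assumes, purely for
illustration, a 1 GB small data set (the letter says only that it must fit in
PC main memory) and the letter's factor of 100 between levels; the function
names are hypothetical. Under those assumptions the fourth data set is already
a petabyte, so the contest quickly separates programs and machines that scale
from those that do not.

```python
from itertools import count, islice
from typing import Iterator, Tuple

LEVEL_NAMES = ["small", "medium", "large"]   # later levels are just numbered

def dataset_sizes(small_bytes: int, growth: int = 100) -> Iterator[Tuple[str, int]]:
    """Yield (name, size in bytes) for each contest data set, without bound."""
    if small_bytes <= 0 or growth < 2:
        raise ValueError("need a positive base size and a growth factor of at least 2")
    for level in count():
        name = LEVEL_NAMES[level] if level < len(LEVEL_NAMES) else f"level {level}"
        yield name, small_bytes * growth ** level

def levels_that_fit(small_bytes: int, memory_bytes: int, growth: int = 100) -> int:
    """Count how many consecutive data sets fit in `memory_bytes`."""
    fitting = 0
    for _, size in dataset_sizes(small_bytes, growth):
        if size > memory_bytes:
            return fitting
        fitting += 1
    raise AssertionError("unreachable: dataset_sizes never ends")

for name, size in islice(dataset_sizes(10**9), 4):   # hypothetical 1 GB small set
    print(name, size)
print(levels_that_fit(10**9, 10**12))   # 2: a hypothetical 1 TB machine holds small and medium
```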
Because honor sweetens the reward, prizes should be offered for 1st, 2nd, and
3rd place, and for every team that eventually turns in correct results. Most
teams would be sponsored by universities together with supercomputer centers,
but company-sponsored teams would be welcome too.
Posting results on horse-race monitors throughout the convention as soon as
they are certified will add to the interest, excitement, and newsworthiness of
supercomputing. Withholding results during the last few hours before the
deadline will heighten excitement for the awards ceremony.
Such a high-visibility time-to-solution contest would be every would-be
revolutionary's fantasy: to use a novel process of computing, performed by a
novel machine and programmed in a novel language, to compose elegant, short,
highly concurrent programs that run faster on hardware that costs less.
Empirical evidence of programming and processing efficiencies will be
priceless for iconoclasts attempting to show that their technology does indeed
satisfy HEC's demands for revolution in supercomputing.
Some Questions for HEC:
Does the omnibus spending bill recently passed by Congress include funding for
HECRTF recommendations?
If so, will the funding be divided among government agencies, or centralized
to fund integrated programs that develop novel architecture, operating system,
compiler, language, computational model, and software development methodology
together?
Certainly, decisively winning a time-to-solution contest will publicly
demonstrate achievement of HEC's revolutionary demands, but how could claims
for revolutionary improvements in scalability and programmability be evaluated
before actually building and programming the new machine?
Will software engineering join every other engineering discipline in using
mathematics to model its subject, by treating programs and their executions as
mathematical objects?
Are mathematically defined programming languages necessary to prove
interference freedom, given that interference among concurrent computations is
often impossible to debug, or sometimes even to detect?
Would a growing library of highly concurrent, proven-correct subprograms that
can be rapidly cobbled together like Tinkertoys be likely to improve
time-to-solution for the real-world computations HEC needs?
All HPCS winners were teams led by established supercomputing companies that
will happily scoop up all HECRTF funding, if there is any. If possible, how
would HEC define the RFP format and prioritize evaluation criteria so that
proposals for projects with the potential for the revolutionary improvements
HEC calls for get funded, rather than incremental improvements in current
technology?
Is there any way out of the cluster/MPI market rut?
Thank you again, HEC; please contribute more.
Brian R. Larson
Chairman, Multitude Corporation
HEC has a somewhat exceptional range of interests and expertise, as judged by
the articles he has written for HPCwire. Nevertheless, he is an individual
human being who just happens to read and consult a lot; he has learned from a
collection of very smart people.
He is not hibernating, merely recovering from quadruple bypass surgery. When
he gets a bit more energy, he will return as opinionated as ever.
When reached for comment, the High-End Crusader had this to say:
"In the make-or-break struggle to revitalize high-end computing, we are
nowhere near the beginning of the end, even if we are just starting to see the
outlines of the end of the beginning."
"The war for high-end computing is just hitting its full stride. It is a war
of competing ideas, with powerful vendor and government forces in play. Like
another famous war, not everyone recognizes the war for high-end computing for
what it is, or how large a task it will be to win it."