HPCwire
 The global publication of record for High Performance Computing / December 10, 2004: Vol. 13, No. 49


Features:

HEC RESPONDS TO US-HPC FUNDING, LINPACK, CLUSTER-RUT, ETC
LETTER TO THE EDITOR

A reader writes to express his admiration for the High End Crusader's arguments, recently presented in HPCwire. The reader also expands on many of HEC's ideas and urges HEC to write again. The High-End Crusader then briefly comments on the letter.


Dear High-End Crusader:

Thank you for the series of thought-provoking articles published in the last few months by HPCwire. [HPCwire article numbers 107185, 107292, 107455, 107765, 107896, 108052, 108384, 108466] Please contribute more. Consider a review of SC2004, tips on emerging technologies, or comments on the following thoughts and questions, especially correcting any misperceptions.

(Because HEC's breadth and depth of knowledge exceed what's possible for a single person, this correspondent believes that HEC is a collective effort, genderless and plural, so "HEC" will be used throughout.)

Why HEC Urges U.S. Funding for Supercomputing Research

HEC perspicuously argues that current supercomputing technology ill serves important national missions, without revealing national capabilities or, especially, limitations to potential adversaries. Accordingly, HEC delineates a large class of computations that perform poorly on clusters, explains the reasons for the poor performance, and gives numerous examples of big computations people would do if they could. Although reasonable people can disagree about the size of the commercial market for Grand Challenge-scale supercomputing, HEC argues that national interests in supercomputing alone justify revitalization of the supercomputing R&D community with significant, long-term funding.

A letter to the editor explained that vendors sell what customers want to buy: machines that run large-grained MPI code well. Of course, customers looking to upgrade physical capital only use large-grained MPI programs because they run well on their installed base.

Therein lies the cluster/MPI market rut. HEC argues that supercomputing research sponsorship is a proper role for governments because market forces alone will not overcome the MPI/cluster rut. HEC concludes that national interests in effective supercomputing are so vital as to justify sufficient funding to create a vibrant, alternative supercomputing R&D community above and beyond that envisioned by the HECRTF report. (http://www.itrd.gov/hecrtf-outreach/)

HEC's long, cogent argument ranges from incontestable technological facts, such as the performance impact of architectural imbalance and the need for effective exploitation of temporal and spatial locality, to more abstract value judgments, such as the claim that vital national interests justify significant funding for alternative supercomputing R&D, made without being too specific about those national interests.

HEC's secondary argument for funding asserts that general-purpose supercomputing has commercial potential; many people would run many new applications that cannot now partition data and computation together into large-grained MPI code. HEC cited such applications from diverse fields, many unfamiliar to this correspondent, that are not being performed now, but would be, if a government-sponsored supercomputing community produced the revolutionary breakthroughs HEC claims are needed.

Given these limitations, this correspondent believes HPCwire readers will agree that HEC's exposition has been brilliant, even if they disagree with HEC's conclusion.

Granted, whether national interests are vital, whether commercial demand for computations that cannot be cost-effectively performed now will sustain a nascent supercomputing market, whether revolutionary breakthroughs are even possible, and whether government sponsorship of an alternative supercomputing R&D community is the best, or only, way to make those revolutionary breakthroughs possible, are all debatable.

Despite these acknowledged weaknesses, this correspondent is persuaded by HEC's argument on all counts.

Benchmark Kerfuffle

Recently, HEC dared to criticize a vendor's reliance on an easily-scaled benchmark, Linpack, to proclaim its machine "fastest" [108466]. Prof. Schonauer's characterization of benchmarks as placebos seems ever more tautological [107985]. Benchmarks are important to the extent we believe they're important; as long as raw machoflops are revered, Linpack compares fairly. Therefore, computer architects looking to score high on the TOP500 devote only a small fraction of the hardware to the global interconnection network.

HEC argues that the HPC Challenge set of benchmarks, which includes Linpack as one benchmark among many, will be a more accurate measure of performance than Linpack alone. By characterizing both applications and architectures in three dimensions, performance can be predicted for particular programs on particular machines.

Sometimes a customer's workload allows computation and data to be partitioned together into large grains that communicate little and can easily be coded as an SPMD program using MPI. Such lucky customers may find BlueGene/L to be a cost-effective platform. For unlucky customers whose computation is necessarily data-dependent and needs to access huge data structures by pointer chasing, BlueGene/L will stand still.

Achieving good global Gup/s with adroit synchronization, by devoting a significant fraction of the system hardware to the global interconnect, will not improve a machine's Linpack-TOP500 ranking one iota. However, such a machine might be a cost-effective platform for those unlucky customers whose applications cannot be partitioned, a priori, into large-grained MPI code.
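For readers unfamiliar with the unit, Gup/s counts giga-updates per second: read-modify-write operations on randomly chosen words of a large table. The following is a toy, single-process Python sketch of that kind of kernel, not the HPC Challenge RandomAccess reference code. The function name, the table size, and the use of Python's `random` module are assumptions made for illustration.

```python
import random
import time


def random_updates(table_size, num_updates, seed=0):
    """Apply XOR updates at random table indices and report the update rate.

    Hypothetical helper: returns (table, giga_updates_per_second).
    """
    rng = random.Random(seed)
    table = list(range(table_size))
    start = time.perf_counter()
    for _ in range(num_updates):
        value = rng.getrandbits(64)
        # The index depends on the random value, so successive accesses
        # have no spatial locality to exploit.
        table[value % table_size] ^= value
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero interval
    return table, num_updates / elapsed / 1e9


if __name__ == "__main__":
    _, gups = random_updates(table_size=1 << 16, num_updates=100_000)
    print(f"{gups:.6f} Gup/s")
```

On a real machine the table would span the memory of many nodes, so nearly every update would cross the global interconnect; that is the traffic a Linpack score does not reflect.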

HPC Challenge benchmarks are an important step forward in performance estimation to empower supercomputer experts to make the case to their superiors that proposed acquisitions will meet needs at predictable life-cycle costs. However, as long as Linpack machoflops remain the measure of computer power, supercomputing will remain mired in the cluster/MPI market rut.

HPC Challenge Should Include Paper-and-Pencil Option

Even the HPC Challenge benchmarks are tainted! All of the benchmarks have MPI source code that must be executed without modification, although vendors may post additional results after "tweaking".

Consider a supercomputer-friendly processor with 64-bit physical addressing that recognizes global memory accesses as part of address translation to immediately initiate a global network transaction. Measuring global network bandwidth and latency with clumsy, high-overhead MPI packets on such a machine would miss the dramatic improvement from a high-bandwidth, low-latency global network for those applications that necessarily do lots of pointer chasing.

HPC Challenge benchmarks should include paper-and-pencil versions, complete with special review to ensure conformance, allowing novel models/architectures/languages to be accurately compared with the installed base running MPI.

Alternative Ranking(s) to Linpack Needed

As an alternative to ranking by Linpack alone, the results of the HPC Challenge benchmarks could be multiplied together, the product representing the volume of a multidimensional prism, to determine a hierarchy of rankings. The highest rank would include all HPC Challenge benchmarks, scoring by multiplying aggregate benchmark results (or dividing in the case of Latency) for the categories of results: HPL (=Linpack), PTRANS, STREAM, RandomAccess (Gup/s), Latency, and Bandwidth. The machine that gets the highest product of all benchmarks would be deemed the "fastest" supercomputer in the world.
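A minimal Python sketch of this product score follows, assuming each machine reports one aggregate number per category. The dictionary keys, the machine names, and all figures are invented for illustration and are not measured results.

```python
from math import prod

# Hypothetical category names; every category rewards larger values except latency.
HIGHER_IS_BETTER = ("hpl", "ptrans", "stream", "random_access", "bandwidth")
LOWER_IS_BETTER = ("latency",)


def composite_score(results):
    """Multiply the higher-is-better results and divide by the latency result."""
    numerator = prod(results[name] for name in HIGHER_IS_BETTER)
    denominator = prod(results[name] for name in LOWER_IS_BETTER)
    return numerator / denominator


def rank(machines):
    """Return machine names ordered from the highest composite score to the lowest."""
    return sorted(machines, key=lambda name: composite_score(machines[name]), reverse=True)


# Invented figures: a flops-heavy cluster and a smaller machine with a strong network.
machines = {
    "cluster": {"hpl": 100.0, "ptrans": 2.0, "stream": 4.0,
                "random_access": 0.01, "bandwidth": 1.0, "latency": 10.0},
    "balanced": {"hpl": 20.0, "ptrans": 4.0, "stream": 4.0,
                 "random_access": 1.0, "bandwidth": 2.0, "latency": 2.0},
}

if __name__ == "__main__":
    for name in rank(machines):
        print(name, composite_score(machines[name]))
```

Because the score is a product, a machine that is very weak in any one dimension is penalized heavily, which is the point of the proposal.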

But there would be awards for many subsidiary categories considering subsets of the benchmarks, allowing for multiple bragging rights. Not all customers need all the features measured by the HPC Challenge benchmarks. For them, getting the largest-volume prism in the dimensions of concern for the cost can provide the empirical evidence needed to justify acquisition. Many different vendors may have meaningful bragging rights for fastest or most cost-effective on some subset of the benchmarks, but the overall fastest must include all the benchmarks, particularly RandomAccess. It's conceivable that the supercomputer-friendly processor described above might outperform clusters that incur MPI message overhead for global accesses by a factor of a thousand or more! A "small" high-bandwidth machine might claim the "fastest" supercomputer title over much larger, low-bandwidth MPI clusters.
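The subset rankings can be sketched the same way: compute the prism volume over only the dimensions a customer cares about, then divide by cost. This is an illustrative Python sketch; the function names, the `cost` field, and every number are assumptions.

```python
from math import prod


def prism_volume(results, dimensions, inverted=("latency",)):
    """Product over the chosen dimensions; lower-is-better dimensions divide instead."""
    return prod(1.0 / results[d] if d in inverted else results[d] for d in dimensions)


def best_value(machines, dimensions):
    """Name of the machine with the largest prism volume per unit of cost."""
    return max(
        machines,
        key=lambda name: prism_volume(machines[name], dimensions) / machines[name]["cost"],
    )


# Invented figures; "cost" is in arbitrary units.
machines = {
    "big_cluster": {"hpl": 100.0, "random_access": 0.01, "cost": 50.0},
    "small_machine": {"hpl": 10.0, "random_access": 1.0, "cost": 10.0},
}

if __name__ == "__main__":
    print(best_value(machines, ("hpl",)))
    print(best_value(machines, ("hpl", "random_access")))
```

With these made-up figures the large cluster wins when only HPL counts, while the smaller machine wins once RandomAccess is included, mirroring the scenario described above.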

Sponsor Time-to-Solution Contest

A time-to-solution contest would focus attention on programmability and scalability together:

a paper-and-pencil description of the problem would be released on the weekend before a major supercomputing conference;
the problem's difficulty would be chosen so that at least some crack teams would finish coding and computing by the end of the conference;
contestant teams would walk through their code with judges during their "final" run;
teams would keep data to measure software productivity in addition to hardware speed; and
scalability would be measured by requiring successively larger data sets running the same program on the same hardware.

The first data set will be small enough to fit in PC main memory. Sample results for the small data set will be given to validate program correctness. The "medium" data set will be about 100 times larger than the small. The "large" data set will be 100 times larger than the medium and so on, without bound, computing ever larger data sets until the contest's end.
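The growth schedule is geometric, which a short Python sketch makes concrete. The 1 GB starting size is only an assumed stand-in for "fits in PC main memory"; the letter does not specify a figure.

```python
from itertools import islice


def data_set_sizes(small_bytes, growth=100):
    """Yield contest data set sizes without bound, each `growth` times the previous one."""
    size = small_bytes
    while True:
        yield size
        size *= growth


if __name__ == "__main__":
    # Assumed small size of 10**9 bytes (about 1 GB); small, medium, large, and one more.
    for size in islice(data_set_sizes(10**9), 4):
        print(size)
```

Three steps beyond the small set already multiply the data volume by a million, so a program that scales poorly would stall early in the sequence.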

Because reward sweetens honor, prizes should be offered for 1st, 2nd, and 3rd place, and for every team that eventually turns in correct results. Most teams would be sponsored by universities together with supercomputer centers, but company-sponsored teams would be welcomed too.

Posting results on horse-race monitors throughout the convention as soon as certified will contribute to the interest, excitement, and news-worthiness of supercomputing. Withholding results from the last few hours before deadline will heighten excitement for the awards ceremony.

Such a high-visibility, time-to-solution contest would be every would-be revolutionary's fantasy: to use a novel process of computing, performed by a novel machine, programmed with a novel language, to compose elegant, short, highly-concurrent programs that run faster on hardware that costs less. Empirical evidence of programming and processing efficiencies will be priceless for iconoclasts attempting to show that their technology does indeed satisfy HEC's demands for revolution in supercomputing.

Some Questions for HEC:

Does the omnibus spending bill recently passed by Congress include funding for HECRTF recommendations?

If so, will the funding be divided among government agencies, or centralized to fund integrated programs developing novel architecture, o.s., compiler, language, computational model, and software development methodology together?

Certainly, decisively winning a time-to-solution contest will publicly demonstrate achievement of HEC's revolutionary demands, but how could claims for revolutionary improvements in scalability and programmability be evaluated before actually building and programming the new machine?

Will software engineering join every other engineering discipline using mathematics to model its subject by treating programs and their executions as mathematical objects?

Are mathematically-defined programming languages necessary to prove interference freedom, because interference among concurrent computations is often impossible to debug, or sometimes even to detect?

Would a growing library of highly-concurrent, proven-correct subprograms that can be rapidly cobbled together like tinker toys be likely to improve time-to-solution for the real-world computations HEC needs?

All HPCS winners were teams led by established supercomputing companies that will happily scoop up all HECRTF funding, if there is any. If possible, how would HEC set the RFP-defined format and prioritize evaluation criteria so that proposals for projects with the potential for the revolutionary improvements HEC calls for may get funded, eschewing incremental improvements in current technology?

Is there any way out of the cluster/MPI market rut?

Thank you again, HEC; please contribute more.

Brian R. Larson
Chairman, Multitude Corporation


The HEC has a somewhat exceptional range of interests and expertise as judged by the articles he has written for HPCwire. Nevertheless, he is an individual human being who just happens to read and consult a lot; he has learned from a collection of very smart people.

He is not hibernating, merely recovering from quadruple bypass surgery. When he gets a bit more energy, he will return as opinionated as ever.

When reached for comment, the High-End Crusader had this to say:

"In the make-or-break struggle to revitalize high-end computing, we are nowhere near the beginning of the end, even if we are just starting to see the outlines of the end of the beginning."

"The war for high-end computing is just hitting its full stride. It is a war of competing ideas, with powerful vendor and government forces in play. Like another famous war, not everyone recognizes the war for high-end computing for what it is nor how large a task it will be to win it."

