HPCwire
 The global publication of record for High Performance Computing / October 8, 2004: Vol. 13, No. 40


Features:

ON THE RELEVANCE OF IBM'S FASTEST COMPUTER CLAIM

In article 108466.html, the High-End Crusader questions the worth of IBM's recent announcement of the Blue Gene/L system's performance on the Linpack benchmark. The Crusader makes a few interesting points that I would like to discuss, but first I point out that he has missed an important aspect of the announcement. To appreciate it, consider the advanced packaging technology that allows BG/L to deliver the same Linpack performance as ES (the Earth Simulator) at a cost lower by up to 2 orders of magnitude (using the most conservative estimates of ES's cost), with 2.5 orders of magnitude lower power consumption and a footprint several orders of magnitude smaller. These advantages translate into cost savings during operation, and they show that the United States can build powerful machines within far tighter economic boundaries than ES.

The Crusader then argues that it would have been more relevant to report BG/L's performance on the new HPC Challenge benchmark suite. He projects that ES would beat BG/L hands down as we move toward the northeastern corner of the Cartesian product of the HPC Challenge space (applications with low spatial and temporal locality). First, I agree with the Crusader about the limitations and inadequacies of the Linpack benchmark. However, if it is meaningless to talk about an absolute Linpack TF/sec figure, then I consider it equally meaningless to talk about absolute GUPS or PTRANS numbers without calibrating them against cost per operation. In a sense, reporting these absolute numbers as a measure of "goodness" encourages designs that seek performance leadership at any cost: essentially an ES exercise, which is an unaffordable and unrealistic proposition for the U.S. The user community really needs the industrial players to focus on maximizing delivered performance per dollar, because this is what users base their procurement decisions on. Tying the benchmark to dollars will foster creativity in finding solutions that maximize the return on each dollar invested. Eventually, including cost as a factor will produce solutions that can survive in the marketplace on their own economic merit. I note that the TPC benchmarks have done this for years, with a reasonable degree of acceptance and success (nothing is perfect). I also note that recent IDC numbers clearly show the user community voting increasingly for cost-efficient solutions, such as grids or clusters of Linux systems. This tidal wave cannot be ignored.

To sum up, the HPC Challenge as it stands today is a cumbersome, if not flawed, methodology for comparing machines. At a minimum, I believe a third dimension expressing the cost of what is being compared must be added, if we are going to make comparisons at all. This matters especially for a multidimensional benchmark, where no single figure of merit can guide comparisons or procurement decisions. If this dimension is included, then we can start talking about the GUPS/$ of BG/L, which, following the Crusader's methodology, I project would be competitive with that of ES.
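To make the proposed third dimension concrete, here is a minimal sketch of cost-normalizing a benchmark result. It assumes "cost" is a single dollar figure per system; the article does not say whether that figure should include power, floor space, or operations. The `Result` type and the function names are hypothetical, and "Machine A" and "Machine B" with their numbers are placeholders, not measured results for BG/L or ES.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One benchmark result plus a cost field.

    Hypothetical structure: the HPC Challenge itself defines no cost field.
    """
    machine: str
    metric: str          # e.g. "GUPS", "PTRANS GB/s", "Linpack TF/s"
    value: float         # absolute benchmark figure
    cost_dollars: float  # assumption: one total-cost figure per system


def per_dollar(r: Result) -> float:
    """Cost-normalized figure of merit: benchmark value per dollar."""
    if r.cost_dollars <= 0:
        raise ValueError("cost must be positive")
    return r.value / r.cost_dollars


def rank_by_value_per_dollar(results: list[Result]) -> list[Result]:
    """Order results from best to worst value per dollar."""
    return sorted(results, key=per_dollar, reverse=True)


# Placeholder inputs for illustration only, not real measurements.
machines = [
    Result("Machine A", "GUPS", 10.0, 400e6),
    Result("Machine B", "GUPS", 2.0, 20e6),
]

for r in rank_by_value_per_dollar(machines):
    print(f"{r.machine}: {per_dollar(r):.2e} {r.metric}/$")
```

With these placeholder inputs, Machine B ranks first even though its absolute GUPS figure is a fifth of Machine A's. That is the kind of reordering a cost dimension is meant to expose.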

Another problem with the HPC Challenge is its chaotic structure for reporting results. Respected benchmarks such as SPEC and TPC impose serious restrictions on who can report results and on how those reports are certified and released. It is not surprising that these benchmarks, flawed as they may be, have become de facto standards for reporting performance, and they are being used as we speak to drive procurement decisions. The HPC Challenge currently has no similar arrangement, and it will be difficult to focus on this benchmark as long as there is no formal process for reporting its numbers. Remember, the benchmarks in the HPC area are many (NAS, BLAST, SPLASH, etc.), and none of them has achieved the success of the Linpack benchmark. Why?

The Crusader also makes some comments about applications. He argues that BG/L is not a general-purpose parallel machine, since it focuses on a narrow region of the HPC Challenge's Cartesian space. Perhaps. But consider the lifecycle of real applications. A common myth is that most vendors design their systems for the "commercial" application market, where applications have low memory bandwidth requirements and can use caches effectively to improve performance and hide memory latency. This is not really true. Vendors design their systems to attain performance leadership within cost constraints that yield a commercially competitive product. Applications are fitted into that cost structure through continuous performance tuning, and hardware components that yield the best performance/cost (e.g., caches) are added only when all else fails. Early versions of transaction processing code, Web serving code, and the like actually started closer to the northeastern corner of the HPC Challenge Cartesian space (low temporal and spatial locality), and then gradually moved over the years toward the southwestern corner (high temporal and spatial locality) as advances in algorithms and performance tuning bore tangible fruit. Linpack itself did not start in the southwestern corner; it is only because of its role in defining the TOP500 list that vendors started focusing on it, improving its performance through algorithmic changes and the like. In a nutshell, applications cannot be thought of as static points in the HPC Challenge Cartesian space; they gradually move southwest to fit the cost envelope mandated by the user community and the competitive market environment. It is perhaps interesting that some of the applications in the HPCS project show potential to move southwestward to fit on "mainstream" machines.

Last, the Crusader expresses frustration at the U.S. government's apparent decision to abandon the HECRTF proposal. I share the frustration. But the picture is not yet clear. Consider the current set of applications believed to require the expensive machines that can handle the northeastern corner of the HPC Challenge space. For how many of these applications can we be certain that they cannot move any further south or west? Have we applied the right computer science to them? The answers are not clear. The debate, then, is whether public funds should be used to train better computer scientists who can move these applications down to run on cheap machines, or whether they should go toward building expensive machines that accommodate inexperienced programmers and tolerate sloppy algorithms and programming. Private funds, in any case, seem to go to the first alternative, sometimes overpaying for talent. A movie like Shrek 2 was produced on commodity hardware after the rendering algorithms were changed to accommodate the cheaper hardware. Globalization also seems to be pushing more talent into the global market. And even if the debate concludes that public funds should go to the second alternative, what is the long-term effect on our competitiveness as a nation if we fail to train enough computer scientists to solve these problems? No amount of memory bandwidth will, in the long run, bail out a low-skilled programmer community. Certainly not in this global market.


Editor's note: This article has been posted anonymously at the author's request. Any comments will be considered for publication and can be sent to tim@tgc.com.


