The global publication of record for High Performance Computing / March 19, 2004: Vol. 13, No. 11
Features: LETTERS TO THE EDITOR: RESPONDING TO THE HIGH END CRUSADER

[The following are responses to last week's article "US FUNDING PRIORITIES AND ROADMAPS FOR PETAFLOPS" written by the anonymous High End Crusader, http://www.tgc.com/hpcwire/hpcwireWWW/04/0312/107185.html.]

Great article! Thanks. Speaking as a co-chair of the HECRTF -- and I agree that the fact that the report has not yet been made public is truly "tragic" -- I would suggest that rather than speak of the "collapse of HECRTF", it is more correct to speak of the stonewalling of HECRTF by the Administration (OMB). Congress (e.g., the House Science Committee) desperately wants this report and they can't even get it! How very sad.

Cheers,

Dear Alan,

While there are some technical apps (the NSA's GUPS benchmark, notably) that suffer badly from a complete lack of spatiotemporal locality in data references, sparse matrix kernels are not among them. Thus, the statement "... random single-word memory accessing, say, in sparse-matrix operations, destroys spatial locality ..." in the High End Crusader's recounting of a microprocessor architecture analysis (that he attributes to Dally) is somewhat off the mark.

Look at sparse matrix-vector product using compressed sparse row storage, as an example. Four arrays are accessed: one (the destination vector) with perfect temporal locality (repeated access of the same element), two of them with perfect stride-one spatial locality (the matrix elements and column indices) and one (the source vector) with a partially random pattern. Partially random, because intelligent ordering of rows and columns of the matrix can cluster nonzero elements near the diagonal, and so the part of the source vector that has been touched once and will be touched again will likely fit in some level of the cache. Similar comments apply to direct sparse solvers, which now make use of level-3 BLAS code for much of their computational work.
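
[Editor's illustration, not part of the letter: the access pattern described above can be seen in a minimal sketch of a compressed sparse row matrix-vector product. The function name csr_matvec, the array names row_ptr, col_idx, values, x and y, and the 3x3 example matrix are all assumptions chosen for illustration. The sketch also reads a row-pointer array, which the letter does not count among its four arrays; it is likewise read with stride one.]

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A held in compressed sparse row form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i (hypothetical layout
    names; any CSR implementation uses three arrays of this kind).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0  # destination element: the same value is reused for the whole row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values[k] and col_idx[k] are read consecutively (stride one).
            # x[col_idx[k]] is the only indirect, partially random access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (an assumption, not from the letter):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_matvec(row_ptr, col_idx, values, x)
print(y)
```

[When rows and columns are reordered so that nonzeros cluster near the diagonal, the indices in col_idx for neighboring rows stay close together, so the entries of x being gathered tend to be ones that were touched recently.]
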
A number of fine papers presenting results of this kind were given last week at the SIAM Conference on Parallel Processing for Scientific Computing. The Crusader's pessimism may be realistic for scientific code that hasn't adapted to modern architectures, and for certain recalcitrant benchmarks that have defied optimization; it is not warranted for the newest sparse matrix kernels. Of course, architectures that require such significant code modification and optimization are less user-friendly than those that have bandwidth to burn. But because of these recent algorithmic adaptations to low-bandwidth systems, sparse matrices aren't the best exemplar of the problem of getting performance on irregular applications.

Rob Schreiber

Dear Alan,

I am a subscriber to your HPCwire newsletter services. Your 3/12/2004 HPCwire featured an article on "US Funding Priorities and Roadmaps for Petaflops [107185]". This article very accurately describes the paradigm shift necessary to overcome current limitations in HPC.

Jeffrey Wolff

Dear High-End Crusader,

With great interest I have read in HPCwire your note on US Funding Priorities and Roadmaps for Petaflops. Here are two remarks:

If we want to design a PFLOPS computer, we must be able to pay for PBYTES of memory for a balanced system. Until we can afford such a memory, there will be no PFLOPS computer.

In my lecture notes "Scientific Supercomputing" I have defined in section 17.4 what the ideal supercomputer should look like. I would like to send you a copy, if you give me an address where to send it. This may be of interest to you.

Best regards,

Willi Schoenauer