The global publication of record for High Performance Computing / September 24, 2004: Vol. 13, No. 38
Features:

RADICAL CHANGE MAY NOT BE THE ANSWER: RESPONDING TO THE HEC

Dear High-End Crusader,

I read with great interest your article "High End Computing Needs Radical Programming Change" [108384] in the September 17, 2004 edition of HPCwire [http://www.tgc.com/hpcwire/hpcwireWWW/04/0917/108384.html].

In my group, we develop software for the numerical solution of nonlinear systems of PDEs (with an error estimate). It is a black-box solver: the user supplies the PDEs and the domain (via the grid). The goal is to generate highly efficient code, including the solution of the extremely large and sparse linear systems (with full LU preconditioning). This is a typical application of supercomputers (we test the code on all architectures available in German universities).

With this background in mind, I have some comments on your article:

1. The core problem of supercomputing is the bottleneck in the cache (hierarchy), memory and communication. If the CPU waits for operands, it is idling. This cannot be cured by a better language or a radical programming change.

2. There are some measures that can help: cache reuse, storing overlap data to avoid communication, etc. The key to all these "tricks" is the basic data structure. No compiler or code optimizer can do this, and neither can a radical programming change.

3. For our coding of numerical algorithms there is no alternative to Fortran. Scientific codes operate exclusively with arrays. The array concept of Fortran has been designed just for these needs. I cannot understand why people program scientific codes in C or C++. Fortran compilers are tailored to generate efficient code. How could a new language improve the situation?

4. Explicit message passing with MPI or Co-Array Fortran is the only way to use distributed-memory parallel computers efficiently. An important effect of MPI is that the programmer is forced to _think about communication_: he knows which data are where at which moment. If he uses OpenMP on a distributed global shared memory machine like the SGI Origin/Altix, he is not aware of when communication takes place and thus will not optimize it.

5. The generation of efficient code for a parallel computer must start with the optimization of the single-processor code. If we have inefficient single-processor code, then in SPMD (same program on all processors) on 1024 processors we have 1024-fold inefficiency. Therefore we need a programming language that generates efficient single-processor code.

6. We should have truly parallel computers where the communication can be completely hidden behind the computation (the Fujitsu VPP5000 was such a computer). Then we would only have to take care of latency hiding through suitable data structures, e.g. by double-buffer techniques where one buffer is processed and sent in a ring while the other buffer is loaded. Unfortunately, in many computers the CPU also has to do the communication, and latency hiding is not possible. How could a "radical programming change" help here?

7. Finally, we must recognize that all efforts end at the _hardware_. Unfortunately we are forced to compute with commercial computers, not scientific computers. The market for scientific computers is too small, so really efficient scientific supercomputers will never be built. A radical programming change will not cure this inherent fault.

Best regards,

Willi Schoenauer
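
The double-buffer ring technique mentioned in point 6 of the letter can be illustrated with a minimal sketch. It is not the author's code; it is a serial Python simulation written under stated assumptions. The function name `ring_double_buffer`, the per-processor block layout, and the choice of a sum as the "processing" step are all hypothetical. On real hardware the "load" phase would be a nonblocking transfer running concurrently with the processing phase, which a serial simulation can only mimic structurally.

```python
from typing import List


def ring_double_buffer(blocks: List[List[float]]) -> List[float]:
    """Simulate P processors in a ring; each one sums every block once.

    blocks[p] is the block initially owned by (hypothetical) processor p.
    Returns the per-processor sums, which all equal the global sum.
    """
    p_count = len(blocks)
    # Two buffers per processor: buf[p][cur] is being processed,
    # buf[p][1 - cur] is being loaded from the left neighbour.
    buf = [[list(block), [0.0] * len(block)] for block in blocks]
    cur = 0  # all processors swap in lockstep, so one index suffices
    sums = [0.0] * p_count

    for step in range(p_count):
        last = step == p_count - 1
        if not last:
            # "Load" phase: copy the left neighbour's current block into
            # the idle buffer. Reads touch only [cur] and writes only
            # [1 - cur], so the load never disturbs data being processed.
            for p in range(p_count):
                left = (p - 1) % p_count
                buf[p][1 - cur][:] = buf[left][cur]
        # "Process" phase: work on the current buffer (here: a plain sum).
        for p in range(p_count):
            sums[p] += sum(buf[p][cur])
        if not last:
            # Swap roles: the freshly loaded buffer becomes the current one.
            cur = 1 - cur
    return sums


if __name__ == "__main__":
    demo_blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(ring_double_buffer(demo_blocks))  # [21.0, 21.0, 21.0]
```

After P steps every block has passed through every processor, so each processor holds the global result. Because the load and process phases touch disjoint buffers, they could in principle overlap, provided the hardware can carry out the transfer without occupying the CPU, which is the condition the letter insists on.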