HPCwire
 The global publication of record for High Performance Computing / June 18, 2004: Vol. 13, No. 24


Features:

WATANABE DISCUSSES NEC'S POSITION IN HPC INDUSTRY
by Christopher Lazou

Christopher Lazou interviewed NEC's VP, Mr. Tadashi Watanabe, designer of the NEC SX series, who explained why "vector parallel multiprocessor architecture is the most promising supercomputing path to achieve sustained petaflop/s."

CL (Christopher Lazou): Four years ago I interviewed you in Paris and we discussed several burning issues of the time. Since then many things have happened: the great achievement of the Earth Simulator, which caused a major rethink in the USA with the emphasis now on high productivity; the one processor on a chip in the SX-6; the Intel Itanium family of systems; the failure of commodity chips to deliver badly needed sustained performance; and the resuscitation of Cray Inc., focusing on vector parallel architectures again, to list but a few such momentous events. Now we meet in Kiel, and I suggest we run through my questions and get an update on where NEC is today.

There is a lot of talk about replacing silicon with new material for computer chips. What do you make of this and what is the new predicted life of silicon for high end supercomputing?

TW (Tadashi Watanabe): NEC is naturally doing basic research on alternative materials for the future, but for the next six years and possibly longer we will be using silicon, following the ITRS roadmap and achieving 45-nanometer CMOS devices by around 2010. These devices will probably use liquid cooling for High End computing. Liquid cooling will not be necessary for consumer goods such as mobiles, as these devices work on low power and cooling is not as big a problem.

CL: Chip manufacturers have managed to refine their etching to produce more dense chips. Can you give some indication of new near future NEC products?

TW: NEC's current-generation etching is 90 nanometers, already mass-produced, and I expect this to reduce first to 65 nanometers and then to 45 nanometers by around 2010. One has to remember that etching is only part of the problem in producing viable chips.

CL: What are the engineering challenges to be overcome?

TW: For a start there is heat extraction. When you have on the order of a billion transistors on an area of one square centimetre, the heat reaches a few hundred watts, the same as a hot soldering iron. This heat needs to be extracted somehow. Another major problem is current leakage, which becomes very pronounced for devices below 100 nanometers.

CL: What about performance issues?

TW: There are several performance issues. First, with a richly populated chip the internal clock cycle gets shorter, but the clock cycle outside the chip is always longer. Even if the internal chip frequency reaches 4 GHz at 65 nanometers, the snag is that once one goes off chip the clock cycle is much longer. Consequently, the peak speed of the processor will grow according to frequency, delivering speeds satisfying Moore's Law, but the actual sustained speed will grow at a slower rate. What one needs is a set of balanced system parameters in order to achieve high sustained performance from the overall system.

Another issue is pin size. With more transistors one needs greater numbers of pins, and this sets physical limits. The system architect has to compromise to accommodate physical pin size. At present there are 5,000 pins on our chips, and this factor limits the performance that can be achieved. It is not possible for pin count to increase in proportion to the increase in frequency. Therefore, pin count and off-chip clock frequency will affect memory bandwidth.
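The argument in the two paragraphs above, that peak speed follows the on-chip clock while sustained speed is held back by pins and the off-chip clock, can be illustrated with a roofline-style bound. This is a standard simplification and not a model given in the interview; every number below, and both helper functions, are hypothetical values chosen only to make the effect visible.

```python
def offchip_bandwidth(data_pins: int, offchip_hz: float) -> float:
    """Bytes per second, assuming one bit per data pin per off-chip cycle."""
    return data_pins * offchip_hz / 8


def sustained_flops(peak_flops: float, bandwidth: float,
                    flops_per_byte: float) -> float:
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_flops, bandwidth * flops_per_byte)


# All values are illustrative assumptions, not NEC figures.
DATA_PINS = 1024        # pins carrying memory data (hypothetical)
OFFCHIP_HZ = 500e6      # off-chip clock (hypothetical)
FLOPS_PER_CYCLE = 8     # floating-point operations per core cycle (hypothetical)
FLOPS_PER_BYTE = 0.25   # arithmetic intensity of the application (hypothetical)

bandwidth = offchip_bandwidth(DATA_PINS, OFFCHIP_HZ)
for core_hz in (2e9, 4e9):
    peak = core_hz * FLOPS_PER_CYCLE
    sustained = sustained_flops(peak, bandwidth, FLOPS_PER_BYTE)
    print(f"core {core_hz / 1e9:.0f} GHz: peak {peak / 1e9:.0f} Gflop/s, "
          f"sustained bound {sustained / 1e9:.0f} Gflop/s")
```

With these assumed values, doubling the core clock doubles the peak figure while the sustained bound stays where the fixed off-chip bandwidth puts it.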

CL: Is memory bandwidth still important?

TW: Yes, very important. As you know, for a supercomputer it is critical that the frequencies of all the elements that make up the total system be synchronised, whereas on PCs only the chip is relevant.

CL: What alternatives are there for increasing performance?

TW: The NEC SX-6 has one processor on a single chip. Although it is possible to have more than one processor on a single chip, it would not necessarily help, since performance, particularly of the CPU, will saturate; as a total system we will need more parallelism. Thus for the High End vector systems the benefit of having 2 cores on a chip is not clear-cut: we will need High End bandwidth to and from memory, so the trade-off may not be as advantageous. As for the Itanium Processor Family (IPF) chips, this is likely to happen, but it depends on Intel. IBM has 2 processors on a chip, and it is true that more processors on a chip will increase peak performance. However, to benefit from multiple CPUs on a chip we will need to expand parallelism.

Another way is to have the processor and memory on the same chip. Even if they can co-exist, the memory size relative to processor speed becomes a problem. For every one Gflop/s we need 10 Gbits of memory to maintain a balanced, efficient system. For a 100 Gflop/s CPU, we will need one trillion bits of memory. One can see that for high performance the idea of putting processor and memory on one chip is not feasible. It is of course possible for small amounts of memory, but not for enough memory to produce a balanced system.
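The balance rule in this answer is simple enough to check directly. The sketch below only restates the 10 Gbit per Gflop/s ratio quoted above; the function name is a hypothetical helper, and the conversion to bytes assumes decimal gigabytes.

```python
GBITS_PER_GFLOPS = 10  # balance ratio quoted in the interview


def balanced_memory_bits(peak_gflops: float) -> float:
    """Memory in bits needed to balance a CPU of the given peak speed."""
    return peak_gflops * GBITS_PER_GFLOPS * 1e9


bits = balanced_memory_bits(100)  # the 100 Gflop/s CPU from the interview
print(f"{bits:.0e} bits, or {bits / 8 / 1e9:.0f} GB")
```

For the 100 Gflop/s case this gives one trillion bits, which is 125 GB in decimal units.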

CL: Let me clarify this further. Are you saying that RISC processor machines, where vendors intend to put up to 4 processors on a chip, are not going to deliver sustained performance to user applications?

TW: No, in my view what you will get is high peak performance within a chip, but this will not translate into an equivalent benefit for the user application. The reasons are those explained above: how to deal with out-of-chip clock frequencies, out-of-chip memory and the associated memory bandwidth, and the tricky engineering problem of pin size. All these problems will conspire against delivering the extra peak performance to the user application. The other vendors' solution is to introduce more cache to service the CPUs. Using cache it becomes viable, but for main memory this is not a bright idea.

CL: How is performance to be increased then?

TW: Having ruled out putting the processor and memory on one chip, the only path open to us is hierarchical memory. We know that as performance increases, data size also increases. Architects must consider these problems and find a compromise solution to optimise the total system. This in my view inevitably leads to a vector parallel multiprocessor system with a hierarchical shared memory.

One candidate for delivering speed improvement is the inclusion of optical connections from CPU to memory. Electrical signals degrade as they propagate, so distance becomes a limiting factor. Optical propagation does not have this problem, but the optical device suffers from the high cost of converting from electrical to optical and then back to an electrical signal. One has to decide whether the trade-off is worthwhile.

CL: What do you think of the various R&D architecture paths pursued in the USA, by IBM, Cray Inc., and Sun Microsystems as part of the DARPA, High End Computing Revitalization Task Force (HEC-RTF) high productivity initiative?

TW: We have to wait and see what they actually come up with as a practical solution. Some of the ideas proposed are interesting concepts, but in my view they may turn out to suffer from a major drawback, namely that they end up being too specialised. In that case they will probably achieve good performance on certain codes that match the particular machine architecture, but they are unlikely to become competitive across the more general scientific applications. Another difficulty is changing the software to take advantage of these specialised features, especially large application codes provided by ISVs. We believe the more general architecture adopted in the NEC SX systems is more flexible: it allows the incorporation of more parallelism and more functionality, creating a balanced system while incorporating the latest advancements in technology.

CL: The SX series from its inception over 20 years ago turned out to be a very successful architecture and has become the undisputed workhorse for large-scale applications. It is of course based on vector parallel processors. What are we to expect from future supercomputers then?

TW: I think future supercomputers will have more compact circuits, providing a faster clock frequency. The new technology will enable the inclusion of more CPUs, more functional parallelism, more functional unit pipelining and more pins.

The NEC path is to improve on the current vector architecture and not introduce any specialised new features. Peak petaflop/s performance by 2009 is technically possible, but achieving sustained petaflop/s performance on a general-purpose system is another matter. Nevertheless, I am convinced that a parallel vector processor system is the most likely architecture to deliver sustained petaflop/s. Given the expected technology trends, petaflop/s can be delivered with around 10,000 vector processors, rather than the much larger number of processors needed when using a scalar microprocessor architecture.
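The processor count quoted here implies a per-processor rate that can be worked out in a few lines. The sketch below is only that arithmetic; the helper names are hypothetical, and the scalar per-processor figure is an assumed placeholder for comparison, not a number from the interview or from any vendor.

```python
import math

PETAFLOPS = 1e15  # one petaflop/s, in flop/s


def per_processor_rate(target_flops: float, processors: int) -> float:
    """Rate each processor must deliver to reach the target."""
    return target_flops / processors


def processors_needed(target_flops: float, rate_per_processor: float) -> int:
    """Processors required at a given per-processor rate, rounded up."""
    return math.ceil(target_flops / rate_per_processor)


# 10,000 vector processors is the figure quoted in the interview.
vector_rate = per_processor_rate(PETAFLOPS, 10_000)
print(f"{vector_rate / 1e9:.0f} Gflop/s per vector processor")

# Hypothetical per-processor rate for a scalar microprocessor (assumption).
ASSUMED_SCALAR_RATE = 5e9
print(processors_needed(PETAFLOPS, ASSUMED_SCALAR_RATE), "scalar processors")
```

The implied rate of 100 Gflop/s per vector processor matches the CPU size used in the memory-balance discussion earlier in the interview; the scalar count scales inversely with whatever per-processor rate is assumed.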

CL: One last question. With the US government providing a lot of R&D funding, in effect subsidising US vendors to enable them to deliver a high productivity supercomputer with one petaflop/s sustained performance by 2009, can you briefly describe the business vision of NEC not only in supercomputing, but for delivering a high productivity total solution to the user?

TW: NEC will continue to support the current SX vector architecture for high-end capability computing. The vector architecture provides very efficient processing capability, particularly for many technical applications such as meteorology, computational physics and chemistry, and crash analysis. As for the challenge from the USA on petaflop/s systems by 2009, as a response to the Earth Simulator, NEC will be there, leading the field. The technology breakthroughs coming out of this kind of project are transferable to NEC's volume products, and hence the R&D costs are amortised across the other NEC businesses.

CL: I think we have explored a number of issues. Thank you for your time in talking to me; I am sure our readers will find this update very interesting.

(Brands and names are the property of their respective owners) Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. June 2004.

