
Features:
WATANABE DISCUSSES NEC'S POSITION IN HPC INDUSTRY
by Christopher Lazou
Christopher Lazou interviewed NEC's VP, Mr. Tadashi Watanabe, designer of the
NEC SX series, who explained why "vector parallel multiprocessor architecture
is the most promising supercomputing path to achieve sustained petaflop/s."
CL (Christopher Lazou): Four years ago I interviewed you in Paris and we
discussed several burning issues of the time. Since then many things have
happened: the great achievement of the Earth Simulator, which caused a major
rethink in the USA with the emphasis on high productivity; the one processor
on a chip in the SX-6; the Intel Itanium family of systems; the failure of
commodity chips to deliver badly needed sustained performance; and the
resuscitation of Cray Inc., focusing on vector parallel architectures again,
to list but a few such momentous events. Now we meet in Kiel, and what I
suggest is to run through my questions and get an update on where NEC is
today.
There is a lot of talk about replacing silicon with new material for computer
chips. What do you make of this and what is the new predicted life of silicon
for high end supercomputing?
TW (Tadashi Watanabe): NEC is naturally doing basic research on alternative
materials for the future, but for the next six years and possibly longer we
will be using silicon, following the ITRS roadmap and reaching 45-nanometer
CMOS devices by around 2010. These devices will probably use liquid cooling for High End
computing. Liquid cooling will not be necessary for consumer goods such as
mobiles as these devices work on low power and cooling is not as big a
problem.
CL: Chip manufacturers have managed to refine their etching to produce more
dense chips. Can you give some indication of new near future NEC products?
TW: NEC's current-generation etching is 90 nanometers, already in mass
production, and I expect to reduce this first to 65 nanometers and then to
45 nanometers by around 2010. One has to remember that etching is only
part of the problem for producing viable chips.
CL: What are the engineering challenges to be overcome?
TW: For a start there is heat extraction. When you have on the order of a
billion transistors on an area of one square centimetre, the heat reaches a
few hundred watts, the same as a hot soldering iron. This heat needs to be
extracted somehow. Another major problem is leakage current, which becomes
very pronounced for devices below 100 nanometers.
CL: What about performance issues?
TW: There are several performance issues. First, with a densely populated
chip the internal clock cycle gets shorter, but the clock cycle outside the
chip is always longer. Even if the internal chip frequency reaches 4 GHz at
65 nanometers, the snag is that once one goes off chip the clock cycle is
much longer. Consequently, the peak speed of the
processor will grow according to frequency, delivering speeds satisfying
Moore's Law, but the actual sustained speed will grow at a slower rate. What
one needs is a set of balanced system parameters in order to achieve a high,
sustained performance from the overall system.
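The gap between peak and sustained speed described here can be illustrated
with a roofline-style bound, which is a standard simplification and not a
model cited in the interview. A minimal sketch, assuming sustained speed is
capped either by peak speed or by memory bandwidth multiplied by the
application's flops per byte; the starting values and growth rates are
invented for illustration and are not NEC figures.

```python
def sustained_gflops(peak_gflops, bandwidth_gbytes, flops_per_byte):
    """Roofline-style bound: the lower of the compute and memory limits."""
    return min(peak_gflops, bandwidth_gbytes * flops_per_byte)


def project(generations, peak0, bw0, peak_growth, bw_growth, flops_per_byte):
    """Return (generation, peak, sustained) for each chip generation."""
    rows = []
    for g in range(generations):
        peak = peak0 * peak_growth ** g
        bandwidth = bw0 * bw_growth ** g
        rows.append((g, peak, sustained_gflops(peak, bandwidth, flops_per_byte)))
    return rows


if __name__ == "__main__":
    # Assumed values: peak doubles per generation, off-chip bandwidth grows 1.5x,
    # and the application performs 0.25 flops per byte moved.
    for g, peak, sustained in project(4, 10.0, 40.0, 2.0, 1.5, 0.25):
        print(f"gen {g}: peak {peak:6.1f} Gflop/s, sustained {sustained:6.2f} Gflop/s")
```

Under these assumptions the system starts balanced, and the fraction of peak
that is sustained falls with every generation because bandwidth grows more
slowly than the on-chip clock.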
Another issue is pin size. With more transistors one needs greater numbers of
pins and this sets physical limits. The system architect has to compromise to
accommodate physical pin size. At present there are 5,000 pins on our chips
and this factor limits the performance that can be achieved. It is not
possible for pin count to increase in proportion to the increase of the
frequency. Therefore, pin count and outside chip clock frequency will affect
memory bandwidth.
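The link between pin count, off-chip clock and memory bandwidth can be made
concrete with simple arithmetic. A rough sketch: the 5,000-pin figure comes
from the interview, while the fraction of pins carrying memory data, the
off-chip clock and the peak rates are hypothetical values chosen only to show
the trend.

```python
def memory_bandwidth_gbytes(total_pins, data_pin_fraction, offchip_clock_ghz,
                            bits_per_pin_per_cycle=1):
    """Bandwidth in Gbyte/s = data pins x off-chip clock x bits per pin / 8."""
    data_pins = total_pins * data_pin_fraction
    return data_pins * offchip_clock_ghz * bits_per_pin_per_cycle / 8.0


def bytes_per_flop(bandwidth_gbytes, peak_gflops):
    """Memory bytes available per peak floating-point operation."""
    return bandwidth_gbytes / peak_gflops


if __name__ == "__main__":
    # 5,000 pins is quoted in the interview; every other value is assumed.
    bw = memory_bandwidth_gbytes(5000, 0.25, 0.5)
    for peak in (16.0, 32.0, 64.0):  # peak doubles with the on-chip clock
        print(f"peak {peak:5.1f} Gflop/s -> {bytes_per_flop(bw, peak):.3f} byte/flop")
```

With pins and off-chip clock held fixed, each doubling of peak speed halves
the bytes available per flop, which is the imbalance the answer warns about.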
CL: Is memory bandwidth still important?
TW: Yes, very important. As you know, for a supercomputer it is critical that
the frequencies of all the elements that make up the total system be
synchronised, whereas on PCs only the chip is relevant.
CL: What alternatives are there for increasing performance?
TW: The NEC SX-6 has one processor on a single chip. Although it is possible
to have more than one processor on a single chip, it would not necessarily
help, since performance, particularly of the CPU, will saturate; as a total
system we will need more parallelism. Thus, for high-end vector systems the
benefit of having 2 cores on a chip is not clear-cut: we will still need
high-end bandwidth to and from memory, so the trade-off may not be as
advantageous. As for the Itanium Processor Family (IPF) chips, this is likely
to happen, but it depends on Intel. IBM has 2 processors on a chip, and it is
true that more processors on a chip will increase peak performance. However,
to benefit from multiple CPUs on a chip we will need to expand parallelism.
Another way is to have the processor and memory on the same chip. Even if they
can co-exist the memory size relative to processor speed becomes a problem.
For every one Gflop/s we need 10 Gbits of memory to maintain a balanced
efficient system. For a 100 Gflop/s CPU, we will need one trillion bits of
memory. One can see that for high performance the processor and memory idea is
not feasible. This of course is possible for small amounts of memory, but not
for enough memory to produce a balanced system.
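The memory-balance arithmetic quoted above is easy to check. A small sketch
using the 10 Gbit per Gflop/s ratio from the interview; the conversion to
gigabytes assumes decimal units (1 Gbyte = 8 x 10^9 bits), which the
interview does not specify.

```python
GBITS_PER_GFLOPS = 10  # balance ratio quoted in the interview


def balanced_memory_bits(peak_gflops, gbits_per_gflops=GBITS_PER_GFLOPS):
    """Memory, in bits, needed to balance a processor of the given peak speed."""
    return peak_gflops * gbits_per_gflops * 10**9


def bits_to_gbytes(bits):
    """Convert bits to gigabytes, assuming decimal units."""
    return bits / 8 / 10**9


if __name__ == "__main__":
    bits = balanced_memory_bits(100)  # the 100 Gflop/s CPU from the text
    print(f"{bits:.0e} bits, about {bits_to_gbytes(bits):.0f} Gbyte")
```

For a 100 Gflop/s CPU this gives 10^12 bits, the "one trillion bits" in the
answer, or about 125 Gbyte under the decimal-unit assumption.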
CL: Let me clarify this further. Are you saying that RISC processor machines
where vendors intend to put up to 4 processors on a chip are not going to
deliver sustained performance to user applications?
TW: No, in my view what you will get is high peak within a chip, but this will
not translate into an equivalent benefit for the user application. The reasons
are those explained above, how to deal with out-of-chip clock frequencies,
out-of-chip memory and associated memory bandwidth and the tricky engineering
problem of pin size. All these problems will conspire against delivering the
extra peak performance to the user application. The other vendors' solution is
to introduce more cache to service the CPUs. Using cache it becomes viable,
but for main memory this is not a bright idea.
CL: How is performance to be increased then?
TW: Having exhausted the option of processor and memory on a chip, the only
path open to us is hierarchical memory. We know that as performance increases, data size
also increases. Architects must consider these problems and find a compromise
solution to optimise the total system. This in my view inevitably leads to a
vector parallel multiprocessor system with a hierarchical shared memory.
One component, which is a candidate for delivering speed improvement, is the
inclusion of optical connections from CPU to memory. Electrical signals
degrade as they propagate, so distance becomes a limiting factor. Optical
propagation does not have this problem, but optical devices suffer from the
high cost of converting from electrical to optical and then back to an
electrical signal. One has to decide whether the trade-off is worthwhile.
CL: What do you think of the various R&D architecture paths pursued in the
USA, by IBM, Cray Inc., and Sun Microsystems as part of the DARPA, High End
Computing Revitalization Task Force (HEC-RTF) high productivity initiative?
TW: We have to wait and see what they actually come up with as a practical
solution. Some of the ideas proposed are interesting concepts, but in my view
may turn out to suffer from a major drawback, namely that they end up being
too specialised. In that case they will probably achieve good performance on
certain codes that match the particular machine architecture, but are
unlikely to become competitive across the more general scientific
applications. Another difficulty is changing the software to take advantage of
these specialised features, especially large application codes provided by
ISVs. We believe the more general architecture adopted in the NEC SX systems
is more flexible: it allows more parallelism and more functionality to be
incorporated, creating a balanced system while taking in the latest
advancements in technology.
CL: The SX series from its inception over 20 years ago turned out to be a very
successful architecture and has become the undisputed workhorse for large-
scale applications. It is of course based on vector parallel processors. What
are we to expect from future supercomputers then?
TW: I think future supercomputers will have more compact circuits, providing a
higher clock frequency. The new technology will enable the inclusion of more
CPUs, more functional parallelism, more functional unit pipelining and more
pins.
The NEC path is to improve on current vector architecture and not introduce
any specialised new features. Peak petaflop/s performance by 2009 is
technically possible, but achieving sustained petaflop/s performance on a
general-purpose system is another matter. Nevertheless, I am convinced that a
parallel vector processor system is the most likely architecture to deliver
sustained petaflop/s. Given the expected technology trends, petaflop/s can be
delivered with around 10,000 vector processors, rather than the much larger
number of processors needed when using a scalar microprocessor architecture.
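As a back-of-envelope check on the 10,000-processor estimate, the
per-processor rate needed for one sustained petaflop/s can be computed
directly. In this sketch the 5 Gflop/s sustained rate for a scalar processor
is a purely hypothetical figure used for comparison; the interview gives no
such number, and perfect scaling across processors is assumed.

```python
import math

GFLOPS_PER_PFLOPS = 1_000_000


def gflops_per_processor(target_pflops, processors):
    """Sustained Gflop/s each processor must deliver, assuming perfect scaling."""
    return target_pflops * GFLOPS_PER_PFLOPS / processors


def processors_needed(target_pflops, sustained_gflops_per_processor):
    """Processors required to reach the target, assuming perfect scaling."""
    return math.ceil(target_pflops * GFLOPS_PER_PFLOPS / sustained_gflops_per_processor)


if __name__ == "__main__":
    # 10,000 vector processors is the figure given in the interview.
    print(gflops_per_processor(1, 10_000), "Gflop/s per vector processor")
    # 5 Gflop/s sustained per scalar processor is an assumed value.
    print(processors_needed(1, 5.0), "scalar processors at 5 Gflop/s each")
```

The first figure works out to 100 Gflop/s sustained per vector processor; the
second shows how quickly the processor count grows when each processor
sustains less.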
CL: One last question. With the US government providing a lot of R&D funding,
in effect subsidising US vendors to enable them to deliver a high productivity
supercomputer with one petaflop/s sustained performance by 2009, can you
briefly describe the business vision of NEC not only in supercomputing, but
for delivering a high productivity total solution to the user?
TW: NEC will continue to support current SX vector architecture for high end
capability computing. The vector architecture provides very efficient
processing capability, particularly for many technical applications, such as
meteorology, computational physics and chemistry, and crash analysis. As for
the challenge from the USA on petaflop/s systems by 2009, as a response to
the Earth Simulator, NEC will be there, leading the field. The technology
breakthroughs coming out of this kind of project are transferable to NEC's
volume products and hence the R&D costs are amortised across the other NEC
businesses.
CL: I think we have explored a number of issues. Thank you for your time in
talking to me; I am sure our readers will find this update very interesting.
(Brands and names are the property of their respective owners) Copyright:
Christopher Lazou, HiPerCom Consultants, Ltd., UK. June 2004.