
Features:
PRODUCTIVITY - A TRUE MEASURE OF SUPERCOMPUTING
by Christopher Lazou
May 24-27, 2004, Kiel, Germany: Seventy-five experts, from thirteen mostly
European countries, attended the NEC User Group NUG-XVI meeting in the
Maritim Hotel Bellevue, which overlooks the Baltic Sea in the northern city of
Kiel, Germany. The meeting was hosted by the German Climate Computing Centre,
the Deutsches Klimarechenzentrum GmbH (DKRZ), Hamburg. Attendees, and
consequently the talks, were mainly computer centre directors explaining how
they manage their own computing facilities to support their core business,
plus NEC support staff describing new hardware and software developments.
Below are a few extracts from these talks, which should be of interest to the
supercomputer community.
This get-together enabled experts in computing, meteorology and other
technical engineering fields to share their latest research results,
crystallise future hardware and software needs and collectively press NEC
to take these on board in its development plans for new systems. These needs
are not only for faster, more powerful (petaflop/s) systems for the scientific
and technical market, but increasingly also for data management and storage
handling of petabyte-size file systems. In short, an infrastructure to deliver
a total solution.
In recent years meteorology has evolved from its esoteric weather prediction
role and become a high profile e-business, with enormous commercial potential.
The UK Met Office, for example, sells hundreds of different products to a
diverse set of clients. Because of global warming the frequency of extreme
events has increased, so the UK Met Office wants to refine its local weather
prediction model from a 12 km grid down to 2 km to improve the accuracy of its
predictions. The importance of meteorology was reflected in the NUG programme,
where the whole of Monday was allocated to the Special Interest Group for
Meteorology Applications (SIG-MA). Speakers came from weather and climate
centres across the globe, including Australia, Japan, Brazil and most
European countries.
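
To give a feel for why such a grid refinement calls for far more computing
power, here is a minimal sketch of a common rule of thumb. It assumes (this is
not from the talks) that the number of vertical levels stays fixed and that
the time step must shrink in proportion to the grid spacing, as a CFL-type
stability limit would require. The function name and parameters are
illustrative only.

```python
def refinement_cost_factor(coarse_km: float, fine_km: float,
                           refine_time_step: bool = True) -> float:
    """Rough growth in compute cost when horizontal grid spacing shrinks.

    Assumptions (rule of thumb, not a model of any real forecast system):
    - the number of vertical levels is unchanged;
    - grid points grow with the square of the refinement ratio;
    - optionally, the time step shrinks in proportion to the spacing.
    """
    ratio = coarse_km / fine_km
    factor = ratio ** 2          # more points in both horizontal directions
    if refine_time_step:
        factor *= ratio          # proportionally more time steps
    return factor


if __name__ == "__main__":
    # Going from a 12 km grid to a 2 km grid, as mentioned in the text.
    print(refinement_cost_factor(12.0, 2.0))
    print(refinement_cost_factor(12.0, 2.0, refine_time_step=False))
```

Under these assumptions the move from 12 km to 2 km multiplies the work by
roughly two orders of magnitude, before any increase in vertical resolution or
model physics is considered.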
The host site, DKRZ, was founded in 1987 with the mission to provide
state-of-the-art supercomputing, data handling and associated services,
including high-level visualization, to the German scientific community, in
order to conduct large-scale earth system and climate modelling.
Three years ago DKRZ upgraded its computer systems with a focus on
productivity and became, at that time, one of Europe's fastest supercomputer
facilities in production use for climate research. It purchased the then
latest supercomputers from NEC, the SX-6 series with half a teraflop/s
sustained performance, and a unified data management system based on the
Intel IA-64 (Itanium) architecture and Linux.
With the advent of new technology, one trend in high performance computing is
the fusion of computation, simulation and data analysis. With advanced
satellite technology delivering massive data streams in the earth systems and
climate area, the challenges and opportunities for fusing observational and/or
experimental data with classical simulation have increased enormously.
To address this new reality, DKRZ developed a unified concept, capable of
delivering a total solution with transparent access for the climate user
community. In addition to the high-end compute servers, an integrated
distributed data management system was specified as an essential part of this
upgrade. To achieve this, new hardware and software had to be put in place to
support the heavy numerical calculations, the high networking demands and a
scalable, unified shared file system and archive able to handle the massive
volume of newly generated data. As Dr. Wolfgang Sell, director of DKRZ, said:
"Vector architecture machines deliver data to the application on time".
Another speaker, Dr. Francois Mescam, director of computing and networking at
the French aerospace research agency (ONERA), whose mission is equivalent to
that of NASA in the USA, explained why, for aerospace applications,
productivity could only be achieved by using vector parallel supercomputer
systems.
ONERA is the scientific and technical government agency reporting to the
French Ministry of Defence and employs some 2,000 people. Established over 50
years ago, ONERA has been actively involved in all major French and European
aerospace programs, including Mirage, Rafale, Concorde, Airbus (including the
latest super jumbo jet, the A380, which it is claimed can carry up to 800
passengers, 555 in its normal configuration), space vehicles such as Ariane,
and many more. Its partners include large companies such as SNECMA (aircraft
engines), THALES, Airbus France, Eurocopter and others. Headquartered in
Châtillon, in the Paris suburbs, the agency has eight sites throughout France,
including Palaiseau (Paris region), the Toulouse Research Centre and the
Modane Wind Tunnels in the French Alps.
With ONERA (France), NLR (Holland) and DLR (Germany) currently using NEC SX
vector supercomputers, it appears that, in Europe at least, the aerospace
industry has decided that the NEC SX series, with its vector/parallel
architecture and large shared memory, provides the extra power and delivers
unparalleled productivity, the true measure of supercomputing. This has
enabled engineers to achieve efficiencies in design and gain a competitive
edge. The use of the same system across the European consortium made
integration much easier. As Dr. Gerard Hameetman, the director of computing at
NLR, told me over dinner: "Using the NEC SX-6, NLR developed new light
antiglare material, used between the cockpit and the wings of the Airbus A380
plane and this saved 1,000 kg, the equivalent of 10 passengers".
To digress a little, cost savings achieved by the use of supercomputers, and
the integration of all design functions and production using a virtual
enterprise model, enabled European companies to consolidate their position in
the market. For example, Airbus is now the largest manufacturer of civilian
aircraft.
The competition between Boeing and Airbus is putting enormous pressure on both
companies to cut costs. As 80% of costs are fixed by the preliminary design
chosen, it is important to cut costs at the margin. Today the building of the
Airbus involves more than 50,000 people across Europe. The design teams are
merging their work, sharing designs and risks, while their members work in
different countries.
The dream in Europe is for the virtually designed aircraft to be a realistic
one. For example, in the last five years there was a drive to reduce design
times for the aircraft wing from two years to less than one month; achieving
this required more than a hundredfold improvement in human efficiency. This is
why high performance supercomputing moved centre stage and became a critical
and essential element in the design process.
In fact, the aerospace industry has benefited enormously from using parallel
vector supercomputers and is one of the industries that provide the financial
impetus for sustaining the development of shared memory vector machines, such
as the NEC SX series. It was also a pressure point for the recent
resuscitation of Cray Inc. in the USA.
In most high technology industries the margin between success and failure is
very narrow. In the past three decades very few commercial aircraft were
successful enough to make their manufacturer a profit. The Boeing aircraft and
the European Airbus are notable exceptions. The economics of aircraft
operation are such that even a small improvement in efficiency can translate
into substantial savings in operating costs. Therefore, the operating
efficiency of an aeroplane is a major attraction to potential buyers, and
aircraft manufacturers have a compelling incentive to design the most
efficient aircraft to operate. That Boeing lost its primacy in civilian
aircraft sales and has now been overtaken by the European Airbus is probably
caused in part by design constraints due to the unavailability of modern
vector parallel systems in the USA during the last eight years, until the
recent arrival of the Cray X1.
Prof. Dr.-Ing. Michael Resch, director of the High Performance Computing
Centre in Stuttgart (HLRS) gave a presentation titled: "Vector Systems: The
key to sustained Teraflop/s". This is not wishful thinking, since Stuttgart
recently completed a competitive procurement and purchased a new NEC SX series
system with four teraflop/s of sustained performance. HLRS supports users from
R&D in the use of leading edge supercomputer technology and its applications.
The mission of HLRS is to provide its users with tools and expertise to
achieve top international positions in their research field. Capabilities and
economy of scale are possible, at the high-end, through a joint operation of
supercomputer systems with T-Systems Solutions for Research GmbH and Porsche
AG, who provide 25% of the HLRS income. For Porsche the benefit of using
supercomputers is quantifiable, namely, as the shortest time to deliver a
solution for crash analysis.
Michael Resch set the tone about the situation of supercomputing by showing a
slide with the following two quotes:
"Computational scientists have seen a frustrating trend of stagnating
application performance despite dramatic increases in the claimed peak
capability of HPC Systems". Oliker et al, 2004, LBNL, USA
"The fact that there has been no fundamental advance in high-performance
capability computers in the last eight years has forced these communities to
adapt less qualified commercial offerings to the solution of their problems".
Vincent Scarafino, Manager, Numerically Intensive Computing, Ford Motor
Company, Before the Committee on Science, US House of Representatives, July
16, 2003.
Professor Resch then shared findings from the recent HLRS procurement
exercise. HLRS surveyed many supercomputer sites and analysed performance
results for three types of systems: super-scalar microprocessors, clusters and
parallel vector processors. The price/performance graphs showed what some of
us have been saying for years, namely that: "Using a price/peak performance
metric, vector systems appear to be more expensive, but are much better
performing when using a price/sustained performance measure". When the hidden
costs of reliability, staff to operate the system, electrical power use,
cooling and machine room space are taken into account, the measure of
Total Cost of Ownership (TCO) was very clear: "The cost for achieving one
Teraflop/s is much more expensive for microprocessor systems, a factor of two
compared to the NEC SX series, parallel vector systems".
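
The distinction between the two metrics can be illustrated with a minimal
sketch. All figures below (prices, peak rates and sustained fractions) are
hypothetical values chosen purely for illustration; they are not the HLRS
survey data, and purchase price stands in for the fuller TCO measure.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    price: float         # purchase price, arbitrary units (hypothetical)
    peak_tflops: float   # vendor-quoted peak performance (hypothetical)
    efficiency: float    # fraction of peak sustained on real codes (hypothetical)

    @property
    def sustained_tflops(self) -> float:
        return self.peak_tflops * self.efficiency

    def price_per_peak(self) -> float:
        return self.price / self.peak_tflops

    def price_per_sustained(self) -> float:
        return self.price / self.sustained_tflops


# Illustrative numbers only: the vector system costs more per peak teraflop/s
# but sustains a larger fraction of its peak on applications.
vector = System("parallel vector", price=10.0, peak_tflops=1.0, efficiency=0.5)
cluster = System("microprocessor cluster", price=4.0, peak_tflops=1.0, efficiency=0.1)

if __name__ == "__main__":
    for s in (vector, cluster):
        print(f"{s.name}: price/peak = {s.price_per_peak():.1f}, "
              f"price/sustained = {s.price_per_sustained():.1f}")
```

With these made-up inputs the cluster looks cheaper per peak teraflop/s, yet
costs more per sustained teraflop/s. The ranking depends entirely on the
sustained fraction, which is why it has to be measured on real applications.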
The recent development of putting two cores on a chip increases CPU speed,
but the memory gap gets worse. The survey found that most users in HPC use
512=>1000 CPUs and that capability is the reserve of vector systems, while
throughput can be achieved using clusters. The message is: "Use the right
architecture for the right task".
At this point I could not help but muse: "USA microprocessor and cluster
vendors promise a system to climb Everest and deliver a flight ticket to
Kathmandu".
Apart from technology overviews, including a fascinating one from Mr. T.
Kondo, CEO of NEC Solutions, many interesting user talks were given,
illustrating impressive results from supercomputing centres. Other talks
recounted how computer centres struggle to solve the ever-burgeoning data
handling problem. Peter Haas of HLRS gave a talk titled "Advanced Network
Technology and Parallel File Systems", describing the latest thinking on how
the international community is tackling this problem.
Dr. Sell, director of DKRZ, summed up by saying: "There are many
technical challenges and unresolved IT-Issues in Earth Systems Modelling. I am
very happy in using NEC expertise and dedication to attain effective
solutions". This sentiment was even more positive after users heard NEC
executives speak, under non-disclosure conditions, about NEC's future products.
All in all, NUG provided a fruitful meeting and an enjoyable stay in Kiel. The
user community not only shared their own research and operational experiences,
but also had the opportunity to leverage technology from all of NEC's
businesses for their own benefit.
The next article will report on an exclusive interview I had with Mr. Tadashi
Watanabe, NEC Vice President and designer of the NEC parallel vector SX series
systems. The interview explores his views on the state of the industry and his
vision for achieving petaflop/s computing.
(Brands and names are the property of their respective owners) Copyright:
Christopher Lazou, HiPerCom Consultants, Ltd., UK. June 2004.