
Features:
METEOROLOGY CODES PROVIDE PERFORMANCE REALITY CHECK
by Christopher Lazou, HiPerCom Consultants, Ltd.
In my last article from the ECMWF Workshop, I reported that ECMWF measured
6.89% of peak sustained using 2048 IBM P690+ processors, on their IFS (T799
L91) model for a 10-day forecast. This is equivalent to a measured 1.073
Teraflop/s of sustained performance. I then constructed a thought experiment
to calculate that one needs around 22-25 nodes, fewer than 200 SX-8 CPUs, to
deliver the 1.073 Teraflop/s sustained when running IFS. Remember this was
just a thought experiment, as I have not seen any results from the SX-8.
This article reports on some real results on the Earth Simulator and the early
results from the IBM Blue Gene/L in the meteorology field. I also briefly
report on the progress in providing a unified Earth System Model (ESM) and
efforts for standardizing interfaces to allow coupling of various models.
Keiko Takahashi from the Earth Simulator Centre, JAMSTEC, gave a presentation
titled: "Non-hydrostatic Atmospheric GCM Development and its Computational
Performance on the Earth Simulator (ES) system." (To remind the reader, the ES
system has 5120 processors based on the vector parallel NEC SX-6 technology,
with a 40 Teraflop/s peak performance).
Takahashi stated that they developed a non-hydrostatic coupled
atmosphere-ocean simulation code from scratch to maximise the capability of
the ES. Takahashi went on to describe design details of the new code and to
show some preliminary validation results for non-hydrostatic atmospheric
phenomena, and also presented some excellent sustained performance results
attained by this atmosphere/ocean simulation code on the ES.
The new code uses a reduced grid with orthogonal co-ordinates (the same as
latitude-longitude geometry) but with no polar singularity. Its N and E
components have the same grid structure. The grid is easy to nest and highly
parallel, but one needs to take care of the conservation law. To do this, one
computes all fluxes on the computational grid and applies a correction so that
the conservation law is satisfied. The flux FEF on a circular arc EF is
computed from the budget of fluxes fN on grid ABCD of the N system, and the
flux fE is estimated on a circular arc GHI of the E system. Using this
conservative scheme, they found that the relative error of mass stayed within
the limit of rounding error over time. With this method both the solid body
rotation field and 2nd-order accuracy are maintained.
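The conservation claim can be illustrated with a much simpler stand-in. The
sketch below is not the scheme Takahashi described. It is a first-order
upwind, flux-form advection step on a periodic 1D grid, chosen only because it
shares the key property: whatever leaves one cell through a face enters its
neighbour through the same face, so total mass can drift only by rounding
error. The function names, grid size, test field and Courant number are all
assumptions made for illustration.

```python
import math


def advect_upwind(cells, courant, steps):
    """Advance a periodic 1D field with a flux-form first-order upwind step.

    Illustrative stand-in only; not the Earth Simulator code's scheme.
    """
    q = list(cells)
    n = len(q)
    for _ in range(steps):
        # Flux through the left face of cell i, carried in from cell i-1
        # (q[-1] wraps around, which gives the periodic boundary).
        flux = [courant * q[i - 1] for i in range(n)]
        # Each cell gains its left-face flux and loses its right-face flux,
        # so every flux is added once and subtracted once across the grid.
        q = [q[i] + flux[i] - flux[(i + 1) % n] for i in range(n)]
    return q


def relative_mass_error(initial, final):
    """Relative change in total mass between two states of the field."""
    m0 = math.fsum(initial)
    return abs(math.fsum(final) - m0) / m0


# Hypothetical test field: uniform background with a square bump.
initial = [1.0 + (1.0 if 40 <= i < 60 else 0.0) for i in range(100)]
final = advect_upwind(initial, courant=0.5, steps=500)
mass_error = relative_mass_error(initial, final)
print(f"relative mass error after 500 steps: {mass_error:.2e}")
```

The bump is smeared out by this low-order scheme, but the total mass changes
only at the level of floating-point rounding, which is the behaviour reported
for the conservative correction above.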
The CIP-CSLR scheme used is a conservative semi-Lagrangian scheme with
rational function (Xiao et al. 2002), based on CIP (Cubic-interpolated
pseudo-particle, Yabe et al. 1991). It predicts both the cell-integrated value
and the interface value, much as CIP does for basic variables and their
spatial gradients, which makes it more accurate but slightly increases
computation. The scheme is conservative and oscillation-free, and has the
merit that no additional limiter is needed. This provides high-accuracy
calculations within one cell.
The approach is to use a nested system with different resolutions. For
seasonal/annual simulations, the model uses 2.6 km in the horizontal and 100
layers in the vertical. For days/weeks the horizontal resolution is 1.2 km
with 100 layers; for local events, 100-500 m with 200 layers; and for urban
events, 10-200 m with 200 layers.
The preliminary validation of the global non-hydrostatic model, using cloud
microphysics, was carried out on the typhoon that hit Japan on 7-11 August
2003. The typhoon caused heavy local rain in the Japanese regions of Kinki,
Tokai and Hokkaido.
Takahashi said: "Comparing the statistics between AFES and our new code with
10 km resolution for horizontal, our code is 10 times faster in computational
performance without CIP-CSLR. A 2.6 km horizontal run with resolution points
of 11,520x3840x96x2, using 512 nodes (4,096 processors) on the ES, has
attained 19.65 Tflop/s sustained performance, i.e. 59.9% of the available
32.8 Teraflop/s peak."
Their immediate future plans are to reproduce/predict non-hydrostatic
phenomena such as typhoons, heavy rain in the Baiu season, tornadoes and so
on. They intend to perform many validation experiments for each of the
atmosphere and ocean components, and also cost-tuning experiments with
CIP-CSLR, using regional coupled simulation codes with ultra-high resolution.
The high productivity (sustained performance) of the Earth Simulator was also
reported by Michel Desgagne, Environment Canada, when simulating Hurricane
EARL (of September 1998 vintage) on the ES with the Canadian MC2 community
model. The forecast, at 1 km resolution on an 11,000x8640x51 grid, took 6.75
days on 495 ES nodes (3960 processors) and achieved 13 Tflop/s. This amounts
to 3.28 Gflop/s per processor (41%) sustained when using 77% of the total ES
system.
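The efficiency figures quoted for the two Earth Simulator runs follow from
simple ratios, and it is worth making the arithmetic explicit. The sketch
below uses only numbers given in this article. The 8 Gflop/s per-processor
peak is not stated directly but is implied by 32.8 Teraflop/s over 4,096
processors, and the function and variable names are illustrative.

```python
ES_TOTAL_PROCESSORS = 5120
# Implied by the article: 32.8 Tflop/s peak over 4,096 processors.
ES_PEAK_GFLOPS_PER_PROCESSOR = 8.0


def fraction_of_peak(sustained, peak):
    """Sustained performance as a fraction of peak (same units for both)."""
    return sustained / peak


# Takahashi: 19.65 Tflop/s sustained against 32.8 Tflop/s peak.
takahashi_fraction = fraction_of_peak(19.65, 32.8)

# Desgagne: 13 Tflop/s sustained on 3960 processors.
desgagne_gflops_per_processor = 13.0 * 1000 / 3960
desgagne_fraction = fraction_of_peak(
    desgagne_gflops_per_processor, ES_PEAK_GFLOPS_PER_PROCESSOR
)
desgagne_share_of_machine = 3960 / ES_TOTAL_PROCESSORS

print(f"Takahashi: {takahashi_fraction:.1%} of peak")
print(f"Desgagne:  {desgagne_gflops_per_processor:.2f} Gflop/s per processor, "
      f"{desgagne_fraction:.0%} of peak, "
      f"on {desgagne_share_of_machine:.0%} of the machine")
```

Run as written, this reproduces the 59.9%, 3.28 Gflop/s, 41% and 77% figures
quoted above.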
Rich Loft, NCAR, gave a presentation titled, "Price and power approaches to
advancing atmospheric science." The physics of cloud formation in general
circulation models is currently simulated using phenomenological
parameterisations. The dream of direct numerical simulation of cloud processes
on a global scale awaits at least six orders of magnitude of improvement in
computational power. Cloud Resolving Convection Parameterisation (CRCP), also
known as super-parameterisation, is a compromise technique that promises to
improve the simulation of sub-grid scale cloud processes in climate models.
Unfortunately, CRCP is still two to three orders of magnitude more expensive
than traditional parameterisation techniques. Supercomputing systems, and the
infrastructure that supports them, face huge challenges in providing the kind
of computing power necessary to perform more realistic simulations of cloud
processes using the CRCP approach. This situation clearly motivates the demand
for future production-quality computing systems that deliver not only
dramatically better sustained price-performance, but also better
power-performance.
Recently, IBM has created a densely packaged, massively parallel supercomputer
called Blue Gene/L with low electrical power requirements that looks promising
for CRCP applications. Researchers at NCAR, in collaboration with IBM
researchers, have built a CRCP package optimised for Blue Gene/L and have
combined this package with a scalable and efficient GCM dynamical core based
upon spectral elements. The result is a CRCP-based atmospheric model capable
of exploiting Blue Gene/L scalability and computational power to practically
realise scientifically useful integration rates for multi-year simulations.
Rich Loft went on to say that Blue Gene/L seems to be an important
architecture for many reasons: power, space and "fuel" efficiency,
$/Teraflop/s sustained, and a fast reduction network. On an 80 km, 20-level
explicit HOMME model, they achieved 587 Gflop/s on 1944 nodes of Blue Gene/L.
It produced 13.7 simulated years/day (a useful climate rate). A 0.1-degree
"eddy permitting" POP ocean model would be the next interesting thing to look
at, since the barotropic CG solver for the resultant elliptic equation needs
fast global sums.
To remind readers, a Blue Gene/L node consists of 2 processors and 4 MBytes
of memory on a chip, with 5.8 Gflop/s peak performance per node. There are 64
processors on a board, and 32 boards in a cabinet. The model was run on 1944
nodes (11.275 Teraflop/s peak), slightly less than 2 cabinets, but utilizing
only 1 processor per node. They have not had the machine access time to
evaluate running two processors per node, which cuts memory available per
process in half. The computation is embarrassingly parallel, which favours
T-type systems, and by discarding half the processors the test achieved 10.4%
efficiency (or, in reality, 5.2% of the actual system used) on the Blue
Gene/L.
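The two efficiency figures differ only in which peak the sustained rate is
divided by. The sketch below works through that arithmetic, and the
throughput implied by 13.7 simulated years per day, using the numbers quoted
in this article. The variable names are illustrative, and the only assumption
is that each of the two processors contributes half of the 5.8 Gflop/s node
peak.

```python
NODES_USED = 1944
PEAK_GFLOPS_PER_NODE = 5.8       # both processors on the node
PROCESSORS_PER_NODE = 2          # only one was used in the test
SUSTAINED_GFLOPS = 587.0
SIMULATED_YEARS_PER_DAY = 13.7

# Peak of the nodes as installed, and of the one processor per node in use.
peak_all_gflops = NODES_USED * PEAK_GFLOPS_PER_NODE
peak_one_proc_gflops = peak_all_gflops / PROCESSORS_PER_NODE

efficiency_one_proc = SUSTAINED_GFLOPS / peak_one_proc_gflops
efficiency_whole_nodes = SUSTAINED_GFLOPS / peak_all_gflops

# Wall-clock cost of the quoted climate rate.
hours_per_simulated_year = 24 / SIMULATED_YEARS_PER_DAY
days_per_simulated_century = 100 / SIMULATED_YEARS_PER_DAY

print(f"peak of nodes used: {peak_all_gflops / 1000:.3f} Tflop/s")
print(f"efficiency, processors used only: {efficiency_one_proc:.1%}")
print(f"efficiency, whole nodes:          {efficiency_whole_nodes:.1%}")
print(f"{hours_per_simulated_year:.2f} hours per simulated year, "
      f"{days_per_simulated_century:.1f} days per simulated century")
```

At this rate a simulated century would take a little over a week of
wall-clock time on the 1944 nodes.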
In the view of this (Lazou) reporter, this raises the question as to whether
the full configuration of 65,536 processors (360 Tflop/s peak) Blue Gene/L
system would come anywhere near achieving productivity (sustained performance)
as good as the Earth Simulator, on models such as those described by Takahashi
and Desgagne, above. The low power consumption and small space of the Blue
Gene/L (typical of T-type systems) are impressive, but the recent Linpack
results and $/Teraflop/s metrics are likely to be overoptimistic as far as
capability and productivity are concerned. The small memory on the node (4 MB)
will severely restrict efficiency on a large range of applications that are
not embarrassingly parallel.
Returning to the Rich Loft presentation, the basic idea for CRCP or
super-parameterisation is to represent the sub-grid scales of the 3D
large-scale model (with horizontal resolution ~ 100 km) by embedding a 2D
cloud resolving model in each column of the large-scale model. This involves
thousands of 2D cloud resolving models interacting in a way consistent with
large-scale dynamics. It is embarrassingly parallel but extremely expensive
(150x over traditional physics).
CRCP is the next step in the quest for a cloud-system-resolving AGCM. The
computational costs of the following simulations are approximately the same:
a millennium-long simulation using a traditional climate model; a few
years-long simulation using a traditional climate model with CRCP; and a
day-long simulation of a cloud-system-resolving AGCM (resolution of order a
few km). The cost per simulated day of each is separated by about 3 orders of
magnitude.
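The size of each step on this cost ladder can be checked from the simulated
periods alone: if three runs cost about the same in total, the cost per
simulated day is inversely proportional to the length of time each one covers.
The sketch below does that calculation. "A few years" is taken as 3 years,
which is an assumption, and the names are illustrative.

```python
import math

DAYS_PER_YEAR = 365.25

# Simulated time each model covers for roughly the same computational cost.
# "A few years" is taken as 3 years here, which is an assumption.
simulated_days = {
    "traditional": 1000 * DAYS_PER_YEAR,
    "traditional_with_crcp": 3 * DAYS_PER_YEAR,
    "cloud_system_resolving": 1.0,
}


def orders_of_magnitude(cheaper, costlier):
    """log10 of the cost-per-simulated-day ratio between two models.

    At equal total cost, cost per simulated day is inversely proportional
    to the simulated time covered.
    """
    return math.log10(simulated_days[cheaper] / simulated_days[costlier])


crcp_step = orders_of_magnitude("traditional", "traditional_with_crcp")
resolving_step = orders_of_magnitude(
    "traditional_with_crcp", "cloud_system_resolving"
)

print(f"traditional -> CRCP:             {crcp_step:.1f} orders of magnitude")
print(f"CRCP -> cloud-system-resolving:  {resolving_step:.1f} orders of magnitude")
```

With the three-year assumption the two steps come out at about 2.5 and 3.0
orders of magnitude, in line with the "two to three orders" quoted earlier for
CRCP relative to traditional parameterisation.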
Using 2D CRCP allows for better representation of convection, clouds,
radiation transfer and surface exchange. It allows for a dynamic response to
changes in cloud parameters (particle sizes, aerosol characteristics and
precipitation mechanisms) at approximately correct length scales. For example,
cloud-top long-wave cooling responds to changes in particle size.
The talk went on to discuss how exponential trends in scientific applications
and computer technology must interact to maximise scientific productivity
within fixed budgets.
In the course of this workshop, many other presentations dealt with solving
substantive problems by coupling weather and ocean models. Community
frameworks for building coupled Earth system models have been an area of
intense research and development over the last few years. The GFDL Flexible
Modelling System (FMS) has been in active use for about five years. Two
broad-based efforts to develop frameworks across the community are now
approaching maturity levels that allow for actual deployment in the very near
future: the Earth System Modelling Framework (ESMF) in the U.S.A. and the
Programme for Integrated Earth System Modelling (PRISM) in Europe. GFDL has
been a key participant in the development of both. Specifically, the GFDL
community ocean model MOM4, initially developed as an FMS component, has since
been cast as both a PRISM and an ESMF prototype component.
At the other end of the spectrum, forecasts are used to validate the US
inter-organizational modelling initiative known as the Weather Research and
Forecasting (WRF) model. WRF has a three-pronged objective of developing a)
the next generation meso-scale Numerical Weather Prediction (NWP) modelling
system for research and operations; b) a common modelling infrastructure that
facilitates operational NWP collaboration and scientific interoperability, and
accelerates the transfer of new science from research into operations; and c)
a repeatable process that continuously infuses innovations and capabilities
into the community meso-scale NWP modelling system.
As principal partners of this (U.S.) national effort, the Air Force Weather
Agency (AFWA) and Fleet Numerical Meteorology and Oceanography Centre (FNMOC)
have been able to leverage a vast array of resources only available to
Department of Defence (DoD) entities, in particular the resources made
available through the DoD's High Performance Computing Modernisation Programme
(HPCMP), whose objective is to facilitate the rapid application of advanced
technology into superior war-fighting capabilities. Examples of sand storm
predictions over Iraq and Saudi Arabia were presented. This is part of the
U.S. doctrine of acquiring full-spectrum dominance of military capabilities,
enabling it to impose unilateral solutions at will. In operational mode,
Teraflop/s and even Petaflop/s computing power is essential, as timeliness is
critical.
(Brands and names are the property of their respective owners) Copyright:
Christopher Lazou, HiPerCom Consultants, Ltd., UK. November 2004.