HPCwire
 The global publication of record for High Performance Computing / November 19, 2004: Vol. 13, No. 46


Features:

METEOROLOGY CODES PROVIDE PERFORMANCE REALITY CHECK
by Christopher Lazou, HiPerCom Consultants, Ltd.

In my last article from the ECMWF Workshop, I reported that ECMWF measured 6.89% sustained using 2048 IBM P690+ processors, on their IFS (T799 L91) model for a 10-day forecast. This is equivalent to a measured 1.073 Teraflop/s of sustained performance. Then I constructed a thought experiment to calculate that one needs around 22-25 nodes, less than 200 SX-8 CPUs, to deliver the 1.073 Teraflop/s sustained when running IFS. Remember this was just a thought experiment, as I have not seen any results from the SX-8.
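
The arithmetic behind these figures can be sketched as follows. The IBM peak is not stated here; it is back-calculated from the quoted 6.89% and 1.073 Teraflop/s. The SX-8 values of 8 CPUs per node and 16 Gflop/s peak per CPU are assumptions brought in for illustration, not figures from this article.

```python
# Rough check of the IFS figures quoted above (a sketch, not ECMWF's method).
SUSTAINED_TFLOPS = 1.073   # from the article
FRACTION_OF_PEAK = 0.0689  # 6.89%, from the article
IBM_PROCESSORS = 2048      # from the article

# Peak implied by the two quoted numbers (not stated in the article).
implied_peak_tflops = SUSTAINED_TFLOPS / FRACTION_OF_PEAK
implied_peak_per_proc_gflops = implied_peak_tflops * 1000 / IBM_PROCESSORS

# ASSUMPTIONS (not from the article): SX-8 node size and per-CPU peak.
SX8_CPUS_PER_NODE = 8
SX8_PEAK_PER_CPU_GFLOPS = 16.0


def sx8_requirement(nodes):
    """Return (CPU count, fraction of peak needed) to sustain the IFS rate."""
    cpus = nodes * SX8_CPUS_PER_NODE
    peak_tflops = cpus * SX8_PEAK_PER_CPU_GFLOPS / 1000
    return cpus, SUSTAINED_TFLOPS / peak_tflops


print(f"implied IBM peak: {implied_peak_tflops:.2f} Tflop/s "
      f"({implied_peak_per_proc_gflops:.1f} Gflop/s per processor)")
for nodes in (22, 25):
    cpus, fraction = sx8_requirement(nodes)
    print(f"{nodes} SX-8 nodes = {cpus} CPUs, needs {fraction:.1%} of peak")
```

Under these assumptions, the SX-8 would have to sustain roughly a third or more of its peak on IFS for the thought experiment to hold.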

This article reports on some real results on the Earth Simulator and the early results from the IBM Blue Gene/L in the meteorology field. I also briefly report on the progress in providing a unified Earth System Model (ESM) and efforts for standardizing interfaces to allow coupling of various models.

Keiko Takahashi from the Earth Simulator Centre, JAMSTEC, gave a presentation titled "Non-hydrostatic Atmospheric GCM Development and its Computational Performance on the Earth Simulator (ES) system." (To remind the reader, the ES system has 5120 processors based on the vector parallel NEC SX-6 technology, with a 40 Teraflop/s peak performance.)

Takahashi stated that they developed a non-hydrostatic coupled atmosphere-ocean simulation code from scratch to maximise the capability of the ES. Takahashi went on to describe design details of the new code and showed some preliminary validation results with respect to non-hydrostatic atmospheric phenomena. The presentation also included some excellent sustained performance results from this atmosphere/ocean simulation code, attained on the ES.

The new code uses a reduced grid with orthogonal co-ordinates (the same as the latitude and longitude geometry) but with no polar singularity. The N and E component grids share the same structure, are easy to nest and are highly parallel, but care is needed to satisfy the conservation law. To do this, all fluxes are computed on the computational grid and a correction is applied to enforce conservation: the flux FEF on a circular arc EF is computed from the budget of fluxes fN on grid ABCD of the N system, while the flux fE is estimated on a circular arc GHI of the E system. Using this conservative scheme, they found that the relative error of mass stayed within the limit of rounding error over time. With this method both the solid body rotation field and 2nd-order accuracy are maintained.

The CIP-CSLR scheme used is a conservative semi-Lagrangian scheme with rational function (Xiao et al. 2002), based on CIP (Cubic-interpolated pseudo-particle, Yabe et al. 1991). It predicts both cell-integrated and interface values, much as CIP predicts the basic variables and their spatial gradients, which makes it more accurate but slightly increases computation. The scheme is conservative and oscillation-free, and has the merit of needing no additional limiter. This provides high-accuracy calculations within a single cell.

The approach is to use a nested system with different resolutions. For seasonal/annual simulations, the model uses 2.6 km in the horizontal and 100 layers in the vertical. For days/weeks the grid spacing is reduced to 1.2 km with 100 layers, for local events to 100~500 m with 200 layers, and for urban events to 10~200 m with 200 layers.

The preliminary validation of the global non-hydrostatic model, using cloud microphysics, was carried out on the typhoon that hit Japan on 7-11 August 2003. The typhoon caused heavy local rain in the Japanese regions of Kinki, Tokai and Hokkaido.

Takahashi said: "Comparing the statistics between AFES and our new code with 10 km horizontal resolution, our code is 10 times faster in computational performance without CIP-CSLR. A 2.6 km horizontal run with resolution points of 11,520x3840x96x2, using 512 nodes (4,096 processors) on the ES, has attained 19.65 Tflop/s sustained performance, i.e. 59.9% of the available 32.8 Teraflop/s peak."

Their immediate plans are to reproduce and predict non-hydrostatic phenomena such as typhoons, heavy rain in the Baiu season, tornadoes and so on. They intend to perform many validation experiments for each of the atmosphere and ocean components, and also to perform cost-tuning experiments with CIP-CSLR, using regional coupled simulation codes at ultra-high resolution.

The high productivity (sustained) performance of the Earth Simulator was also reported by Michel Desgagne, Environment Canada, who simulated Hurricane EARL (of September 1998 vintage) on the ES with the Canadian MC2 community model. The forecast, at 1 km resolution on an 11,000x8640x51 grid, took 6.75 days on 495 ES nodes (3960 processors) and achieved 13 Tflop/s. This amounts to 3.28 Gflop/s per processor (41%) sustained, using 77% of the total ES system.
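
The two Earth Simulator efficiency figures can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the per-processor peak is not stated directly and is derived here from the 32.8 Teraflop/s quoted for 4,096 processors.

```python
# Sustained-performance arithmetic for the two Earth Simulator runs.
ES_TOTAL_PROCESSORS = 5120                  # full system, from the article

# Takahashi's run: 19.65 Tflop/s against 32.8 Tflop/s peak on 4,096 processors.
takahashi_percent = 100 * 19.65 / 32.8

# Per-processor peak implied by that run (derived, not quoted directly).
peak_per_proc_gflops = 32.8 * 1000 / 4096

# Desgagne's MC2 run: 13 Tflop/s on 3,960 processors.
mc2_processors = 3960
mc2_per_proc_gflops = 13.0 * 1000 / mc2_processors
mc2_percent = 100 * mc2_per_proc_gflops / peak_per_proc_gflops
mc2_system_share = 100 * mc2_processors / ES_TOTAL_PROCESSORS

print(f"Takahashi: {takahashi_percent:.1f}% of peak")
print(f"MC2: {mc2_per_proc_gflops:.2f} Gflop/s per processor, "
      f"{mc2_percent:.0f}% of peak, on {mc2_system_share:.0f}% of the ES")
```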

Rich Loft, NCAR, gave a presentation titled "Price and power approaches to advancing atmospheric science." The physics of cloud formation in general circulation models is currently simulated using phenomenological parameterisations. The dream of direct numerical simulation of cloud processes on a global scale awaits at least six orders of magnitude of improvement in computational power. Cloud Resolving Convection Parameterisation (CRCP), also known as super-parameterisation, is a compromise technique that promises to improve the simulation of sub-grid scale cloud processes in climate models. Unfortunately, CRCP is still two to three orders of magnitude more expensive than traditional parameterisation techniques. Supercomputing systems, and the infrastructure that supports them, face huge challenges in providing the kind of computing power necessary to perform more realistic simulations of cloud processes using the CRCP approach. This situation clearly motivates the demand for future production-quality computing systems that deliver not only dramatically better sustained price-performance, but also better power-performance.

Recently, IBM has created a densely packaged, massively parallel supercomputer called Blue Gene/L, with low electrical power requirements, that looks promising for CRCP applications. Researchers at NCAR, in collaboration with IBM researchers, have built a CRCP package optimised for Blue Gene/L and have combined this package with a scalable and efficient GCM dynamical core based upon spectral elements. The result is a CRCP-based atmospheric model capable of exploiting Blue Gene/L scalability and computational power to practically realise scientifically useful integration rates for multi-year simulations.

Rich Loft went on to say that Blue Gene/L seems to be an important architecture for many reasons: power, space and "fuel" efficiency, $/Teraflop/s sustained, and a fast reduction network. On an 80 km, 20-level explicit HOMME model, they achieved 587 Gflop/s on 1944 nodes of Blue Gene/L. It produced 13.7 simulated years/day (a useful climate rate). A 0.1-degree "eddy permitting" POP ocean model would be the next interesting thing to look at, since the barotropic CG solver for the resultant elliptic equation needs fast global sums.

To remind readers, a Blue Gene/L node consists of 2 processors and 4 MBytes of memory on a chip, with 5.8 Gflop/s peak performance per node. There are 64 processors on a board, and 32 boards in a cabinet. The model was run on 1944 nodes (11.275 Teraflop/s peak), slightly less than 2 cabinets, but utilizing only 1 processor per node. They have not had the machine access time to evaluate running two processors per node, which cuts the memory available per process in half. The computation is embarrassingly parallel, which favours T-type systems, and with half the processors left idle the test achieved 10.4% efficiency on the processors used (or, in reality, 5.2% of the actual system occupied) on the Blue Gene/L.
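
The Blue Gene/L percentages follow directly from the figures quoted above, as this short sketch shows. All inputs are taken from the text; the only step added is the assumption that peak performance divides evenly between the two processors on a node.

```python
# Blue Gene/L efficiency for the HOMME run, using the figures quoted above.
NODES_USED = 1944
PEAK_PER_NODE_GFLOPS = 5.8        # both processors on a node
PROCESSORS_PER_NODE = 2
SUSTAINED_GFLOPS = 587.0

# Cabinet size: 64 processors per board, 32 boards per cabinet.
nodes_per_cabinet = 64 * 32 // PROCESSORS_PER_NODE
cabinets_used = NODES_USED / nodes_per_cabinet

# Peak of the hardware occupied, and of the half that did the computing.
# ASSUMPTION: node peak splits evenly between its two processors.
occupied_peak_gflops = NODES_USED * PEAK_PER_NODE_GFLOPS
active_peak_gflops = occupied_peak_gflops / PROCESSORS_PER_NODE

percent_of_active = 100 * SUSTAINED_GFLOPS / active_peak_gflops
percent_of_occupied = 100 * SUSTAINED_GFLOPS / occupied_peak_gflops

print(f"{cabinets_used:.2f} cabinets, "
      f"{occupied_peak_gflops / 1000:.3f} Tflop/s peak occupied")
print(f"{percent_of_active:.1f}% of the processors used, "
      f"{percent_of_occupied:.1f}% of the system occupied")
```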

In the view of this (Lazou) reporter, this raises the question as to whether the full configuration of 65,536 processors (360 Tflop/s peak) Blue Gene/L system would come anywhere near achieving productivity (sustained performance) as good as the Earth Simulator, on models such as those described by Takahashi and Desgagne, above. The low power consumption and small space of the Blue Gene/L (typical of T-type systems) are impressive, but the recent Linpack results and $/Teraflop/s metrics are likely to be overoptimistic as far as capability and productivity are concerned. The small memory on the node (4 MB) will severely restrict efficiency on a large range of applications that are not embarrassingly parallel.

Returning to the Rich Loft presentation, the basic idea for CRCP or super-parameterisation is to represent sub-grid scales of the 3D large-scale model (with horizontal resolution ~ 100 km) by embedding a 2D cloud resolving model in each column of the large-scale model. This involves thousands of 2D cloud resolving models interacting in a way consistent with large-scale dynamics. It is embarrassingly parallel but extremely expensive (150x over traditional physics).

CRCP is the next step in the quest for a cloud-system-resolving AGCM. The computational cost of the following simulations is approximately the same: a millennium-long simulation using a traditional climate model; a few years-long simulation using a traditional climate model with CRCP; a day-long simulation of a cloud-system-resolving AGCM O(few km). Per unit of simulated time, the cost of each is separated from the next by roughly 3 orders of magnitude.
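
The "3 orders of magnitude" can be checked from the simulated time spans alone. The sketch below assumes that "a few years" means 3 years and that all three runs have equal total cost, as stated above; both the 3-year value and the 365.25-day year are illustrative choices.

```python
import math

# Simulated time each approach covers for roughly the same total cost.
DAYS_PER_YEAR = 365.25
FEW_YEARS = 3.0  # ASSUMPTION: "a few years" taken as 3 for illustration

simulated_years = {
    "traditional climate model": 1000.0,
    "traditional model with CRCP": FEW_YEARS,
    "cloud-system-resolving AGCM": 1.0 / DAYS_PER_YEAR,
}

# Equal total cost means cost per simulated year scales as 1 / span,
# so the ratio of spans gives the cost gap between successive approaches.
names = list(simulated_years)
gaps = []
for cheaper, dearer in zip(names, names[1:]):
    ratio = simulated_years[cheaper] / simulated_years[dearer]
    gaps.append(math.log10(ratio))
    print(f"{dearer} costs about 10^{gaps[-1]:.1f} times more "
          f"per simulated year than {cheaper}")
```

With these assumptions the two gaps come out at about 2.5 and 3.0 orders of magnitude, in line with the rough figure quoted in the talk.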

Using 2D CRCP allows for better representation of convection, clouds, radiation transfer and surface exchange. It allows for a dynamic response to changes in cloud parameters (particle sizes, aerosol characteristics and precipitation mechanisms) at approximately correct length scales. For example, cloud-top long-wave cooling responds to changes in particle size.

The talk went on to discuss how exponential trends in scientific applications and computer technology must interact to maximise scientific productivity within fixed budgets.

In the course of this workshop, many other presentations dealt with solving substantive problems by coupling weather and ocean models. Community frameworks for building coupled Earth system models have been an area of intense research and development over the last few years. The GFDL Flexible Modelling System (FMS) has been in active use for about five years. Two broad-based efforts to develop frameworks across the community are now approaching maturity levels that allow for actual deployment in the very near future: the Earth System Modelling Framework (ESMF) in the U.S.A. and the Programme for Integrated Earth System Modelling (PRISM) in Europe. GFDL has been a key participant in the development of both. Specifically, the GFDL community ocean model MOM4, initially developed as an FMS component, has since been cast as both a PRISM and an ESMF prototype component.

At the other end of the spectrum, forecasts are used to validate the US inter-organizational modelling initiative known as the Weather Research and Forecast (WRF) model. WRF has a three-pronged objective of developing a) the next generation meso-scale Numerical Weather Prediction (NWP) modelling system for research and operations; b) a common modelling infrastructure that facilitates operational NWP collaboration and scientific interoperability, and accelerates the transfer of new science from research into operations; and c) a repeatable process that continuously infuses innovations and capabilities into the community meso-scale NWP modelling system.

As principal partners of this (U.S.) national effort, the Air Force Weather Agency (AFWA) and Fleet Numerical Meteorology and Oceanography Centre (FNMOC) have been able to leverage a vast array of resources only available to Department of Defence (DoD) entities. These include, in particular, the resources made available through the DoD's High Performance Computing Modernisation Programme (HPCMP), whose objective is to facilitate the rapid application of advanced technology into superior war-fighting capabilities. Examples of sand storm predictions over Iraq and Saudi Arabia were presented. This is part of the U.S. doctrine of acquiring full-spectrum dominance of military capabilities, enabling it to impose unilateral solutions at will. In operational mode, Teraflop/s and even Petaflop/s computing power is essential, as timeliness is critical.

(Brands and names are the property of their respective owners) Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. November 2004.

