HPCwire
 The global publication of record for High Performance Computing / September 19, 2003: Vol. 12, No. 37


Features:

IBM P4: HANDICAP FOR HIGH RESOLUTION EARTH SYSTEM MODELING?
by Christopher Lazou

Some 90 meteorologists and HPC experts from 15 countries and 4 continents attended the biennial CAS2K3 workshop on the use of HPC in meteorology. The workshop was held at the idyllic Imperial Palace Hotel, Annecy, France, and organised by the National Center for Atmospheric Research (NCAR), USA. This excellent, relatively small and friendly workshop provided a tour de force of meteorological and computing techniques by active practitioners striving to make the most of the latest HPC technology to refine and improve their climate prediction models. Most presenters came from US sites with large IBM P3/4 systems, while the European contingent included a strong representation from sites with large NEC SX-6 systems. This article highlights a few of the many issues raised at the workshop.

There were 43 presentations over 4 days, plus a live demonstration of the Grid's potential to enable international collaboration within the Community Climate System Model (CCSM) community. The talks were crammed with technical information on using parallel supercomputers to run the mathematical models that describe how climate and weather patterns evolve over time. They were interspersed with performance figures, weather maps and video from simulations, which were compared with satellite pictures of actual weather events.

Why are meteorologists doing all this Earth System Modelling, and what is the urgency? Dramatic reports of flooding and other climate-related events now appear frequently in the press and on television (Hurricane Isabel is hitting North Carolina as I write). These images are injecting a political dimension into the proceedings.

Climate simulations show that intensely hot summers, and increased rainfall causing flooding, are likely to become more common. The shrinking snow cover in the north, the melting of glaciers, projections of no snow in the north by 2100 (even at the North Pole) and the implied rise in sea level raise questions about the future state of the atmosphere, the ocean, sea ice, the land surface and mankind. In short, there is a perceived impending catastrophe because of global warming, exacerbated by greenhouse gases and other pollutants from human activities.

A study at the Max Planck Institute, Germany, comparing CO2 concentration in parts per million (ppm) with temperature change, shows that over the 400,000 years before 1850 the temperature changed from -8°C to 0°C (a rise of 8°C) while CO2 changed from 200 ppm to 280 ppm. From 1850 to today the temperature rose by 1.4°C and CO2 rose to 550 ppm. From today to 2100, CO2 is expected to rise from 550 ppm to 960 ppm and the temperature from 1.4°C to 5.8°C. This is the so-called Vostok curve, and the gap between the past and projected values indicates the severity of the situation.

Some scenarios show that sea level rise alone could deprive two billion people of food in the next hundred years. Insurance companies cannot protect against consequences of this magnitude. Thus the stakes are high and finding answers to the socio-economic effects of climate change has climbed to the top of the political agenda.

The key goal of the climate change efforts is to develop and enhance our capability to monitor and predict how the Earth System is evolving.

Predictions on all these time scales, from weather forecasts through seasonal and inter-annual outlooks to climate change projections, are dominated by the initial conditions of the atmosphere and the oceans and by forcing factors (both naturally occurring and human-induced).

In a keynote address, Dr. William Collins of NCAR, Chair of the scientific steering committee for the Community Climate System Model (CCSM), explained that CCSM is a comprehensive system for simulating the past, present and future climates of the Earth. It currently consists of four major components representing the atmosphere, ocean, sea ice and land surface. The exchange of energy, water and other constituents at the interfaces among these components is simulated using a flux coupler.
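The flux-coupler idea can be illustrated with a deliberately tiny sketch. It is not the CCSM coupler: the two-component setup, the single heat reservoir per component, the constant exchange coefficient, the class names and all numeric values below are hypothetical, chosen only to show the hub-and-spoke pattern in which components never exchange fluxes directly and the coupler hands back equal and opposite fluxes, so that energy is conserved at the interface.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical model component reduced to one column heat content."""
    name: str
    heat: float           # J m^-2
    heat_capacity: float  # J m^-2 K^-1

    @property
    def temperature(self) -> float:
        return self.heat / self.heat_capacity

    def apply_flux(self, flux_w_m2: float, dt_s: float) -> None:
        # A positive flux adds energy to this component.
        self.heat += flux_w_m2 * dt_s


class ToyFluxCoupler:
    """Hypothetical coupler: the only place where interface fluxes are computed."""

    def __init__(self, exchange_coeff: float) -> None:
        self.k = exchange_coeff  # W m^-2 K^-1, assumed constant

    def exchange(self, a: Component, b: Component, dt_s: float) -> float:
        # Flux into `a` is driven by the temperature difference across the interface.
        flux = self.k * (b.temperature - a.temperature)
        # Equal and opposite fluxes keep the coupled system energy-conserving.
        a.apply_flux(flux, dt_s)
        b.apply_flux(-flux, dt_s)
        return flux


# Illustrative values only: a cooler atmosphere over a warmer ocean mixed layer.
atmosphere = Component("atmosphere", heat=1.0e7 * 280.0, heat_capacity=1.0e7)
ocean = Component("ocean", heat=4.0e8 * 290.0, heat_capacity=4.0e8)
coupler = ToyFluxCoupler(exchange_coeff=10.0)

total_before = atmosphere.heat + ocean.heat
for _ in range(100):  # 100 coupling steps of one hour each
    coupler.exchange(atmosphere, ocean, dt_s=3600.0)
total_after = atmosphere.heat + ocean.heat

print(f"atmosphere {atmosphere.temperature:.2f} K, ocean {ocean.temperature:.2f} K")
print(f"relative energy drift: {abs(total_after - total_before) / total_before:.1e}")
```

The atmosphere warms, the ocean cools slightly, and the total heat stays constant to rounding error. A real coupler such as CCSM's also has to map fields between the components' different grids and reconcile their different time steps, both of which this sketch omits.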

The CCSM resulted from a collaborative development effort involving NCAR, university investigators and scientists from several US federal agencies. One of the distinguishing features of CCSM is that the complete source code, documentation and simulation data sets are freely distributed to the international climate research community. A new version of the model, CCSM3, has been developed to facilitate work on a wide variety of scientific problems. These include the interactions between aerosols and climate, the relative importance of natural and anthropogenic forcing from the last millennium, and the nature of abrupt climate change. Results from CCSM3 will form the basis for NCAR's contribution to forthcoming international (IPCC and WMO) climate assessments. This talk focused on major new features and improvements in CCSM3 relative to its predecessors.

These include new radiation and cloud parameterisations in the atmosphere, heating of the ocean surface by chlorophyll, and detailed vegetation ecology. The improvements in simulations of present-day climate produced by the new model physics were illustrated with recent coupled experiments.

In the next few years, the CCSM will be expanded to include reactive tropospheric chemistry, detailed aerosol physics and microphysics, comprehensive biogeochemistry and ecosystem dynamics, and the effects of urbanisation and land use change. These new capabilities will considerably expand the scope of Earth system science that can be studied with CCSM and other climate models of similar complexity. The computing requirements of the next generation of comprehensive climate models can only be satisfied by major advances in computer hardware, software and storage.

The major atmospheric research centres now have systems consisting of several hundred NEC SX-6 processors, or a thousand or more IBM P3/4 processors. In either case they can achieve about half a Teraflop/s sustained, and even a Teraflop/s on certain application codes. The exception is the Earth Simulator in Japan, based on NEC SX-6 technology, which delivers over 12 Teraflop/s sustained performance.

Thus, with sustained Teraflop/s computing on the horizon and occasionally available, meteorologists are moving from Climate Modelling to Earth System Modelling (ESM). This is because the feedback loops between the climate system and other relevant systems, such as ecology and the socio-economy, are not negligible. Climate modelling is not possible without a proper representation of these systems, hence ESM. Earth System Modelling is multi-scale (in time and space), multi-process and multi-topical (physics, chemistry, biology, geology, economy…). It is both very compute- and data-intensive; some people claim it requires several orders of magnitude more computing power than is available today. Petaflop/s and Exaflop/s systems are therefore eagerly awaited.

Bill Collins said: "A factor of 150 times the present NCAR computing resources is needed to accommodate CCSM requirements over the next 5 years, i.e. by 2008. Moore's Law will only deliver an eightfold increase. How this deficiency is to be remedied is a great challenge. Although special architectures, like the IBM Blue Gene for protein folding, are in the pipeline, they have limited instruction sets. This is because protein folding deals with very simple equations. This architecture is not suited to ESM, which needs a small number of fat nodes rather than the thousands of processors in the Blue Gene." He went on to say that CCSM runs forty times slower on NCAR's 5.2 Teraflop/s IBM P4 system than on the Japanese Earth Simulator.
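The gap in the keynote figures can be made explicit with a little arithmetic. The sketch below uses only the factors quoted (150 times needed, eightfold from Moore's Law, over five years) to derive the doubling period that the eightfold figure implies, the doubling period that 150 times would require, and the remaining shortfall; the variable names are illustrative.

```python
import math

YEARS = 5              # 2003 to 2008, as in the keynote
REQUIRED_FACTOR = 150  # growth needed for CCSM requirements
MOORE_FACTOR = 8       # growth expected from Moore's Law

months = YEARS * 12

# An eightfold increase is three doublings, i.e. one doubling every 20 months.
implied_doubling_months = months / math.log2(MOORE_FACTOR)

# Doubling period that would be needed to reach 150 times in the same five years.
needed_doubling_months = months / math.log2(REQUIRED_FACTOR)

# Factor that must come from somewhere other than Moore's Law.
shortfall = REQUIRED_FACTOR / MOORE_FACTOR

print(f"Moore's Law figure implies one doubling every {implied_doubling_months:.0f} months")
print(f"150x in {YEARS} years needs one doubling every {needed_doubling_months:.1f} months")
print(f"remaining shortfall: {shortfall:.2f}x")
```

On these numbers, capability would have to double roughly every eight months rather than every twenty, leaving a factor of almost 19 that Moore's Law alone does not supply.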

The message that capacity computers such as the IBM P4+ systems are unsuitable for high-resolution Earth System Modelling was reinforced by many of the speakers. For example, Dr. Albert Semtner of the Naval Postgraduate School, Monterey, California, described ocean and ice models capable of reproducing the observed mean states and variability of the global ocean and its sea ice. A comparison of simulated results with observational statistics indicates that a horizontal grid spacing of less than 10 km is necessary for both ocean and ice. As a result, the most advanced computing systems are required to run these models.
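Why sub-10 km grids push these models onto the largest machines can be seen from a common rule of thumb, used here as an assumption rather than a figure from the talk. Halving the horizontal grid spacing quadruples the number of grid columns and, for explicit time stepping limited by the CFL condition, also halves the allowable time step, so cost grows roughly with the cube of the refinement factor when the number of vertical levels is held fixed. The 100 km reference spacing, the ocean area of about 361 million km² and the function names are likewise illustrative assumptions.

```python
OCEAN_AREA_KM2 = 3.61e8  # approximate global ocean surface area (assumed)


def relative_cost(dx_km: float, ref_dx_km: float) -> float:
    """Cost of a run at spacing dx_km relative to one at ref_dx_km.

    Assumes cost ~ (1/dx)^2 horizontal columns x (1/dx) time steps (CFL limit),
    with vertical levels and simulated period unchanged.
    """
    refinement = ref_dx_km / dx_km
    return refinement ** 3


def ocean_columns(dx_km: float) -> float:
    """Approximate number of ocean grid columns on a uniform dx_km grid."""
    return OCEAN_AREA_KM2 / dx_km ** 2


for dx in (100.0, 10.0, 6.5):
    print(f"dx={dx:5.1f} km: ~{ocean_columns(dx):.1e} columns, "
          f"~{relative_cost(dx, 100.0):,.0f}x the cost of a 100 km run")
```

Under this rule a 10 km ocean is about a thousand times as expensive as a 100 km one for the same simulated period, before any extra physics is added.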

Specific results were shown from runs of the Parallel Ocean Program (POP) and the Sea Ice Model developed at Los Alamos National Laboratory. The output from a number of simulations conducted by investigators at the Naval Postgraduate School and their collaborators was evaluated against observations.

The simulations were conducted on large IBM, NEC and Cray machines, and his findings are stark: "Only systems that deliver multiple teraflop/s of sustained performance can be used to project climatic conditions out for many centuries, with highly realistic ocean and ice interactions in terms of spatial and temporal evolution. On sub-teraflop/s systems, ensemble forecasting of ocean and ice for optimal ship routing and other marine applications can be done for time-scales of months."

He illustrated this with results obtained on an IBM P3 and an NEC SX-6. Using a model with ~6.5 km spacing over the ocean, the simulation on a 500-processor IBM P3 took eight days to simulate fifteen years. The same model simulated 300 years in just eight hours on 960 processors of the Earth Simulator (NEC SX-6).
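The two runs can be put on a common footing with a throughput measure climate modellers often quote, simulated years per wall-clock day (SYPD). The sketch uses only the figures reported in the talk and assumes, as the text states, that both runs used the same model configuration; the per-processor comparison ignores differences in processor design and price, and the names are illustrative.

```python
def sypd(simulated_years: float, wallclock_hours: float) -> float:
    """Simulated years per wall-clock day."""
    return simulated_years / (wallclock_hours / 24.0)


# Figures as reported for the ~6.5 km ocean model
ibm_p3 = {"years": 15.0, "hours": 8 * 24.0, "processors": 500}
earth_simulator = {"years": 300.0, "hours": 8.0, "processors": 960}

ibm_rate = sypd(ibm_p3["years"], ibm_p3["hours"])                   # 1.875 SYPD
es_rate = sypd(earth_simulator["years"], earth_simulator["hours"])  # about 900 SYPD

speedup = es_rate / ibm_rate
per_processor_speedup = (es_rate / earth_simulator["processors"]) / (
    ibm_rate / ibm_p3["processors"]
)

print(f"IBM P3: {ibm_rate:.3f} SYPD, Earth Simulator: {es_rate:.0f} SYPD")
print(f"overall: {speedup:.0f}x, per processor: {per_processor_speedup:.0f}x")
```

On these figures the Earth Simulator run was roughly 480 times faster overall and about 250 times faster per processor.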

Utilizing thousands of IBM-type processors would not help, according to the results presented by Patrick Worley of Oak Ridge National Laboratory: scaling would be a major limiting constraint. This is where systems built from commodity capacity chips become problematic. As Walter Zwieflhofer of ECMWF said: "Power, cooling and space requirements of large systems built out of commercial servers have been growing steadily - this is not sustainable, but this is not the place to write an RFP."

As an aside, the sustained performance on the NCAR workload is around 4.1% of peak, i.e. 213 Gflop/s sustained out of a 5.2 Teraflop/s system. More revealing is a statistic derived from the NCAR sustained performance results: measured by sustained performance, the vector-based Earth Simulator (NEC SX-6) is twice as cost-efficient as the IBM p690 (Power4), both in price (dollars per Gigaflop/s) and in electrical power usage. Thus, the myth that commodity chip computers are cheaper has been debunked (see details in my next article from CAS2K3).
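The 4.1% figure is simply the ratio of sustained to peak performance. A minimal check, using only the NCAR numbers quoted above (the function name is illustrative):

```python
def sustained_efficiency(sustained_gflops: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak actually delivered on a workload."""
    return sustained_gflops / peak_gflops


ncar_peak_gflops = 5.2e3       # 5.2 Teraflop/s IBM P4 system
ncar_sustained_gflops = 213.0  # sustained on the NCAR workload

eff = sustained_efficiency(ncar_sustained_gflops, ncar_peak_gflops)
print(f"NCAR sustained efficiency: {eff:.1%}")
```

The same ratio underlies the cost comparison: dividing a system's price, or its power draw, by sustained rather than peak Gflop/s penalises machines that deliver only a small fraction of their peak.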

Dr. Tetsuya Sato, Director of the Earth Simulator, described work on ESM in Japan, including international collaborations. The Earth Simulator delivers around 30% of peak as sustained performance, i.e. over 12 Teraflop/s. Tetsuya Sato is already thinking about how to develop new models with a radically different approach, emulating natural processes using a holistic model. He illustrated how nature does not discriminate between macroscopic and microscopic events, and how simulating natural structures requires at least ten million times more computer power than is currently available on the Earth Simulator. His vision is to install a new Earth Simulator with far more power than Moore's Law predicts.
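The "around 30%" and "over 12 Teraflop/s" figures can be cross-checked against the Earth Simulator's peak. The configuration used below, 640 nodes of 8 vector processors at 8 Gflop/s each, comes from the machine's published specification rather than from the talk.

```python
NODES = 640
PROCESSORS_PER_NODE = 8
PEAK_GFLOPS_PER_PROCESSOR = 8.0

peak_tflops = NODES * PROCESSORS_PER_NODE * PEAK_GFLOPS_PER_PROCESSOR / 1000.0
sustained_tflops = 12.0  # "over 12 Teraflop/s" as reported

efficiency = sustained_tflops / peak_tflops
print(f"peak {peak_tflops:.2f} Tflop/s; 12 Tflop/s sustained is {efficiency:.0%} of peak")
```

A peak of about 41 Tflop/s puts 12 Tflop/s sustained at just under 30%, consistent with the figure quoted.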

Whereas Japanese scientists using the NEC SX product line are well provided with high-productivity computers, scientists in the USA are poorly served by commodity-chip-based systems. The U.S. is, however, waking up to this strategic deficiency and is now pursuing the DARPA High Productivity Computing Systems (HPCS) programme to deliver a Petaflop/s by 2009-10. The White House has an inter-agency effort underway, the High End Computing Revitalization Task Force (HEC-RTF), to enable agencies to submit coordinated budget requests in this area for fiscal year 2005. The IBM system to be offered under HPCS is expected to be at least two generations beyond the Power5 technology being proposed for the Blue Planet system. In my view, the imbalance between processor and memory subsystem, as currently manifested in the IBM Power4+ series, would not deliver Petaflop/s. The Cray system, based on new generations of the Cray X1 line, is likely to be more promising.

During this workshop a strong emphasis was placed on data management and the challenges it entails. To illustrate the kind of resources required, real-time data assimilation often requires more resources than the weather forecast models themselves. Analysing historical observations requires data sets that are as consistent as possible, and this can only be achieved through international collaboration that incorporates as much of the available data as possible. ECMWF's most recent effort was the ERA-40 project (1957 to 2002), which used conventional observations from 1957 and satellite data from 1973. The analysis system used a 125 km grid and a coupled wave model. The validation and production phase took 3 years on ECMWF's HPC systems; such projects need to be completed within the lifetime of one HPC system to avoid the overheads caused by migration. The resulting data set is close to 40 TB in size.

In the data management area, space-based instruments and high-resolution models produce huge volumes of data, and to be used effectively this data must be carefully managed. The archives held by centres such as NCAR (over 1,000 TB) and ECMWF (over 800 TB) count among their most valuable assets. Both NCAR and ECMWF run dedicated data management systems, clearly separated from their HPC resources. Metadata-based access and ever faster wide-area network links open these archives to the wider research community. The data problem is not insurmountable, but it does require attention and dedicated human resources.

Next week, I'll summarise the performance issues raised at CAS2K3 and how IBM users in the Earth System Modelling field are being "short-changed", so watch this space.

(Brands and names are the property of their respective owners) Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. Email: Chris@lazou.demon.co.uk September 2003.

The opinions expressed in this feature are those of the author and do not necessarily reflect the views of HPCwire.

