
Features:
AN INTERVIEW WITH DAN REED, DIRECTOR, NCSA
By Tim Curns, Assistant Editor, HPCwire
HPCwire: What do you see as the biggest obstacles or hindrances to performance
optimization techniques for large-scale parallel, distributed and Grid-based
computing systems? What do you feel are some options for overcoming these
obstacles?
DAN REED: The largest short-term obstacle to optimizing Grid applications is
undoubtedly the evolving state of Grid software infrastructure and,
concomitantly, the paucity of analysis tools. Similarly, today's parallel
systems suffer from a dearth of robust, easy-to-use, portable tuning tools.
There is no silver bullet that will improve the performance of parallel and
Grid applications. Instead, we need sustained investment and support for tools
matched to application needs and system characteristics. This is not cheap,
nor will it yield "magic solutions" quickly. Hence, the Alliance and NCSA, via
the PACI Alliance expeditions, are developing and hardening performance tuning
tools for both Grids and Linux clusters.
Concurrently, raising awareness of these tools and offering training
opportunities for researchers are ongoing national priorities. NCSA and the
Alliance continue to offer workshops, Access Grid tutorials and online
training materials to engage faculty, post-doctoral research associates and
graduate students and to help them apply the latest technologies in their own
research initiatives.
Finally, as I testified to the House Science Committee this summer (see
http://www.house.gov/science/hearings/full03/jul16/reed.pdf), I believe we must take
a long-term, strategic approach to solving these problems. Reducing the gap
between peak hardware performance and achieved performance for a broad range
of applications will require a long-term strategy that couples academic
research (both systems and applications) with industrial prototyping and
assessment and with a cycle of procurement that enables strategic planning and
system revision based on scientific application experiences. These are 10-20
year challenges -- we need to start now.
HPC: What technological advancements or breakthroughs in research are on the
horizon for the National Computational Science Alliance and/or the NCSA? What
are your projections on the development of the NSF TeraGrid Project?
DR: Exciting things are happening on many fronts. Many of the societal
challenges of the 21st century will require the collaborative skills of
researchers in a diverse set of disciplines. NCSA and its Alliance partners
are building the infrastructure and scientific collaborations to address these
challenges. Let me cite just two examples: HASTAC and LEAD.
The newly launched HASTAC (Humanities, Arts, Science, and Technology Advanced
Collaboratory) is an alliance of scientists, humanists, artists, social
theorists, legal specialists and information technology specialists. HASTAC
was founded on the belief that the future of cyberinfrastructure must be
driven by creative discovery across disciplinary divides, given the profound
impact of new technologies on individuals and society.
On the scientific front, the new LEAD (Linked Environments for Atmospheric
Discovery) NSF ITR award couples Alliance researchers at NCSA, Oklahoma,
Alabama and Indiana with partners at the National Center for Atmospheric
Research, Colorado State, Millersville and Howard. Given the billions of
dollars of annual damage and loss of life from severe storms, LEAD's goal is
to create a Grid framework for assimilating, predicting, managing,
mining/analyzing and displaying meteorological data.
NCSA and its partners also continue to deploy advanced computing
infrastructure. NCSA's 17.7 teraflop Xeon cluster is now being deployed and
will enter production this spring. By allocating 3 teraflop sub-clusters to
research groups for days, weeks or even months, we hope the system will
eliminate one of the most common barriers to shared use of large-scale
computing resources: long queue wait times. For a peek at the NCSA hardware
deployments, see http://clustercam.ncsa.uiuc.edu.
We are also very excited about the status and the future of the TeraGrid.
After two years of planning and development, the first phase of the TeraGrid
will enter production at the beginning of 2004, and users have already been
allocated time on the TeraGrid's distributed resources. Friendly users have
been running applications on the TeraGrid for the past several months, and
research results are already being published based on these computations. The
hardware for phase two TeraGrid deployment is already arriving at NCSA, where
it will be assembled to create a 10 teraflop Itanium family Linux system. We
are soliciting additional Grid applications, both from within the U.S. and
from international collaborations, for TeraGrid deployment.
HPC: How important are input/output characterizations and parallel file
systems in developing high-performance implementations of parallel
applications?
DR: Optimizing I/O activity is increasingly critical. The explosive growth of
experimental data, from a new generation of scientific instruments, and of
computational data, from high fidelity simulations, means that large-scale
data management and mining are central to gaining scientific insights. Many
sites now have multiple petabyte archives and have deployed or are deploying
petabyte secondary storage systems. This infrastructure supports such projects
as the National Virtual Observatory, LIGO, the upcoming Large Hadron Collider
and biological genomics and proteomics data.
On the technology front, however, disk storage capacities are rising far more
rapidly than disk bandwidths, often leading to "write only" data storage. I/O
has long been the "poor stepchild" of high performance computing, and we are
seeing the effects of this in I/O systems poorly matched to application needs.
We glibly speak of teraflops, but we rarely speak of terabytes/second, most
often because current systems are not architected or procured to sustain such
I/O bandwidths.
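As a rough illustration of the capacity/bandwidth mismatch described above (this sketch is not part of the interview, and every figure in it is hypothetical): the time to read a device end to end is its capacity divided by its sustained bandwidth, so capacity that grows faster than bandwidth makes stored data steadily more expensive to revisit. The function names `full_scan_hours` and `devices_for_bandwidth` are invented here for the example.

```python
import math


def full_scan_hours(capacity_gb: float, bandwidth_mb_s: float) -> float:
    """Hours needed to read a device once, end to end, at a sustained bandwidth."""
    if capacity_gb < 0 or bandwidth_mb_s <= 0:
        raise ValueError("capacity must be >= 0 and bandwidth must be > 0")
    seconds = capacity_gb * 1000.0 / bandwidth_mb_s  # 1 GB = 1000 MB
    return seconds / 3600.0


def devices_for_bandwidth(target_tb_s: float, per_device_mb_s: float) -> int:
    """Minimum number of devices that must stream in parallel to reach a target rate."""
    if target_tb_s < 0 or per_device_mb_s <= 0:
        raise ValueError("target must be >= 0 and per-device bandwidth must be > 0")
    target_mb_s = target_tb_s * 1_000_000.0  # 1 TB = 1,000,000 MB
    return math.ceil(target_mb_s / per_device_mb_s)


# Hypothetical disk generations: (label, capacity in GB, sustained MB/s).
# These numbers are assumptions chosen for illustration, not measurements.
generations = [("older disk", 36.0, 20.0), ("newer disk", 250.0, 50.0)]

if __name__ == "__main__":
    for label, capacity_gb, bandwidth_mb_s in generations:
        hours = full_scan_hours(capacity_gb, bandwidth_mb_s)
        print(f"{label}: {hours:.2f} hours to read the whole device")
    print(devices_for_bandwidth(1.0, 50.0), "devices needed to sustain 1 TB/s")
```

With these assumed figures, capacity grows about sevenfold while bandwidth grows 2.5 times, so the full-scan time rises even though the newer device is faster.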
We need a deeper understanding of the I/O patterns that occur in parallel and
Grid applications to guide the design of parallel I/O libraries and file
systems. Understanding I/O behavior at scale is an ongoing research topic,
both at NCSA and in my own research group. We are characterizing the I/O
behavior of applications on HPC systems, looking at the effects of multilevel
mediation by I/O libraries and file systems. In turn, we are using these
insights to investigate I/O policies that exploit the temporal and spatial I/O
patterns.
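To make the idea of characterizing spatial I/O patterns concrete, here is a minimal sketch, assuming a trace is available as a list of (offset, size) read requests in issue order. It is not the tooling used at NCSA or in Reed's group; the function name `classify_pattern` and the three category labels are hypothetical choices for this example.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (byte offset, request size in bytes)


def classify_pattern(trace: List[Request]) -> str:
    """Label a request trace as 'sequential', 'strided', 'irregular' or 'unknown'.

    sequential: every request starts exactly where the previous one ended.
    strided:    the distance between successive start offsets is constant.
    irregular:  neither of the above.
    unknown:    fewer than two requests, so no pattern can be inferred.
    """
    if len(trace) < 2:
        return "unknown"

    pairs = list(zip(trace, trace[1:]))

    # Gap between the end of one request and the start of the next.
    gaps = [nxt[0] - (cur[0] + cur[1]) for cur, nxt in pairs]
    if all(gap == 0 for gap in gaps):
        return "sequential"

    # Distance between successive start offsets.
    strides = {nxt[0] - cur[0] for cur, nxt in pairs}
    if len(strides) == 1:
        return "strided"

    return "irregular"


if __name__ == "__main__":
    print(classify_pattern([(0, 4), (4, 4), (8, 4)]))
    print(classify_pattern([(0, 4), (16, 4), (32, 4)]))
    print(classify_pattern([(0, 4), (100, 4), (8, 4)]))
```

A classifier like this is the kind of input an I/O policy could use, for example to prefetch ahead on sequential or strided access and to avoid prefetching on irregular access. That policy choice is an assumption of this sketch, not something stated in the interview.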
HPC: Why is exploring the utility and performance of game systems
(specifically, Sony PlayStation2 clusters) important? How can research of
these systems benefit both scientific computing and high-resolution
visualization?
DR: The history of computing shows that each computing generation has been
partially or totally supplanted by systems that occupy a different point on
the price/performance curve, expanding the base of possible owners and users.
Mainframes and computer families like the IBM S/360 replaced "one of a kind"
research systems and made computing part of the corporate culture. In turn,
DEC's introduction of the minicomputer gave laboratory groups direct access to
affordable computing. Workstations and PCs, driven by the emergence of
powerful microprocessors, made computing broadly available to individual
researchers and consumers. The common theme across these computing generations
has been a dramatic decrease in price, an associated increase in performance,
emergence of new market niches and a consequent expansion of the number of
units sold.
Game consoles, with price points below $300, performance rivaling or exceeding
that of PCs and graphics capabilities until recently found only on high-end
visualization supercomputers, are the vanguard of yet another computing
generation. Moreover, market forces and fierce vendor competition continue to
fuel technical innovation and performance improvements on these game
platforms, creating research and development incentives and deployment
opportunities in new scientific domains.
NCSA's mission is to track technology trends and deploy new infrastructure
that can catalyze scientific discovery. As an early test vehicle, NCSA has
assembled a 0.6 teraflop Linux PlayStation2 cluster. Using this cluster, we
are vectorizing key numerical library kernels and investigating the
partitioning of applications across game platform interactive and vector
processors. We expect insights from these experiments to inform development
and acquisition plans multiple years in the future.
HPC: Feel free to offer any other comments on HPC or related topics!
DR: Several recent developments in high-end computing have stimulated a
re-examination of current U.S. policies and approaches. These developments
include the deployment of Japan's Earth Simulator, concerns about the
difficulty in achieving substantial fractions of peak hardware performance on
high-end systems, and the ongoing complexity of developing, debugging and
optimizing applications for high-end systems. In addition, there is growing
recognition that a new set of scientific and engineering discoveries could be
catalyzed by access to very large-scale computer systems -- leadership
computing systems in the 100 teraflop to petaflop range. Finally, the need for
high-end systems in support of national defense has led to new interest in
high-end computing research, development and procurement.
This summer, in response to a request from the interagency High-End Computing
Revitalization Task Force (HECRTF), several of us helped organize a community
workshop to provide suggestions on strategic directions for high-end
computing. The slides from the community workshop are available at
http://www.cra.org/Activities/workshops/nitrd and copies of the final report
will be available at SC2003.
In brief, the common theme of the workshop report is the need for sustained
investment in research, development and system acquisition. This sustained
approach also requires deep collaboration among academic researchers,
government laboratories, industrial laboratories and computer vendors.
Short-term strategies and one-time programs are unlikely to develop the
technology pipelines and new approaches needed to realize the petascale
computing systems needed by a range of scientific, defense and national
security applications. Rather, multiple cycles of advanced research and
development, followed by large-scale prototyping and product development, will
be required to develop systems that can consistently achieve a high fraction
of their peak performance on critical applications, while also being easier to
program and operate reliably.