
Features:
PART II: IS WINDOWS NT SERVER AN OS FOR HPC?
by Christopher Lazou
With clusters in such a prominent position in the TOP500 list and software
becoming the new bottleneck for delivering the hardware potential to the user
application, below is the full interview with Dr. Gerd Heber from the Cornell
Theory Centre, Ithaca, NY, USA, concerning their work in porting Microsoft
Windows NT server as an OS for HPC clusters. Those who read the first part in
[107395, http://www.tgc.com/hpcwire/hpcwireWWW/04/0409/107395.html], skip to
question 5.
Christopher Lazou: I understand your role at Cornell is developing new
algorithms and high-performance software for multi-scale and multi-physics
simulations. This is a challenging, urgently needed area of research, as its
success should yield enormous benefits for HPC users. The conventional wisdom
is to use highly tuned operating systems with rich functionality, focused on
scientific and technical applications, as a vehicle for future developments, yet
you at Cornell chose Windows NT server. Would you briefly explain the
rationale behind this decision?
Dr. Gerd Heber: Unlike operating systems for real-time processing or embedded
systems, the operating systems used in scientific and technical computing are
fairly general purpose. Except for say I/O operations or multithreading, any
OS is more or less "in-the-way" of an application. The systems in the Windows
Server family are tuned enough to get a cluster on to the Top500 list or to
the top of the TPC benchmarks. Tests performed by us in house or by others
have shown that on identical hardware, applications running on a Windows
Server based platform will perform the same or better than on other operating
systems.
CL: As I remember, in the mid-nineties Cornell was one of the NSF
supercomputing centres, operating the largest IBM SP system in the world. A
switch to Windows could not have been easy. How did your users react to this
radical departure from the mainstream?
GH: The switch was indeed not easy and the prevailing initial reaction from
users (including myself) was scepticism, in some cases hostility. From a
user's perspective, what made it difficult were the different development
environment and the lack of certain libraries and tools. For example, we had a
considerable investment in project management using make and CVS, and we had
no intention of changing this. The C++ compiler that shipped with Visual
Studio 6.0 five years ago was a decent C compiler, but did not deserve to be
called a C++ compiler. After years of development with Kuck & Associates' KCC
compiler, we spent several months looking around for and testing all kinds of
C++ compilers, just to get our code built for Windows. Fortunately, the code
had been tested extensively on the SP and performed correctly once built.
Tools like Cygwin and GCC were indispensable in those early days.
CL: As one is aware, multi-scale systems throw up many problems that differ
from those of medium-size systems. What, for example, were the key challenges
while developing Windows NT for its new role?
GH: Making sure that all necessary things are in place for users, developers,
and administrators was perhaps the key challenge. There were quite large
Windows installations (Windows domains of thousands of servers) out there when
we made the transition, but running them as HPC clusters with all their specific
requirements had not been attempted yet. On the other hand, the administrative
staff at CTC consisted of fabulous AIX administrators, who had only a vague
idea of doing things the "Windows way". What followed was for both users and
administrators a "soul searching process" at the end of which both arrived at
the same conclusion: "You must change your life." (Rilke) - Emulating or
mimicking UNIX on Windows gets you only so far.
CL: To successfully solve multi-scale, multi-physics application problems,
system robustness, fault tolerance and computation integrity (checkpoint
restart) are essential. For example, in a system with 2,000 nodes, if each node
fails once every two years, the system as a whole fails about three times a
day, and most calculations take much longer than the interval between
failures. How are you tackling these challenges?
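The arithmetic behind the figure of about three failures a day can be checked
with a short sketch. It assumes that node failures are independent and that
"once every two years" is a per-node average; the function names are
illustrative and do not come from the interview.

```python
# Back-of-the-envelope failure rate for a cluster, assuming node failures
# are independent and each node fails on average once per MTBF period.

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def system_failures_per_day(nodes: int, node_mtbf_years: float) -> float:
    """Expected number of node failures per day across the whole system."""
    node_mtbf_days = node_mtbf_years * DAYS_PER_YEAR
    return nodes / node_mtbf_days


def system_mtbf_hours(nodes: int, node_mtbf_years: float) -> float:
    """Mean time between failures anywhere in the system, in hours."""
    return HOURS_PER_DAY / system_failures_per_day(nodes, node_mtbf_years)


if __name__ == "__main__":
    rate = system_failures_per_day(nodes=2000, node_mtbf_years=2)
    gap = system_mtbf_hours(nodes=2000, node_mtbf_years=2)
    # 2000 / 730 is roughly 2.74, i.e. "about three times a day"
    print(f"{rate:.2f} failures per day, one every {gap:.2f} hours")
```

Under these assumptions a job that runs for longer than roughly nine hours
should expect to see at least one node failure, which is why checkpointing
matters at this scale.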
GH: All our work is done using industry-standard software and hardware.
Despite the lack of out-of-the-box support for checkpointing, we think
application-level checkpointing is the most promising approach. Of course,
this requires some degree of intervention from the user, but at the same time
it minimizes the amount of data associated with a checkpoint. The Intelligent
Software Systems project at the Cornell CS department implemented a very
convenient compiler-based approach for MPI and OpenMP programs. In my own
work, I use databases to store sufficient application state and history
information, which also allows restarting on a different number of nodes.
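As a rough illustration of database-backed, application-level checkpointing,
the sketch below stores state keyed by global index rather than by process
rank, so a restart can repartition the same data over a different number of
processes. It is not Dr. Heber's implementation: SQLite stands in for whatever
database was actually used, and the table layout, the block partitioning, and
all function names are assumptions made for the example.

```python
import sqlite3
from typing import List


def partition(n_items: int, n_ranks: int, rank: int) -> range:
    """Contiguous block of global indices owned by `rank`."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    return range(start, start + base + (1 if rank < extra else 0))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the checkpoint database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint ("
        " step INTEGER, idx INTEGER, value REAL,"
        " PRIMARY KEY (step, idx))"
    )
    return conn


def save(conn: sqlite3.Connection, step: int, owned: range,
         values: List[float]) -> None:
    """Store one rank's values keyed by global index, not by rank."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO checkpoint VALUES (?, ?, ?)",
            [(step, i, v) for i, v in zip(owned, values)],
        )


def load(conn: sqlite3.Connection, step: int, owned: range) -> List[float]:
    """Fetch the values for the indices a (possibly new) rank now owns."""
    rows = conn.execute(
        "SELECT value FROM checkpoint"
        " WHERE step = ? AND idx >= ? AND idx < ? ORDER BY idx",
        (step, owned.start, owned.stop),
    )
    return [value for (value,) in rows]


if __name__ == "__main__":
    conn = open_store()
    data = [float(i) ** 2 for i in range(10)]
    for rank in range(4):  # checkpoint written by 4 ranks
        owned = partition(len(data), 4, rank)
        save(conn, 100, owned, [data[i] for i in owned])
    restored: List[float] = []
    for rank in range(3):  # restart on 3 ranks
        restored += load(conn, 100, partition(len(data), 3, rank))
    print(restored == data)
```

Because the stored rows carry no rank information, the number of processes at
restart is free to differ from the number that wrote the checkpoint.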
CL: What is the application domain the system is built for, i.e. the user
community the system is aimed at, and what obstacles did you have to overcome
to create an acceptable working environment?
GH: The majority of users are Cornell researchers, faculty, and students. The
applications are as diverse as this community and include more than 100
projects in astronomy, agriculture, computer science, the engineering
disciplines, mathematics, medical sciences, physics, and social sciences.
Major application areas are Computational Finance, Protein Folding and
Structural Biology, Computational Materials, and Astronomy. Application and
tool availability was perhaps the biggest obstacle. Although the majority of
applications were available for Windows, their usage and licensing were geared
towards a desktop environment and not a set of batch nodes. Working closely
with the vendors helped us to overcome those obstacles.
CL: Did you have to invent new tools along the way, or rethink the algorithms
used and how HPC distributed applications are manipulated?
GH: In terms of new tools, the biggest piece was probably the scheduling
system called ClusterCoNTroller. From a user's perspective it resembled the SP
scheduler and it was nearly identical in its functionality. Users developed
and shared a lot of smaller utilities, for example a wrapper called "unixify",
which would map UNIX-style compiler invocation onto its Windows counterpart.
At the top of our current wish list is a parallel debugger a la TotalView. It
is certainly possible to attach the Visual Studio debugger to a remote
process, but one can handle only so many copies of Visual Studio on a desktop.
The good news is that there is development underway and users will soon have a
fully featured parallel debugger on the Windows platform. Besides playing
catch-up, we were pursuing novel approaches to HPC using databases and Web
services, technologies driven by industry rather than the HPC mainstream. At
CTC, we developed a real-time visualization tool for interactive
exploration of large-scale 3D solid models and underlying engineering data
based on OpenDX, the .NET Framework, and SQL Server 2000. Our efforts in the
Adaptive Software Project to create stateful Web services predate the OGSI
standard by almost 2 years.
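The interview describes the "unixify" wrapper only in one sentence, so the
following is a hypothetical sketch of the general idea rather than the actual
utility: it translates a small subset of UNIX-style compiler arguments into a
command line for the Microsoft compiler driver cl.exe. The choice of supported
flags and the mapping of -O3 onto /O2 are assumptions made for the example.

```python
from typing import List

# cc-style optimisation levels mapped to the nearest cl.exe switch (assumed).
OPT_FLAGS = {"-O0": "/Od", "-O1": "/O1", "-O2": "/O2", "-O3": "/O2"}


def unixify(args: List[str]) -> List[str]:
    """Translate a subset of cc-style arguments into a cl.exe command line."""
    compile_only = "-c" in args
    cl_args: List[str] = ["cl", "/nologo"]
    link_args: List[str] = []  # options that must follow /link
    it = iter(args)
    for arg in it:
        if arg == "-c":
            cl_args.append("/c")
        elif arg == "-o":
            target = next(it, None)
            if target is None:
                raise ValueError("-o requires a file name")
            # /Fo names an object file, /Fe names an executable
            cl_args.append(("/Fo" if compile_only else "/Fe") + target)
        elif arg == "-g":
            cl_args.append("/Zi")
        elif arg in OPT_FLAGS:
            cl_args.append(OPT_FLAGS[arg])
        elif arg.startswith(("-I", "-D")):
            cl_args.append("/" + arg[1:])
        elif arg.startswith("-L"):
            link_args.append("/LIBPATH:" + arg[2:])
        elif arg.startswith("-l"):
            cl_args.append(arg[2:] + ".lib")
        else:
            cl_args.append(arg)  # source files, objects, unknown options
    if link_args:
        cl_args += ["/link"] + link_args
    return cl_args


if __name__ == "__main__":
    print(" ".join(unixify(
        ["-c", "-O2", "-Iinclude", "-DNDEBUG", "-o", "solver.obj", "solver.cpp"]
    )))
```

A wrapper of this kind lets existing makefiles keep their UNIX-style compile
and link rules while the real work is done by the native Windows toolchain.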
CL: For performance, codes designed for tens of processors have to be
rethought and re-written; they can't just be scaled up, so how did you deal
with legacy codes?
GH: The largest IBM SP ever operated at Cornell had 512 processors. The
largest cluster in production today has 640 processors, which is not a
tremendous increase. Unless they relied on proprietary features of the TB3
interconnect, codes developed on the SP did not require fundamental changes.
Most of the effort went into instrumentation and profiling to understand the
performance impact of SMP nodes, interconnects of comparatively higher
latency, and lower-bandwidth memory interfaces.
CL: As vendors cannot test and validate software upgrades on systems of this
size, how do you handle software upgrades?
GH: Besides vendor testing, we do a lot of testing ourselves. There is a
dedicated Windows test domain, which is configured identically to the production
domain. In the production environment, upgrades are typically performed on
smaller clusters first. The scheduler allows for arbitrary logical
partitioning of resources and, on large clusters, updates can be performed a
few nodes at a time, without taking down the entire machine. Upgrades that
mandate an all-or-nothing approach are the exception; however, standard
imaging techniques allow recovering quickly from an unsuccessful upgrade.
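The "few nodes at a time" procedure can be pictured as a rolling upgrade. The
sketch below is an assumption about how such a loop might look, not a
description of CTC's tooling: the upgrade, healthy, and reimage callables are
hypothetical stand-ins for the site's real upgrade, verification, and imaging
steps.

```python
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` nodes."""
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def rolling_upgrade(
    nodes: Sequence[str],
    size: int,
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    reimage: Callable[[str], None],
) -> List[str]:
    """Upgrade `size` nodes at a time and stop at the first bad batch.

    Returns the nodes that were upgraded and passed the health check.
    Nodes in later batches are left untouched and stay in production.
    """
    done: List[str] = []
    for batch in batches(nodes, size):
        for node in batch:
            upgrade(node)
        if not all(healthy(node) for node in batch):
            for node in batch:
                reimage(node)  # restore the known-good image
            break
        done.extend(batch)
    return done


if __name__ == "__main__":
    cluster = [f"node{i:02d}" for i in range(8)]
    upgraded = rolling_upgrade(
        cluster,
        size=2,
        upgrade=lambda n: print("upgrading", n),
        healthy=lambda n: n != "node05",  # simulate one bad node
        reimage=lambda n: print("re-imaging", n),
    )
    print("in service with new software:", upgraded)
```

Stopping at the first failing batch keeps the damage to a handful of nodes,
which matches the point made above that an unsuccessful upgrade can be undone
quickly from a standard image.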
CL: Finally, what are the milestones of this development and how far have you
got in achieving them?
GH: In about 6 months CTC had moved 500 users from UNIX to Windows NT Server
4.0. Today CTC serves about 1,000 users and operates close to 1,000 Windows
servers as cluster nodes, database, file, storage, and web servers, and a
Windows-based CAVE. Most of the milestones correlate with new hardware and
software releases insofar as they enabled us to do new things or simplify
existing designs. Windows certainly has come a long way since NT 4.0. Windows
2000 turned out to be a much more reliable platform than NT 4.0 and, with its
64-bit editions, brought support for Intel's Itanium architecture. Windows
Server 2003 offers great enhancements in scalability, performance, security,
and administration. However, the most important new features in Windows Server
2003 for end-users are the integration of the .NET Framework, Internet
Information Services 6.0, and out-of-the-box support for Universal
Description, Discovery, and Integration (UDDI). One of CTC's primary goals has
always been to deliver HPC and other resources in as seamless a fashion as
possible to virtually any user on campus or collaborator off campus. For us
Windows is a platform enabling the deployment of HPC resources, applications,
and other services as stateful Web services, which brings us ever closer to
reaching that goal.
(Brands and names are the property of their respective owners) Copyright:
Christopher Lazou, HiPerCom Consultants, Ltd., UK. Email:
Chris@lazou.demon.co.uk June 2004.