HPCwire
 The global publication of record for High Performance Computing / July 2, 2004: Vol. 13, No. 26


Features:

PART II: IS WINDOWS NT SERVER AN OS FOR HPC?
by Christopher Lazou

With clusters so prominent in the TOP500 list, and software becoming the new bottleneck in delivering the hardware's potential to user applications, here is the full interview with Dr. Gerd Heber from the Cornell Theory Centre, Ithaca, NY, USA, concerning their work in porting Microsoft Windows NT Server as an OS for HPC clusters. Those who read the first part in [107395, http://www.tgc.com/hpcwire/hpcwireWWW/04/0409/107395.html] can skip to question 5.


Christopher Lazou: I understand your role at Cornell is developing new algorithms and high-performance software for multi-scale and multi-physics simulations. This is a challenging, urgently needed area of research, as its success should yield enormous benefits for HPC users. The conventional wisdom is to use highly tuned operating systems with rich functionality, focused on scientific and technical applications, as a vehicle for future developments, yet you at Cornell chose Windows NT Server. Would you briefly explain the rationale behind this decision?

Dr. Gerd Heber: Unlike operating systems for real-time processing or embedded systems, the operating systems used in scientific and technical computing are fairly general purpose. Except for, say, I/O operations or multithreading, any OS is more or less "in the way" of an application. The systems in the Windows Server family are tuned enough to get a cluster onto the Top500 list or to the top of the TPC benchmarks. Tests performed in-house by us, or by others, have shown that on identical hardware, applications running on a Windows Server based platform will perform the same as or better than on other operating systems.

CL: As I remember, in the mid-nineties Cornell was one of the NSF supercomputing centres, operating the largest IBM SP system in the world. A switch to Windows could not have been easy. How did your users react to this radical departure from the mainstream?

GH: The switch was indeed not easy and the prevailing initial reaction from users (including myself) was scepticism, in some cases hostility. From a user's perspective, what made it difficult were the different development environment and the lack of certain libraries and tools. For example, we had a considerable investment in project management using make and CVS, and we had no intention of changing this. The C++ compiler that shipped with Visual Studio 6.0 five years ago was a decent C compiler, but did not deserve to be called a C++ compiler. After years of development with Kuck & Associates' KCC compiler, we spent several months looking around for and testing all kinds of C++ compilers, just to get our code built for Windows. Fortunately, the code had been tested extensively on the SP and performed correctly once built. Tools like Cygwin and GCC were indispensable in those early days.

CL: As one is aware, multi-scale systems throw up many problems that differ from those of medium-size systems. What, for example, were the key challenges while developing Windows NT for its new role?

GH: Making sure that all necessary things are in place for users, developers, and administrators was perhaps the key challenge. There were quite large Windows installations (Windows domains of thousands of servers) out there when we made the transition, but running them as HPC clusters with all their specific requirements had not been attempted yet. On the other hand, the administrative staff at CTC consisted of fabulous AIX administrators, who had only a vague idea of doing things the "Windows way". What followed was, for both users and administrators, a "soul searching process" at the end of which both arrived at the same conclusion: "You must change your life." (Rilke) - Emulating or mimicking UNIX on Windows gets you only so far.

CL: To successfully solve multi-scale, multi-physics application problems, system robustness, fault tolerance and computation integrity (checkpoint restart) are essential. For example, in a system with 2,000 nodes, if each node fails once every two years, the system fails about three times a day, and most calculations take much longer than the interval between failures. How are you tackling these challenges?
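The failure arithmetic in the question can be checked with a few lines of Python. This is a minimal sketch that assumes node failures are independent and evenly spread over time, and that a year has 365 days; the function names are illustrative and do not come from the interview.

```python
def expected_failures_per_day(nodes: int, years_between_failures: float) -> float:
    """Expected number of node failures per day across the whole system.

    Assumes independent nodes, each failing on average once every
    `years_between_failures` years, and a 365-day year.
    """
    failures_per_node_per_day = 1.0 / (years_between_failures * 365.0)
    return nodes * failures_per_node_per_day


def mean_hours_between_failures(nodes: int, years_between_failures: float) -> float:
    """Average time in hours between failures anywhere in the system."""
    return 24.0 / expected_failures_per_day(nodes, years_between_failures)


if __name__ == "__main__":
    rate = expected_failures_per_day(2000, 2.0)
    gap = mean_hours_between_failures(2000, 2.0)
    print(f"{rate:.2f} failures per day, one every {gap:.1f} hours on average")
```

Under these assumptions, 2,000 nodes give roughly 2.7 failures per day, or one about every nine hours, which is consistent with the "about three times a day" quoted in the question.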

GH: All our work is done using industry standard software and hardware. Despite the lack of out-of-the-box support for checkpointing, we think application-level checkpointing is the most promising approach. Of course, this requires more or less intervention from the user, but at the same time minimizes the amount of data associated with a checkpoint. The Intelligent Software Systems project at the Cornell CS department implemented a very convenient compiler-based approach for MPI and OpenMP programs. In my own work, I use databases to store sufficient application state and history information, which also allows restarting on a different number of nodes.
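Database-backed, application-level checkpointing of the kind described above could look roughly like the following. This is a minimal sketch, not Dr. Heber's implementation: it uses Python's built-in sqlite3 module in place of a server database, and the table layout, function names, and round-robin partitioning are all assumptions made for illustration. State is stored per work item rather than per process, which is what allows a restart on a different number of nodes.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the checkpoint store (hypothetical schema: one row per work item)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint ("
        " step INTEGER, item INTEGER, value REAL,"
        " PRIMARY KEY (step, item))"
    )
    return db


def save_checkpoint(db, step, items):
    """Store only the application state needed to resume: {item_id: value}."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO checkpoint VALUES (?, ?, ?)",
            [(step, item, value) for item, value in items.items()],
        )


def latest_step(db):
    """Return the most recent checkpointed step, or None if there is none."""
    return db.execute("SELECT MAX(step) FROM checkpoint").fetchone()[0]


def load_partition(db, step, rank, size):
    """Load the items owned by `rank` when the job restarts on `size` ranks."""
    rows = db.execute(
        "SELECT item, value FROM checkpoint WHERE step = ? AND item % ? = ?",
        (step, size, rank),
    )
    return dict(rows)


if __name__ == "__main__":
    db = open_store()
    # Pretend four ranks each wrote their share of eight items at step 10.
    for rank in range(4):
        save_checkpoint(db, 10, {i: i * 0.5 for i in range(8) if i % 4 == rank})
    # Restart on three ranks: every item is picked up by exactly one rank.
    step = latest_step(db)
    parts = [load_partition(db, step, r, 3) for r in range(3)]
    print(step, [sorted(p) for p in parts])
```

A production version would also need to mark a step as complete only after every rank has written its share, so that a crash in the middle of a checkpoint cannot be mistaken for a valid restart point.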


CL: What is the application domain the system is built for, i.e. the user community the system is aimed at, and what obstacles did you have to overcome to create an acceptable working environment?

GH: The majority of users are Cornell researchers, faculty, and students. The applications are as diverse as this community and include more than 100 projects in astronomy, agriculture, computer science, the engineering disciplines, mathematics, medical sciences, physics, and social sciences. Major application areas are Computational Finance, Protein Folding and Structural Biology, Computational Materials, and Astronomy. Application and tool availability was perhaps the biggest obstacle. Although the majority of applications were available for Windows, their usage and licensing were geared towards a desktop environment and not a set of batch nodes. Working closely with the vendors helped us to overcome those obstacles.

CL: Did you have to invent new tools along the way, or rethink the algorithms used and how HPC distributed applications are manipulated?

GH: In terms of new tools, the biggest piece was probably the scheduling system called ClusterCoNTroller. From a user's perspective it resembled the SP scheduler and it was nearly identical in its functionality. Users developed and shared a lot of smaller utilities, for example a wrapper called "unixify", which would map UNIX-style compiler invocation onto its Windows counterpart. At the top of our current wish list is a parallel debugger a la TotalView. It is certainly possible to attach the Visual Studio debugger to a remote process, but one can handle only so many copies of Visual Studio on a desktop. The good news is that there is development underway and users will soon have a fully featured parallel debugger on the Windows platform. Besides playing catch-up, we were pursuing novel approaches to HPC using databases and Web services, technologies driven by industry and not the HPC mainstream. At CTC, we developed a real-time visualization tool for interactive exploration of large-scale 3D solid models and underlying engineering data, based on OpenDX, the .NET Framework, and SQL Server 2000. Our efforts in the Adaptive Software Project to create stateful Web services predate the OGSI standard by almost 2 years.
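The interview does not describe how "unixify" worked internally. As a rough illustration of the idea, the sketch below translates a handful of UNIX-style compiler flags into their equivalents for the Microsoft `cl` compiler. The function name, the set of supported flags, and the pass-through behaviour are assumptions for this example, not a reconstruction of the CTC tool.

```python
def unix_to_cl(args):
    """Translate a UNIX-style compiler argument list into arguments for cl.exe.

    Hypothetical helper: handles only -c, -g, -O2, -o, -I and -D;
    anything else (source files, unknown flags) is passed through unchanged.
    """
    simple = {"-c": "/c", "-g": "/Zi", "-O2": "/O2"}
    compile_only = "-c" in args
    out = ["cl", "/nologo"]
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in simple:
            out.append(simple[arg])
        elif arg == "-o":
            i += 1  # the output name is the next argument
            # /Fo names the object file, /Fe names the executable
            out.append(("/Fo" if compile_only else "/Fe") + args[i])
        elif arg[:2] in ("-I", "-D"):
            value = arg[2:]
            if not value:  # value given as a separate argument: "-I include"
                i += 1
                value = args[i]
            out.append("/" + arg[1] + value)
        else:
            out.append(arg)
        i += 1
    return out


if __name__ == "__main__":
    unix_args = ["-c", "-O2", "-Iinclude", "-DNDEBUG", "-o", "solver.obj", "solver.cpp"]
    print(" ".join(unix_to_cl(unix_args)))
```

A wrapper of this kind lets existing makefiles keep their UNIX-style command lines, which fits the earlier remark that the team had no intention of giving up make.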

CL: For performance, codes designed for tens of processors have to be rethought and rewritten; they can't just be scaled up. How did you deal with legacy codes?

GH: The largest IBM SP ever operated at Cornell had 512 processors. The largest cluster in production today has 640 processors, which is not a tremendous increase. Unless they relied on proprietary features of the TB3 interconnect, codes developed on the SP did not require fundamental changes. Most of the effort went into instrumentation and profiling to understand the performance impact of SMP nodes, interconnects of comparatively higher latency, and lower-bandwidth memory interfaces.

CL: As vendors cannot test and validate software upgrades on systems of this size, how do you handle software upgrades?

GH: Besides vendor testing, we do a lot of testing ourselves. There is a dedicated Windows test domain, which is configured identically to the production domain. In the production environment, upgrades are typically performed on smaller clusters first. The scheduler allows for arbitrary logical partitioning of resources and, on large clusters, updates can be performed a few nodes at a time, without taking down the entire machine. Upgrades that mandate an all-or-nothing approach are the exception; however, standard imaging techniques allow recovering quickly from an unsuccessful upgrade.
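The staged procedure described above, a few nodes at a time with imaging as the fallback, can be sketched as follows. The callables `upgrade`, `healthy`, and `reimage` are placeholders for site-specific tooling and are not named in the interview; the batch size and node names are likewise assumptions.

```python
def rolling_upgrade(nodes, batch_size, upgrade, healthy, reimage):
    """Upgrade `nodes` in batches so most of the cluster stays in service.

    `upgrade(node)`, `healthy(node)` and `reimage(node)` are hypothetical
    callables supplied by the caller. Stops at the first batch that fails
    its health check, restores that batch from its image, and returns
    (upgraded_nodes, rolled_back_nodes).
    """
    upgraded = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            upgrade(node)
        if not all(healthy(node) for node in batch):
            for node in batch:
                reimage(node)  # recover quickly from an unsuccessful upgrade
            return upgraded, batch
        upgraded.extend(batch)
    return upgraded, []


if __name__ == "__main__":
    nodes = [f"node{n:03d}" for n in range(10)]
    log = []
    done, rolled_back = rolling_upgrade(
        nodes,
        batch_size=4,
        upgrade=lambda n: log.append(("upgrade", n)),
        healthy=lambda n: n != "node005",  # simulate one bad node
        reimage=lambda n: log.append(("reimage", n)),
    )
    print(done, rolled_back)
```

In practice each batch would first be taken out of the scheduler's pool, using the logical partitioning mentioned above, so that running jobs are not disturbed; that step is omitted here.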

CL: Finally, what are the milestones of this development and how far have you got in achieving them?

GH: In about 6 months CTC had moved 500 users from UNIX to Windows NT Server 4.0. Today CTC serves about 1,000 users and operates close to 1,000 Windows servers as cluster nodes, database-, file-, storage- and web servers, and a Windows based CAVE. Most of the milestones correlate with new hardware and software releases insofar as they enabled us to do new things or simplify existing designs. Windows certainly has come a long way since NT 4.0. Windows 2000 turned out to be a much more reliable platform than NT 4.0 and brought, with its 64-bit editions, support for Intel's Itanium architecture. Windows Server 2003 offers great enhancements in scalability, performance, security, and administration. However, the most important new features in Windows Server 2003 for end-users are the integration of the .NET Framework, Internet Information Services 6.0, and out-of-the-box support for Universal Description, Discovery, and Integration (UDDI). One of CTC's primary goals has always been to deliver HPC and other resources in as seamless a fashion as possible to virtually any user on campus or collaborator off campus. For us Windows is a platform enabling the deployment of HPC resources, applications, and other services as stateful Web services, which brings us ever closer to reaching that goal.

(Brands and names are the property of their respective owners) Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. Email: Chris@lazou.demon.co.uk June 2004.


