HPCwire
 The global publication of record for High Performance Computing / September 24, 2004: Vol. 13, No. 38


Features:

FASTER SYSTEMS NOT ALWAYS BETTER: ENSURING HPC PRODUCTIVITY
by Tim Curns, Editor

Vendors today often spend much of their time producing systems that can secure a position among the fastest in the world. While speed measurements and benchmarks ensure competition and push vendors to create more powerful systems, the practical productivity of these systems is sometimes compromised. Are we sacrificing progress in order to gain recognition?

Tom Quinn, VP of East Coast Operations at Linux Networx (a provider of computing systems for high-performance simulation, analysis and modeling), recently spoke to HPCwire about the relationship between the productivity and speed of high-end systems.


HPCwire: Linux Networx has been vocal about the difference between Flops (speed) and the concept of high productivity computing. Can you explain the difference between these two issues?

Tom Quinn: There seems to be a trend among the media that cover the HPC industry to place the most value and emphasis on the speed of a system rather than on the overall productivity of the machine. It's not that performance doesn't matter; in fact, it is probably one of the most critical factors affecting productivity. But just because a computer is fast doesn't mean it's productive. This trend is propagated by supercomputer vendors wanting their names listed among the top 10 fastest supercomputers, which prompts them to sell their systems at extremely low prices (oftentimes losing money on the deal). Customers, also wanting their names in the top 10, buy the cheapest hardware they can just to get a system that can run a blazingly fast benchmark, but little else. Computing productivity is sometimes being sacrificed for bragging rights.

HPCwire: So, keeping in mind that productivity is sometimes sacrificed for speed, do you think the Top500 List is an accurate representation of the world's fastest computers?

TQ: As a supporter of and participant in the Top500 organization, Linux Networx recognizes the importance of acknowledging the world's fastest systems. But is the number of operations per second a computer can achieve on a benchmark test the true measure of its value? Certainly the speed of a system is important, but it doesn't measure the actual productivity of the machine. The Top500 list provides a meaningful representation of the world's fastest systems, as measured in operations per second using one specific benchmark; however, most HPC users are more concerned about productivity, or the number of jobs run over the life of the system.

HPCwire: Well, if speed is an eye-catching factor, what makes computing productivity so important?

TQ: Purchasing a "fast" system that doesn't help solve an organization's problems is a waste of time and money. It won't take long before the system's inability to run production codes becomes painfully obvious, deadlines are missed, and management gets upset (usually at the technical folks). Suddenly, having had the fastest system last year won't matter much when the CEO is wondering why product development or research is behind schedule.

An obsession with speed rather than productivity can also lead to a loss of meaningful research and scientific advancement. When the focus is taken off productivity and computing speed becomes the main objective, the quality and quantity of research and development are compromised. Ultimately, the speed of a system has little meaning if the system is not productive.

HPCwire: In addition to a loss of meaningful research as you say, what are some of the other ramifications if there continues to be a focus on speed over productivity?

TQ: Focusing on speed instead of productivity has the potential to hinder HPC innovation. If customers solely demand fast systems, rather than a system that can efficiently and productively run applications and solve problems, then a computing vendor's motivation to develop improved technology is diminished. The HPC industry has traditionally rewarded innovative companies that push the technical envelope. Vendors that focus on improving the productivity of computers are motivated to further invest in developing the right technologies.

HPCwire: So how does Linux Networx ensure highly productive systems to its customers?

TQ: Every aspect of a Linux Networx system is designed, integrated and optimized for maximum productivity. Linux Networx pays attention to the factors influencing system productivity that many vendors miss, such as systems designed for maximum cooling and density, total cluster management, validation and integration of the latest components, full pre-ship system buildup and testing followed by rapid on-site installation, plus ongoing service and training programs to help customers maximize their cluster's ROI. All these efforts are designed to help customers get the most production possible from their cluster during its lifetime.

HPCwire: What should organizations look for in a cluster vendor to ensure they receive a highly productive machine?

TQ: Selecting the right cluster vendor is crucial to the success of your computational goals. Work with a cluster vendor that has a proven track record of providing turnkey solutions, starting with rigorous QA and validation processes, pre-ship system buildup and testing, optimized applications, the latest cluster technologies, total cluster management solutions, training programs, and professional services offerings.

Keep in mind, customers have organizational goals tied to their computer systems. The right vendor can mitigate the risk of installing a high-powered computing system, as well as deliver a highly productive computing machine that helps bring a product to market more quickly or advance important research.

HPCwire: Once a system is installed and running, what additional steps can be taken to make sure the system is productive over its lifetime?

TQ: Whenever a new system is purchased, particularly if there is minimal institutional knowledge of or experience with the types of systems being considered, education is critical both before and after the system is procured. Education is important for optimizing the procurement (making sure you receive the best solution for your money) and for ensuring optimal productivity after the system is installed. Linux Networx offers a variety of technical cluster training courses to help users maximize their cluster's productivity.

In terms of price/productivity, the impact of educating system administrators on the system's management tools is difficult to prove quantitatively. It may reduce downtime and/or enable administrators to help users run jobs faster and more efficiently. For instance, a developer could attend a three-day class, go back and look at their code, and squeeze out an additional 20% performance. If the system cost $1 million and the class cost $3,000, that's an excellent price/productivity trade. Education and training are rarely a bad investment.
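To make the trade concrete, the short sketch below works through the figures in the example. It rests on a simplifying assumption that is ours, not Quinn's: that a system's value scales linearly with throughput, so a 20% performance gain is treated as worth 20% of the purchase price in equivalent capacity. The function name is hypothetical and used only for illustration.

```python
def training_payoff(system_cost, class_cost, gain_fraction):
    """Hypothetical helper: estimate the payoff of a training class.

    Assumes (simplification) that system value scales linearly with
    throughput, so a fractional performance gain is worth that same
    fraction of the system's purchase price in equivalent capacity.
    Returns (equivalent capacity in dollars, payoff ratio vs. class cost).
    """
    capacity_value = system_cost * gain_fraction
    return capacity_value, capacity_value / class_cost


# Figures from the interview: $1 million system, $3,000 class, 20% gain.
value, ratio = training_payoff(1_000_000, 3_000, 0.20)
print(f"Equivalent capacity: ${value:,.0f}; payoff ratio: {ratio:.0f}x")
```

Under that assumption, the class buys roughly $200,000 of equivalent capacity, a return of about 67 times its cost; even if only a fraction of the gain materializes, the trade remains favorable.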

HPCwire: And finally, how can a user's application be optimized to improve productivity?

TQ: In order for an application to run effectively on a cluster system, it needs to be structured, optimized, and/or recompiled to take advantage of the cluster's architecture. A Linux cluster vendor should work with its customers to focus both the system and the application on productivity. Linux Networx offers application parallelization consultants who help customers not only determine whether their application will run effectively on a cluster system, but also parallelize the application and provide recommendations to improve overall throughput. This detailed technical analysis can help maximize an application's productivity throughout the life of the cluster.

HPCwire: Thanks, Tom. It's clear to see how system speed and productivity can be both beneficial and, simultaneously, mutually exclusive. I'm sure vendors and institutions will continue to monitor both ends of the competition.

