
Features:
FASTER SYSTEMS NOT ALWAYS BETTER: ENSURING HPC PRODUCTIVITY
by Tim Curns, Editor
Vendors today often spend much of their time producing systems that can secure
a position among the fastest in the world. While measurements of speed and
benchmarking ensure competition and push vendors to create more powerful
systems, practical productivity of these systems is sometimes compromised. Are
we sacrificing progress in order to gain recognition?
Tom Quinn, VP of East Coast Operations at Linux Networx (a provider of
computing systems for high-performance simulation, analysis and modeling),
recently spoke to HPCwire regarding the relationship between the productivity
and speed of high-end systems.
HPCwire: Linux Networx has been vocal about the difference between Flops
(speed) and the concept of high productivity computing. Can you explain the
difference between these two issues?
Tom Quinn: There seems to be a trend among the media that cover the HPC
industry to place the most value and emphasis on the speed of the system,
rather than on the overall productivity of the machine. It's not that
performance does not matter; in fact, it is probably one of the most critical
areas that affects productivity. But just because a computer is fast doesn't
mean it's productive. This trend is propagated by supercomputer vendors
wanting their name listed among the top 10 fastest supercomputers, which
prompts them to sell their systems at extremely low prices (often losing
money on the deal). Customers, also wanting their name in the top 10, buy the
cheapest hardware they can just to get a system that can run a blazingly fast
benchmark, but little else. Computing productivity is sometimes being
sacrificed for bragging rights.
HPCwire: So, keeping in mind that productivity is sometimes sacrificed for
speed, do you think the Top500 List is an accurate representation of the
world's fastest computers?
TQ: As a supporter and participant of the Top500 organization, Linux Networx
realizes the importance of recognizing the world's fastest systems, but is the
number of operations per second a computer is able to produce on a benchmark
test the true measure of a computer's value? Certainly the speed of a system
is important, but it doesn't measure the actual productivity of the machine.
The Top500 list provides a meaningful representation of the world's fastest
systems, as measured in operations per second using one specific benchmark.
However, most HPC users are most concerned about productivity, or the number
of jobs run over the life of the system.
HPCwire: Well if speed is an eye-catching factor, what makes computing
productivity so important?
TQ: Purchasing a "fast" system that doesn't help solve an organization's
problems is a waste of time and money. It won't take long before the system's
inability to run production codes becomes painfully obvious, deadlines aren't
met, and management gets upset (usually at the technical folks). Suddenly,
having had the fastest system last year won't matter much when the CEO is
wondering why product development or research is behind schedule.
An obsession with speed rather than productivity can also lead to a loss of
meaningful research and scientific advancement. When the focus is taken off
productivity and computing speed becomes the main objective, the quality and
quantity of research and development is compromised. Ultimately, the speed of
a system has little meaning if it is not productive.
HPCwire: In addition to a loss of meaningful research as you say, what are
some of the other ramifications if there continues to be a focus on speed over
productivity?
TQ: Focusing on speed instead of productivity has the potential to hinder HPC
innovation. If customers solely demand fast systems, rather than systems
that can efficiently and productively run applications and solve problems,
then a computing vendor's motivation to develop improved technology is
diminished. The HPC industry has traditionally rewarded innovative companies
that push the technical envelope. Vendors that focus on improving the
productivity of computers are motivated to further invest in developing the
right technologies.
HPCwire: So how does Linux Networx ensure highly productive systems to its
customers?
TQ: Every aspect of Linux Networx systems is designed, integrated and
optimized for maximum productivity. Linux Networx pays attention to the
factors that influence system productivity that many vendors miss, such as
systems designed for maximum cooling and density, total cluster management,
validation and integration of the latest components, full pre-ship system
buildup and testing, followed by rapid on-site installation, plus ongoing
service and training programs to help customers maximize their cluster's ROI.
All these efforts are designed to help customers get the most production
possible from their cluster during its lifetime.
HPCwire: What should organizations look for in a cluster vendor to ensure they
receive a highly productive machine?
TQ: Selecting the right cluster vendor is crucial in determining the success
of your computational goals. Work with a cluster vendor that has a proven
track record of providing turnkey solutions, starting with rigorous Q/A and
validation processes, pre-ship system buildup and testing, optimized
applications, the latest cluster technologies, total cluster management
solutions, training programs, and professional services offerings.
Keep in mind, customers have organizational goals tied to their computer
systems. The right vendor can mitigate the risk of installing a high-powered
computing system, as well as deliver a highly productive computing machine
that helps bring a product to market more quickly or advance important
research.
HPCwire: Once a system is installed and running, what additional steps can be
taken to make sure the system is productive over its lifetime?
TQ: Whenever a new system is purchased, in particular if there is minimal
institutional knowledge or experience with the types of systems being
considered, education is critical both before and after the system is
procured. Education is important for optimizing the procurement (making sure
you receive the best solution for your money) and ensuring optimal
productivity after the system is installed. Linux Networx offers a variety of
technical cluster training courses to help users maximize their cluster's
productivity.
In terms of impact on price/productivity, the value of educating system
administrators on the system's management tools is difficult to prove
quantitatively. It may yield a reduction in downtime and/or enable them to
get users to run jobs faster and more efficiently. For instance, a developer
could go to a three-day class, go back and look at their code, and squeeze
out an additional 20% performance. If the system cost $1 million and the
class cost $3,000, that's an excellent price/productivity trade. Education
and training are rarely a bad investment.
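The arithmetic behind that training example can be sketched in a few lines.
This is an illustration only, not a Linux Networx formula: it assumes that a
20% performance gain is worth roughly 20% of the system's purchase price, and
the function name training_return is hypothetical.

```python
def training_return(system_cost, class_cost, speedup_percent):
    """Estimate the value of a training-driven speedup.

    Assumption (not from the interview): a speedup of N percent is
    treated as equivalent to N percent of the system's purchase price.
    Returns (equivalent_value, value_per_training_dollar).
    """
    equivalent_value = system_cost * speedup_percent / 100
    return equivalent_value, equivalent_value / class_cost


# Figures quoted in the interview: $1 million system, $3,000 class, 20% gain.
value, ratio = training_return(1_000_000, 3_000, 20)
print(f"Equivalent capacity gained: ${value:,.0f}")
print(f"Value per training dollar: {ratio:.1f}")
```

Under that simplifying assumption, the $3,000 class buys the equivalent of
about $200,000 of extra capacity, or roughly 67 times its cost. Real gains
depend on how much of the workload the tuned code represents.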
HPCwire: And finally, how can a user's application be optimized to improve
productivity?
TQ: In order for an application to run effectively on a cluster system, it
needs to be structured, optimized, and/or re-compiled to take advantage of the
cluster's architecture. A Linux cluster vendor should work with their customer
to focus both the system and the application on productivity optimization.
Linux Networx offers application parallelization consultants to help customers
not only know if their application will run effectively on a cluster system,
but also parallelize their application and provide recommendations to improve
overall throughput. This detailed technical analysis can help maximize an
application's productivity throughout the life of the cluster.
HPCwire: Thanks, Tom. It's clear how system speed and productivity can
be mutually beneficial and yet, at times, at odds. I'm sure vendors
and institutions will continue to monitor both ends of the competition.