
Features:
LINUX CLUSTER BREAKTHROUGH: THE BIG DEAL IS BIG BANDWIDTH
by Tim Curns, Editor
Linux cluster users are due for a big boost in input/output speed,
scalability, and reliability because of a new Lustre-based HP StorageWorks
Scalable File Share product from HP. HPCwire recently talked with Kent
Koeninger, product manager in HP's High Performance Technical Computing
Division, about HP's two-year tuning of the Lustre protocol and how the
software solves I/O bottleneck issues in Linux cluster computing enterprises.
HPCwire: Please give our readers a quick overview of Lustre and how it is used
in the new scalable file share product.
Kent Koeninger: Lustre is an open, standards-based software technology that is
well funded and backed by the U.S. Department of Energy (DoE), the greater
open source Linux community, Clustered File Systems, Inc. (CFS), and HP.
Lustre's open standards figure prominently in HP's new file system, the
StorageWorks Scalable File Share (HP SFS). Lustre is a major breakthrough for
fast I/O on Linux clusters.
The StorageWorks Scalable File Share is a self-contained fileserver built from
multiple, industry standard HP ProLiant servers and HP StorageWorks disk
arrays. The HP SFS server runs a combination of Lustre and HP-specific
value-added software. HP organizes these parallel servers, storage arrays and
software into a highly reliable product with the simplicity of a single,
scalable file system. HP SFS is typically paired with Linux compute clusters,
such as high-performance technical computing (HPTC) clusters of HP ProLiant
and HP Integrity servers. HP SFS support is built into all HP XC clusters and
is an integrated option for all cluster configurations in the HP HPTC cluster
portfolio.
The result is a powerful file server delivering faster time to solution and
10 to 100 times more bandwidth than existing solutions.
HPCwire: How do you manage to fulfill your promise of 100 times more
bandwidth?
KK: We set out to find a solution to the Linux cluster I/O problem because we
knew existing global file systems did not support today's computational and
data demands. Linux clusters are scalable, computational engines that can
deliver immense computing power (trillions of calculations per second) to meet
the most demanding compute-intensive research projects. However, many Linux
clusters use slow, shared I/O techniques, such as Network File System (NFS),
the current de facto standard for sharing files. The resulting slow I/O can
limit the speed and throughput of the Linux cluster. So, you have these
powerful compute engines, roaring and ready to process, but stuck in traffic.
Additionally, programmers must often implement numerous time-consuming and
difficult techniques to use hundreds of disjointed, distributed file systems.
As a result, applications run slowly and users waste valuable time and effort
on file system housekeeping. The HP SFS high-bandwidth shared file system can
span dozens to thousands of Linux clients, dramatically simplifying the
ability to run clustered applications. HP SFS removes the I/O bottleneck,
saving users hours of programming time. With HP SFS, users avoid the
complexity of running applications on many individual file systems.
HPCwire: Lustre is Open Source technology implemented on equipment from many
vendors, including HP. How is HP different?
KK: HP invested early and substantially in Lustre technology and is the first
and only tier-one vendor to offer a fully supported, integrated hardware and
software product based on Lustre technology, which resulted from its joint
research and development project with the DoE and CFS. The DoE selected HP
to provide program management and development services to support the Lustre
project.
HP is a storage pioneer and, as such, we also bring differentiation through our
innovative StorageWorks grid architecture. Let me explain. The standards-
based StorageWorks grid architecture allows storage services to be delivered
across a massively scalable, centrally managed system. StorageWorks Scalable
File Share is the second product we've introduced that is based on the
StorageWorks grid architecture. This architecture divides storage, indexing,
search and retrieval tasks across a distinct set of computing nodes or storage
"smart cells" that cooperate to form a single, shared file system. Each HP
smart cell is an intelligent storage server running the Lustre protocol, which
works in parallel with other smart cells on a shared StorageWorks grid. To
scale to the desired bandwidth level, users simply add smart cells. The smart
cell is built with the agility of an object-based storage framework, which is
architected for increasingly sophisticated features that improve
responsiveness, security, reliability, and resiliency. The StorageWorks grid
strategy follows the larger HP Adaptive Enterprise strategy that delivers
unique capabilities to help customers leverage their HPTC and general IT
infrastructure.
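
The idea of scaling bandwidth by adding smart cells can be illustrated with a
minimal Python sketch. It assumes simple round-robin striping of a file across
cells and perfectly linear bandwidth scaling; both are illustrative
assumptions, not a description of how HP SFS or Lustre actually places data.
The function names and the per-cell bandwidth figure are hypothetical.

```python
def stripe_layout(file_size, stripe_size, num_cells):
    """Assign each stripe-sized chunk of a file to a cell, round-robin.

    Returns a list of (cell_index, offset, length) tuples.
    This is a simplified model, not the real Lustre layout logic.
    """
    if stripe_size <= 0 or num_cells <= 0:
        raise ValueError("stripe_size and num_cells must be positive")
    layout = []
    offset = 0
    chunk = 0
    while offset < file_size:
        # The final chunk may be shorter than a full stripe.
        length = min(stripe_size, file_size - offset)
        layout.append((chunk % num_cells, offset, length))
        offset += length
        chunk += 1
    return layout


def ideal_aggregate_bandwidth(per_cell_mb_s, num_cells):
    """Idealized aggregate bandwidth, assuming perfect linear scaling."""
    return per_cell_mb_s * num_cells


if __name__ == "__main__":
    # A 10-byte file, 4-byte stripes, 2 cells (toy numbers for readability).
    print(stripe_layout(10, 4, 2))
    # Hypothetical 200 MB/s per cell: doubling the cells doubles the total.
    print(ideal_aggregate_bandwidth(200, 4), ideal_aggregate_bandwidth(200, 8))
```

Because consecutive chunks land on different cells, a large read or write can
keep several cells busy at once, which is why adding cells raises the
achievable bandwidth in this simplified model.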
HPCwire: How expensive are these implementations?
KK: Actually, not very. New, high-capacity and low-cost parts are combining with
Lustre's open and scalable storage technology to deliver dramatic reductions
in storage prices while delivering significant increases in terabytes of
capacity. Lustre-based scalable storage can lower prices several times
compared to conventional Fibre Channel SAN storage systems.
HPCwire: Please describe your work with Pacific Northwest National Laboratory.
KK: The U.S. Department of Energy's Pacific Northwest National Laboratory
(PNNL) has been using Lustre technology for more than a year on one of the 10
largest Linux clusters in the world. The HP Linux super cluster at PNNL, with
more than 1,800 Itanium 2 processors, is rated at more than 11 TFLOPS. The
Lustre implementation on PNNL's super cluster allows them to achieve faster,
more accurate analysis on biological, chemical and environmental cleanup
scenarios. PNNL gets answers faster from complex, I/O-hungry applications
because Lustre scales the high-bandwidth I/O needed to match the large data
files produced and consumed by the scalable simulations. HP also worked with
PNNL to ensure that the Lustre-based system was highly reliable and stable,
and that no data was lost during processing.
HPCwire: What kind of performance numbers can you provide?
KK: PNNL currently sustains over 3 GB/s of bandwidth while running production loads
on a 53-terabyte Lustre-based file share. Individual Linux clients are able to
write data to the parallel Lustre servers at more than 650 MB/s. The system is
designed to make the enormous PNNL cluster centralized, easy to use and
manage, and simple to expand.
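
To put those figures in perspective, here is a rough back-of-envelope
calculation in Python. It treats the quoted numbers as exact (the interview
gives them as lower bounds) and assumes decimal units (1 GB = 1000 MB,
1 TB = 1000 GB); the function names are hypothetical.

```python
import math


def clients_to_reach(aggregate_mb_s, per_client_mb_s):
    """Minimum number of clients, each at the single-client rate,
    needed to reach the aggregate rate."""
    return math.ceil(aggregate_mb_s / per_client_mb_s)


def fill_time_hours(capacity_tb, bandwidth_gb_s):
    """Hours needed to write the full capacity at a sustained bandwidth."""
    seconds = (capacity_tb * 1000.0) / bandwidth_gb_s
    return seconds / 3600.0


if __name__ == "__main__":
    # 3 GB/s aggregate versus 650 MB/s per client.
    print(clients_to_reach(3000, 650))
    # 53 TB file share written at 3 GB/s.
    print(round(fill_time_hours(53, 3), 1))
```

Under these assumptions, about five clients writing at the single-client rate
would match the aggregate figure, and the entire 53-terabyte file share could
be written in roughly five hours.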
HPCwire: What do you mean by scalable resiliency? Is it another term for high
availability?
KK: Resiliency means there is no single point of failure, which results in
high availability. Lustre technology in HP SFS is designed to scale while
maintaining resiliency. As servers are added to a typical cluster environment,
failures become more likely because of the increasing number of physical
components. Lustre's support for resilient, redundant hardware provides
protection from inevitable hardware failures through transparent fail-over and
recovery. To increase the protection, we add resilient HP hardware
components and thoroughly test configurations to ensure that reliability is
delivered as the servers are scaled.
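
The reasoning above, that more components mean more failures unless
redundancy is added, can be sketched with a simple probability model in
Python. It assumes every server fails independently with the same probability
p during some interval, and that a fail-over pair causes an outage only if
both partners fail in that interval. These are simplifying assumptions for
illustration, not measured HP SFS figures, and the function names are
hypothetical.

```python
def p_any_failure(p, n):
    """Probability that at least one of n independent servers fails."""
    return 1.0 - (1.0 - p) ** n


def p_outage_with_pairs(p, pairs):
    """Probability that at least one fail-over pair loses both partners."""
    return 1.0 - (1.0 - p * p) ** pairs


if __name__ == "__main__":
    p = 0.01  # hypothetical per-server failure probability per interval
    for servers in (10, 100, 1000):
        unprotected = p_any_failure(p, servers)
        paired = p_outage_with_pairs(p, servers // 2)
        print(servers, round(unprotected, 3), round(paired, 4))
```

In this model the chance of some failure grows quickly with the number of
servers, while the chance of an outage with fail-over pairs stays much
smaller, which is the sense in which resiliency has to scale with the system.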