
Features:
INTERVIEW WITH WU-CHUN FENG, LANL & OSU
by Alan Beck, Editor-in-Chief, HPCwire
HPCwire: Is it realistic to expect a single interconnect technology to take
precedence over the others for the bulk of HPC? Why or why not?
WU-CHUN FENG: No. If you take a look at the current Top500 List, you will
find at least half a dozen different interconnects being used in HPC. In fact,
the first four supercomputers on the list are each powered by a different
interconnect: custom, Quadrics, InfiniBand and Myrinet. And while Gigabit
Ethernet does not make its first explicit appearance until No. 25, it is
pretty clear that Ethernet is the most widely deployed network interconnect on
the Top500 List.
In the future, I still expect there to be choices for the HPC community as I
don't think that there will ever be a "Swiss Army knife" networking technology
that will address every need of every HPC researcher. Each of the above
interconnect technologies has its own set of benefits and drawbacks. For
instance, the source-routed networking technologies that are prevalently used
in the supercomputers on the Top500 List (e.g., Myrinet and Quadrics) will
obviously not scale to large-scale distributed computing or Grid computing
environments. However, they will generally perform better and more efficiently
than an IP-routed network running atop Ethernet.
If the "Battle of the Network Stars!" panel goes as I expect it will, we won't
necessarily see a clear winner amongst the interconnects. My hope, however, is
that the winner will be the HPC systems and applications community. By having
the panel (with active participation from the audience) engage in a frank
discussion on the benefits and drawbacks of each interconnect, systems and
applications researchers will be better equipped to make the appropriate
choice in interconnect technology.
HPC: How will Grid computing change the face of interconnect preferences?
WF: This will depend (in part) on how the Grid is used. From my perspective,
using the Grid implies the coordinated use of geographically distributed
resources. Access to these resources over the wide-area network (WAN) relies
on IP (Internet Protocol) for routing. When the distributed resource is a
high-performance cluster that is source routed in its system-area network
(e.g., Myrinet and Quadrics), you must be able to bridge or translate traffic
between the source-routed network and the IP-routed network (WAN) as these
networks effectively "speak different languages."
In the most likely scenario, the computational granularity of a Grid task
running on a cluster will be quite large. As a result, communication over the
Grid going to/from the cluster will be small, meaning that the performance
benefits of a source-routed network will outweigh the inefficiencies of having
to bridge between the source-routed and IP-routed networks.
Perhaps a complementary question to ask is "How will Grid computing change the
face of software protocols and network infrastructure over the WAN?" For
example, with the TeraGrid and National Lambda Rail efforts resulting in
long-haul, fiber-optic links, the WAN research community must re-examine the
fundamentals of network transport because TCP, as it exists today, will not
scale over high-bandwidth, long-haul links. The seeds of this problem were
exposed back in SC2000 in a paper entitled "The Failure of TCP in
High-Performance Computational Grids." Since then, we have seen a proliferation
of high-speed WAN protocol work -- FAST TCP, High-Speed TCP, Scalable TCP,
SABUL, Tsunami and RB-UDP, just to name a few -- as well as the launch of a
workshop that is dedicated toward addressing the scalability problems of
today's TCP -- The International Workshop on Protocols for Fast Long-Distance
Networks.
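To make the scaling problem concrete, the following is a minimal sketch of the
arithmetic behind it. The link speed, round-trip time and packet size are
assumed for illustration and do not come from the interview, and the function
names are hypothetical. The sketch computes how much data must be in flight to
fill a long-haul link, and roughly how long standard TCP congestion avoidance
(halve the window on a loss, then grow by about one packet per round trip)
needs to regain full speed after a single lost packet.

```python
# Illustrative sketch only: the figures below are assumptions, not
# measurements from TeraGrid, National Lambda Rail or any other network.


def bandwidth_delay_product_bytes(link_bps, rtt_s):
    """Bytes that must be in flight to keep the link fully utilized."""
    return link_bps * rtt_s / 8


def window_in_packets(link_bps, rtt_s, packet_bytes):
    """Congestion window, in packets, needed to fill the link."""
    return bandwidth_delay_product_bytes(link_bps, rtt_s) / packet_bytes


def recovery_time_s(link_bps, rtt_s, packet_bytes):
    """Approximate time for standard TCP to regrow its window after one loss.

    The window is halved on a loss and then grows by about one packet per
    round trip, so regaining the lost half takes roughly window/2 round trips.
    """
    window = window_in_packets(link_bps, rtt_s, packet_bytes)
    return (window / 2) * rtt_s


if __name__ == "__main__":
    link_bps = 10e9       # assumed: 10 Gb/s long-haul link
    rtt_s = 0.100         # assumed: 100 ms round-trip time
    packet_bytes = 1500   # assumed: standard Ethernet-sized packets

    bdp = bandwidth_delay_product_bytes(link_bps, rtt_s)
    window = window_in_packets(link_bps, rtt_s, packet_bytes)
    recovery = recovery_time_s(link_bps, rtt_s, packet_bytes)

    print(f"Data in flight to fill the link: {bdp / 1e6:.0f} MB")
    print(f"Window needed: {window:.0f} packets")
    print(f"Recovery after one loss: {recovery / 60:.0f} minutes")
```

Under these assumed figures, a single loss costs standard TCP on the order of
an hour before it is back at full rate, which is the kind of behavior the
newer high-speed protocols listed above set out to address.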
HPC: What is the appropriate way to analyze costs vs benefits for HPC
interconnects?
WF: You need to be able to evaluate the requirements of your end users as well
as the level of investment in equipment and people resources that you are
willing to make to support the end-user requirements. For example, given that
one of the most important codes in bioinformatics -- BLAST: Basic Local
Alignment Search Tool -- is embarrassingly parallel, it does not require the
extraordinarily low latencies that source-routed network technologies provide,
but it does oftentimes require the ability to seamlessly move large amounts of
data to/from the SAN and across the WAN. If both the SAN and WAN are
IP-routed, system and network administrators only have to deal with one type
of network infrastructure rather than two. Thus, Gigabit Ethernet would be an
ideal, low-cost choice for this particular application.
HPC: What evolutionary patterns have clearly emerged as you view the HPC
networking picture over the last five years?
WF: 1) The elimination of host-interface network bottlenecks that were
obstacles in achieving high-speed network performance from end host to end
host. 2) The push to make commodity networking technologies (i.e., Ethernet)
competitive for HPC. (Note: This is not unlike how Intel and AMD have pushed
their commodity processors into the HPC mainstream.)
With respect to the first item, the HPC community realized the need for
OS-bypass protocols back in the early '90s. In the past few years, we have
finally seen the networking community as a whole embrace OS-bypass
protocols, or more specifically, remote direct memory access (RDMA). For
instance, the Internet Engineering Task Force (IETF) is now pushing its
"remote direct data placement" (RDDP) protocol, which is effectively just
another name for RDMA.
With respect to the second item, I predict that you will see commodity
networking technologies like 10-Gigabit Ethernet make even more in-roads into
HPC. That is, it is striving to move from being a low-cost commodity
interconnect to also being one that can be used in HPC, much like what Intel
and AMD have done in transforming their processors from being commodity
processors to also being processors for HPC. (It wasn't that long ago that one
would only expect to see an Intel or an AMD on the desktop. Now we see them in
a sizable number of HPC clusters.) One of the mechanisms that will help
Ethernet bridge the "commodity-to-HPC" gap is the aforementioned RDDP/RDMA
protocol.
HPC: Is there anything else the readers should consider about this topic?
WF: The two HPC trends in networking to look out for are 1) the transformation
of commodity Ethernet to better support HPC requirements (e.g., RDMA over
TCP/IP over PCI Express) and 2) the continued development of InfiniBand,
specifically for HPC and transaction processing.