SANs: A MEETING OF THE REVOLUTIONARY AND THE EVOLUTIONARY
by Philip Buonadonna
Computer systems today are on the verge of a major architectural revolution,
one that might fundamentally change the way systems are built and utilized:
the I/O network. The impetus behind this change is the need for systems that
perform in the face of ever-increasing interconnectivity. The Internet, the
evolution of Post PC devices, computing clusters and other factors are
increasing the demand for I/O bandwidth. However, present system I/O
architectures form a bottleneck that limits performance. Processing units
communicate with I/O devices using simple load/store semantics across a shared
bandwidth bus, such as the Peripheral Component Interconnect (PCI), which
extends at most a few feet from the processor, supports a limited number of
devices and minimal fault-tolerance. Additionally, it supports limited
management in terms of monitoring and on-line maintenance capability. Thus
systems are now faced with a new obstacle: a "tyranny of I/O" in which the
demands far outpace present capabilities.
A proposed solution to this issue is to merge present I/O architectures and
modern networks into a new paradigm of I/O based networks. Processor/Memory
combinations and I/O devices (e.g. disks and WAN adapters) are connected
directly to a switched interconnect fabric and communicate through
network-oriented protocols. The switch-based design permits a large array of
devices to be connected in a manner that provides scalable throughput and the
network protocols provide for a high degree of fault-tolerance. The I/O
network also provides advanced manageability in terms of both monitoring and
on-line repair. Finally, the I/O network additionally supports
processor-to-processor communication to provide both high compute capacity and
redundancy. All this through a single network connection. Networks that
support this concept of networked I/O are called System Area Networks
(SANs).
There is a spectrum of efforts aimed at SAN designs for networked I/O. At one
end are the revolutionary architectures that are specifically tailored for
this application. The prime example here is the Infiniband network
architecture (www.infinibandta.org). Infiniband is a bottom-up
network design that provides operating system independent communication over a
switched network fabric. The specification is broad and includes the physical
link, network and transport protocols, management functions and the
programmatic abstractions. At the other end of the scale are the more
evolutionary approaches that seek to apply existing architectures to networked
I/O. Modern inter-network protocols, i.e. TCP/IP and its related applications
and protocols, over traditional links such as Ethernet form the majority here.
Thus, the question that arises is which architecture best delivers the
goals of the I/O network to the largest space of applications. Possible
scenarios include a legacy TCP/IP based system, a future variant of Infiniband
or perhaps something completely different. This article looks at Infiniband
and modern inter-networks and how we are examining a possible combination of
the two to solve the networked I/O problem.
Infiniband
Infiniband is the logical merger of several industry efforts (i.e., Next
Generation I/O and Future I/O) in network-based I/O architectures. The core
concept of this design is to separate host processors/memory combinations and
I/O devices by a switched network fabric, effectively eliminating the
traditional I/O bus. A Host Channel Adapter (HCA) connects the host memory
controller to the network while a Target Channel Adapter (TCA) is the
interface for the individual I/O devices. The TCA is similar to the HCA, but
can be simplified according to the requirements of the attached device(s).
Interconnecting channel adapters are the Infiniband switches themselves. The
switch is intelligent and provides several functions including inter-subnet
routing, management, topology discovery and differentiated service.
The fundamental transport interface supported by the HCA/TCA is the work queue
pair (QP). Each QP consists of a dual queue of send and receive communication
metadata known as work requests. The semantics permit matched send-receive
operations in which send and receive work requests are implicitly paired
between the source and destination. The QP also supports remote DMA operations
that permit a source to read from or write to a target's address space directly.
The data payload exchanged between QPs is sourced/sinked to special memory
regions established by the application that are registered with the
communication provider.
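
The queue pair model described above can be illustrated with a small
simulation. This is only a sketch under stated assumptions: the names
WorkRequest, QueuePair, deliver and rdma_write are hypothetical and are not
taken from the Infiniband specification, and a real channel adapter performs
these steps in hardware rather than in host software.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    """Metadata for one transfer; the payload lives in a registered region."""
    region: str   # name of a registered memory region (hypothetical handle)
    offset: int
    length: int


@dataclass
class QueuePair:
    """Toy queue pair: a send queue, a receive queue and registered memory."""
    regions: dict = field(default_factory=dict)
    send_q: deque = field(default_factory=deque)
    recv_q: deque = field(default_factory=deque)

    def register(self, name, size):
        # Only registered regions may source or sink payload data.
        self.regions[name] = bytearray(size)

    def post_send(self, wr):
        self.send_q.append(wr)

    def post_recv(self, wr):
        self.recv_q.append(wr)


def deliver(src, dst):
    """Pair the oldest send on src with the oldest receive on dst."""
    if not src.send_q or not dst.recv_q:
        return False  # nothing to send, or the receiver posted no buffer
    s, r = src.send_q.popleft(), dst.recv_q.popleft()
    if s.length > r.length:
        raise ValueError("receive buffer too small")
    data = src.regions[s.region][s.offset:s.offset + s.length]
    dst.regions[r.region][r.offset:r.offset + s.length] = data
    return True


def rdma_write(src, wr, dst, dst_region, dst_offset):
    """Write into the target's registered memory; no posted receive is used."""
    data = src.regions[wr.region][wr.offset:wr.offset + wr.length]
    dst.regions[dst_region][dst_offset:dst_offset + wr.length] = data
```

In this sketch a send completes only once the destination has posted a
matching receive, while the remote DMA write bypasses the receive queue
entirely and names the target region and offset itself.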
Both channel adapters and switches implement Infiniband specific network and
transport protocols. Connections between QPs on channel adapters may be
one-to-one or one-to-many and have either reliable or unreliable delivery
guarantees. Message-level flow control is provided based on receive credits
and NAKs.
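
One way to picture flow control based on receive credits and NAKs is the toy
model below. It is a sketch, assuming one credit per posted receive buffer and
a simple hold-and-retry policy at the sender; the Receiver and Sender classes
and the "ACK", "NAK" and "HELD" strings are hypothetical and do not reproduce
the Infiniband wire protocol.

```python
class Receiver:
    def __init__(self, buffers):
        self.free = buffers       # receive buffers currently posted
        self.delivered = []

    def accept(self, msg):
        """Return "ACK" if a buffer was available, otherwise "NAK"."""
        if self.free == 0:
            return "NAK"
        self.free -= 1
        self.delivered.append(msg)
        return "ACK"

    def repost(self, n=1):
        """Post n fresh buffers and return the number of new credits."""
        self.free += n
        return n


class Sender:
    def __init__(self, credits):
        self.credits = credits    # sender's view of the receiver's buffers
        self.pending = []         # messages waiting for a credit or a retry

    def send(self, rx, msg):
        if self.credits == 0:
            self.pending.append(msg)   # hold locally instead of transmitting
            return "HELD"
        self.credits -= 1
        reply = rx.accept(msg)
        if reply == "NAK":
            self.pending.append(msg)   # receiver had no buffer; retry later
        return reply

    def grant(self, rx, n):
        """Add n credits and retry held messages while credits remain."""
        self.credits += n
        while self.pending and self.credits > 0:
            self.send(rx, self.pending.pop(0))
```

Credits let a well-informed sender avoid transmitting messages that would be
dropped, while the NAK covers the case where the sender's credit count is out
of step with the receiver's actual buffers.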
Infiniband claims numerous benefits for SAN applications. Principal among
these is that it provides a standard for high performance computing and I/O
communication. The network is now a first class citizen connecting processors
and devices in a fault-tolerant and scalable manner. The QP interface
abstraction provides a flexible, user-level interface that permits high
performance independent of the operating system. The communication protocols
are lightweight and make effective use of available resources. To ensure
scalability, individual Infiniband links can be concatenated to form
"fat-pipes" as the network grows. At higher-levels, the architecture includes
provisions for service discovery to simplify communication between specific
applications. It also has an extensive management specification that enables
comprehensive control of the SAN and its resources. And the list goes on...
Infiniband vs. TCP/IP
The case for a completely new architecture such as Infiniband stems from
claimed shortcomings of existing systems. Of particular interest is
Infiniband's departure from established network protocols, namely the
inter-network protocol suite.
The inter-network protocols, i.e. the TCP, UDP and IP suite of protocols, were
developed on the basis of being network independent. Inter-network protocols
make no assumptions about the underlying network in terms of reliability
mechanisms, error detection/correction, transmission capacity or other link
features. Hence, the TCP/IP suite implements its own mechanisms. To ensure
portability, the protocol stack is implemented in software on the host with
specialized kernel drivers added to connect to various links. The end-to-end
nature of these protocols leads to cautious flow and congestion control
schemes and inhibits the use of cooperative intelligence in the network
infrastructure. Also, the TCP/IP suite utilizes the fairly generic sockets
interface with a simple byte-stream data transfer model. All of these lead to
perceived performance concerns within the SAN regime in terms of host
processing overhead, network dynamics and application efficiency. Infiniband
attacks these problems through link-aware protocols, hardware assist and the
queue pair abstraction.
However, the success that inter-network protocols have seen to date cannot be
overlooked. Much of this success, ironically, is attributable to the fact
that TCP/IP provides connectivity between heterogeneous systems. This has
enabled a wide array of applications to be deployed across different connected
platforms. The inter-network protocols are also an established set of
standards that, through organizations such as the IETF, have a continuous
cycle of open feedback and development. Additionally, a great deal of effort,
and expense, has been invested into understanding and managing inter-network
based systems. Introducing an entirely new network, such as Infiniband, would
require a new cycle of training and management costs.
Also, it is not clear that performance concerns surrounding inter-networks in
the SAN regime are completely justified. A case in point is the modern network
routing switch. Using hardware assist mechanisms, these devices are capable of
switching IP packets at gigabit-plus speeds over commodity links such as
Ethernet. These switch-routers are also becoming highly intelligent and can
perform a variety of high-level tasks such as load balancing and web indexing.
This intelligence, coupled with the speed and reliability these devices
provide, is well on par with the requirements for SANs. Presently there are
groups exploring the use of inter-network standards for SANs and networked
I/O. Notable among these are the IP Storage working groups of the IETF
(www.ietf.org/html.charters/ips-charter.html) and the Netstation group at the
USC Information Sciences Institute (www.isi.edu/div7/netstation).
A Unified Theory?
The above discussion brings us back to the question posed at the beginning of
this article: what is the architecture that will drive the future of SANs?
Some might argue for native Infiniband while others would staunchly support
inter-network protocols over high-speed links. However, it might also be
possible to adopt the best of both architectures towards a more unified
approach.
Our own hypothesis is that the QP interface over standard inter-network
protocols with hardware assist will adequately support the demands of the SAN
and networked I/O. On one side, the Infiniband QP interface provides a
lightweight and flexible memory-based abstraction for network communication at
user-level. It is a natural interface for hardware based communication
adapters to export as opposed to standard sockets. The QP abstraction itself
can be used to implement other interfaces such as the Message Passing
Interface or sockets. For example, Microsoft's Windows Sockets Direct Path
technology (www.microsoft.com/WINDOWS2000/en/datacenter/help/WSD_Def.htm)
layers sockets over SAN APIs, including the virtual interface, which is an
antecedent of the QP and very similar in design. On the other side, using
standard inter-network protocols (i.e. TCP/IP, etc.) beneath brings all the
advantages of an established network to the SAN. It permits use of existing
infrastructure, methodology and personnel training. It also brings an open
forum of discussion and development from which it can evolve.
As part of the Millennium Project at the University of California, Berkeley
(www.millennium.berkeley.edu), we are working to study how architectures
like Infiniband and modern inter-networks can be combined. Using a
programmable network interface, we are prototyping a system that exports a QP
abstraction directly to applications and uses basic inter-network protocols
over a switched network. Although still in the early stages of research, we
have built a system that provides basic QP methods over a subset of UDP/IP
(Note: we chose to use the IPv6 standard as we believe it applies more to the
next generation of networks we are studying). Next, we plan to study reliable
transport by implementing a subset of TCP in the adapter. Once this is done,
we intend to examine various applications over this prototype communication
architecture. Principal among these is distributed storage such as a
user-level file system. The intent here is to better understand the inherent
implications of the QP/inter-network combination.
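
To make the idea of QP methods over UDP/IP concrete, the sketch below frames
the payload of one send work request as a single UDP/IPv6 datagram. The
header layout (destination QP number, sequence number, payload length) and
the function names are assumptions made purely for illustration; they are not
the format used by the prototype, which runs on a programmable network
interface rather than in host Python.

```python
import socket
import struct

# Hypothetical header: destination QP number, sequence number, payload length,
# all in network byte order.
HEADER = struct.Struct("!IIH")


def encode(dest_qp, seq, payload):
    """Prefix the payload with the illustrative QP header."""
    return HEADER.pack(dest_qp, seq, len(payload)) + payload


def decode(datagram):
    """Split a datagram back into (dest_qp, seq, payload)."""
    dest_qp, seq, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated datagram")
    return dest_qp, seq, payload


def post_send_udp6(host, port, dest_qp, seq, payload):
    """Send one work request's payload as a single UDP/IPv6 datagram."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
        s.sendto(encode(dest_qp, seq, payload), (host, port))
```

A receiver would call decode on each arriving datagram and use the QP number
to pick the receive queue whose next posted buffer takes the payload. UDP
gives no delivery guarantee, so this corresponds only to the unreliable
service; the sequence number is carried here as a hook for a reliable
transport.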
Conclusion
Present day computer systems are on the verge of an architectural
"about-face". Where once the goal was to put as many components (processors,
memory, devices) into a single box, there is now an impetus to do the reverse.
Realizing this change requires a SAN architecture that can support the demands
of IPC and I/O, management, fault-tolerance, and scalability. There are many
possibilities here: some are revolutionary, like Infiniband, while others are
more evolutionary, like the Internet. History indicates that the systems that
succeed tend to be more evolutionary than revolutionary. Take, for example,
Ethernet and the x86 processor architecture. Networks are certainly not immune
to this trend. Our own hypothesis is that the QP interface contributed from
the Infiniband efforts combined with modern hardware assisted inter-network
protocols will support the future demands of the SAN. This model essentially
evolves inter-network concepts to be compatible with SAN applications. While
the debate over the "ideal" design will continue for some time, our own hope is
to shed more light on the subject.