SANs: A MEETING OF THE REVOLUTIONARY AND THE EVOLUTIONARY
by Philip Buonadonna

Computer systems today are on the verge of a major architectural revolution, one that might fundamentally change the way systems are built and utilized: the I/O network. The impetus behind this change is the need for systems that perform in the face of ever-increasing interconnectivity. The Internet, the evolution of Post PC devices, computing clusters and other factors are increasing the demand for I/O bandwidth. However, present system I/O architectures form a bottleneck that limits performance. Processing units communicate with I/O devices using simple load/store semantics across a shared-bandwidth bus, such as the Peripheral Component Interconnect (PCI). Such a bus extends at most a few feet from the processor, supports a limited number of devices and provides minimal fault tolerance. It also offers little manageability in terms of monitoring and on-line maintenance. Thus systems are now faced with a new obstacle: a "tyranny of I/O" in which the demands far outpace present capabilities.

A proposed solution to this issue is to merge present I/O architectures and modern networks into a new paradigm of I/O-based networks. Processor/memory combinations and I/O devices (e.g. disks and WAN adapters) are connected directly to a switched interconnect fabric and communicate through network-oriented protocols. The switch-based design permits a large array of devices to be connected in a manner that provides scalable throughput, and the network protocols provide a high degree of fault tolerance. The I/O network also provides advanced manageability in terms of both monitoring and on-line repair. Finally, the I/O network supports processor-to-processor communication to provide both high compute capacity and redundancy, all through a single network connection. Networks that support this concept of networked I/O are called System Area Networks (SANs).

There is a spectrum of efforts aimed at SAN designs for networked I/O. At one end are the revolutionary architectures that are specifically tailored for this application. The prime example here is the Infiniband network architecture, www.infinibandta.org. Infiniband is a bottom-up network design that provides operating system independent communication over a switched network fabric. The specification is broad and includes the physical link, network and transport protocols, management functions and the programmatic abstractions. At the other end of the scale are the more evolutionary approaches that seek to apply existing architectures to networked I/O. Modern inter-network protocols, i.e. TCP/IP and its related applications and protocols, over traditional links such as Ethernet form the majority here.

Thus, the question that arises is which architecture brings the goals of the I/O network to the largest space of applications. Possible answers include a legacy TCP/IP based system, a future variant of Infiniband or perhaps something completely different. This article looks at Infiniband and modern inter-networks, and at how we are examining a possible combination of the two to solve the networked I/O problem.

Infiniband

Infiniband is the logical merger of several industry efforts in network-based I/O architectures (namely, Next Generation I/O and Future I/O). The core concept of this design is to separate host processor/memory combinations and I/O devices by a switched network fabric, effectively eliminating the traditional I/O bus. A Host Channel Adapter (HCA) connects the host memory controller to the network, while a Target Channel Adapter (TCA) is the interface for the individual I/O devices. The TCA is similar to the HCA, but can be simplified according to the requirements of the attached device(s). The channel adapters are interconnected by the Infiniband switches themselves. The switch is intelligent and provides several functions including inter-subnet routing, management, topology discovery and differentiated service.

The fundamental transport interface supported by the HCA/TCA is the work queue pair (QP). Each QP consists of a send queue and a receive queue that hold communication metadata known as work requests. The semantics permit matched send-receive operations, in which send and receive work requests are implicitly paired between the source and destination. The QP also supports remote DMA operations that permit a source to read from or write to a target's address space directly. The data payload exchanged between QPs is read from and written to special memory regions that the application establishes and registers with the communication provider.
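
To make the queue pair model concrete, the following is a minimal in-memory sketch in Python. It is not Infiniband's actual programming interface: the names MemoryRegion, WorkRequest, QueuePair, deliver and rdma_write are hypothetical, and the deliver function merely stands in for the work that the channel adapters and the fabric would perform.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryRegion:
    """A buffer the application has registered for communication (assumed model)."""
    buf: bytearray


@dataclass
class WorkRequest:
    """Metadata only: names a span of a registered region, carries no payload."""
    region: MemoryRegion
    offset: int
    length: int

    def __post_init__(self):
        end = self.offset + self.length
        if self.offset < 0 or self.length < 0 or end > len(self.region.buf):
            raise ValueError("work request falls outside its memory region")


@dataclass
class QueuePair:
    """A send queue and a receive queue of work requests, plus completions."""
    send_q: deque = field(default_factory=deque)
    recv_q: deque = field(default_factory=deque)
    completions: list = field(default_factory=list)

    def post_send(self, wr: WorkRequest) -> None:
        self.send_q.append(wr)

    def post_recv(self, wr: WorkRequest) -> None:
        self.recv_q.append(wr)


def deliver(src: QueuePair, dst: QueuePair) -> bool:
    """Matched send-receive: pair the oldest send WR with the oldest receive WR.

    Hypothetical stand-in for the adapters and fabric. Returns False when
    either side has nothing posted, leaving both queues untouched.
    """
    if not src.send_q or not dst.recv_q:
        return False
    send_wr, recv_wr = src.send_q[0], dst.recv_q[0]
    if send_wr.length > recv_wr.length:
        raise ValueError("posted receive buffer is too small")
    src.send_q.popleft()
    dst.recv_q.popleft()
    payload = send_wr.region.buf[send_wr.offset:send_wr.offset + send_wr.length]
    recv_wr.region.buf[recv_wr.offset:recv_wr.offset + len(payload)] = payload
    src.completions.append(("send", len(payload)))
    dst.completions.append(("recv", len(payload)))
    return True


def rdma_write(local: WorkRequest, remote: MemoryRegion, remote_offset: int) -> None:
    """Remote DMA write: the source places data directly in the target's
    registered region, without consuming a receive work request there."""
    if remote_offset < 0 or remote_offset + local.length > len(remote.buf):
        raise ValueError("write falls outside the remote memory region")
    data = local.region.buf[local.offset:local.offset + local.length]
    remote.buf[remote_offset:remote_offset + local.length] = data
```

Note that the work requests hold only descriptors. The payload never passes through the queues themselves, which is what allows an adapter to move data between registered regions without intermediate copies.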

Both channel adapters and switches implement Infiniband-specific network and transport protocols. Connections between QPs on channel adapters may be one-to-one or one-to-many and have either reliable or unreliable delivery guarantees. Message-level flow control is provided based on receive credits and NAKs.
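
The idea of message-level flow control based on receive credits and NAKs can be illustrated with a toy model. This is a sketch under simplifying assumptions and not the Infiniband wire protocol: the Receiver and Sender classes are hypothetical, one credit corresponds to one posted receive buffer, and a NAK simply tells the sender that its view of the available credits was stale.

```python
from collections import deque


class Receiver:
    """Grants one credit per posted receive buffer (assumed policy)."""

    def __init__(self):
        self.posted = 0        # receive buffers currently available
        self.delivered = []    # messages accepted so far

    def post_receives(self, n):
        """Post n receive buffers and return the number of credits granted."""
        self.posted += n
        return n

    def on_message(self, msg):
        """Accept a message if a buffer is posted, otherwise refuse it."""
        if self.posted == 0:
            return "NAK"
        self.posted -= 1
        self.delivered.append(msg)
        return "ACK"


class Sender:
    """Sends only while it believes the receiver has buffers available."""

    def __init__(self, receiver):
        self.rx = receiver
        self.credits = 0
        self.pending = deque()

    def grant(self, n):
        """Record credits advertised by the receiver."""
        self.credits += n

    def send(self, msg):
        """Queue a message and try to transmit; returns messages sent now."""
        self.pending.append(msg)
        return self.pump()

    def pump(self):
        """Transmit queued messages in order while credits remain."""
        sent = 0
        while self.pending and self.credits > 0:
            reply = self.rx.on_message(self.pending[0])
            if reply == "NAK":
                # Our credit count was wrong: stop and wait for a new grant.
                self.credits = 0
                break
            self.pending.popleft()
            self.credits -= 1
            sent += 1
        return sent
```

In this model a well-behaved sender rarely sees a NAK, because it stops transmitting as soon as its credits run out. The NAK path only matters when the two sides disagree about how many buffers are posted.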

Infiniband claims numerous benefits for SAN applications. Principal among these is that it provides a standard for high performance computing and I/O communication. The network is now a first class citizen connecting processors and devices in a fault-tolerant and scalable manner. The QP interface abstraction provides a flexible, user-level interface that permits high performance independent of the operating system. The communication protocols are lightweight and make effective use of available resources. To ensure scalability, individual Infiniband links can be concatenated to form "fat-pipes" as the network grows. At higher levels, the architecture includes provisions for service discovery to simplify communication between specific applications. It also has an extensive management specification that enables comprehensive control of the SAN and its resources. And the list goes on...

Infiniband vs. TCP/IP

The case for a completely new architecture such as Infiniband stems from claimed shortcomings of existing systems. Of particular interest is Infiniband's departure from established network protocols, namely the inter-network protocol suite.

The inter-network protocols, i.e. the TCP, UDP and IP suite of protocols, were developed to be network independent. They make no assumptions about the underlying network in terms of reliability mechanisms, error detection/correction, transmission capacity or other link features. Hence, the TCP/IP suite implements its own mechanisms. To ensure portability, the protocol stack is implemented in software on the host, with specialized kernel drivers added to connect to various links. The end-to-end nature of these protocols leads to cautious flow and congestion control schemes and inhibits the use of cooperative intelligence in the network infrastructure. Also, the TCP/IP suite utilizes the fairly generic sockets interface with a simple byte-stream data transfer model. All of these lead to perceived performance concerns within the SAN regime in terms of host processing overhead, network dynamics and application efficiency. Infiniband attacks these problems through link-aware protocols, hardware assist and the queue pair abstraction.

However, the success that inter-network protocols have seen to date cannot be overlooked. Much of this success, ironically, is attributable to the fact that TCP/IP provides connectivity between heterogeneous systems. This has enabled a wide array of applications to be deployed across different connected platforms. The inter-network protocols are also an established set of standards that, through organizations such as the IETF, have a continuous cycle of open feedback and development. Additionally, a great deal of effort and expense has been invested in understanding and managing inter-network based systems. Introducing an entirely new network, such as Infiniband, would require a new cycle of training and management costs.

Also, it is not clear that performance concerns surrounding inter-networks in the SAN regime are completely justified. A case in point is the modern network routing switch. Using hardware assist mechanisms, these devices are capable of switching IP packets at gigabit-plus speeds over commodity links such as Ethernet. These switch-routers are also becoming highly intelligent and can perform a variety of high-level tasks such as load balancing and web indexing. Together with the speed and reliability they provide, these capabilities put such devices well on par with the requirements for SANs. Presently there are groups exploring the use of inter-network standards for SANs and networked I/O. Notable among these are the IP Storage working groups of the IETF, www.ietf.org/html.charters/ips-charter.html, and the Netstation group at the USC Information Sciences Institute, www.isi.edu/div7/netstation.

A Unified Theory?

The above discussion brings us back to the question posed at the beginning of this article: what is the architecture that will drive the future of SANs? Some might argue for native Infiniband while others would staunchly support inter-network protocols over high-speed links. However, it might also be possible to adopt the best of both architectures towards a more unified approach.

Our own hypothesis is that the QP interface over standard inter-network protocols with hardware assist will adequately support the demands of the SAN and networked I/O. On one side, the Infiniband QP interface provides a lightweight and flexible memory-based abstraction for network communication at user-level. It is a more natural interface than standard sockets for hardware-based communication adapters to export. The QP abstraction itself can be used to implement other interfaces such as the Message Passing Interface or sockets. For example, Microsoft's Windows Sockets Direct Path technology, www.microsoft.com/WINDOWS2000/en/datacenter/help/WSD_Def.htm, layers sockets over SAN APIs including the virtual interface, an antecedent of the QP that is very similar in design. On the other side, using standard inter-network protocols (i.e. TCP/IP and related protocols) beneath brings all the advantages of established networks to the SAN. It permits use of existing infrastructure, methodology and personnel training. It also brings an open forum of discussion and development from which it can evolve.

As part of the Millennium Project at the University of California, Berkeley, www.millennium.berkeley.edu, we are working to study how architectures like Infiniband and modern inter-networks can be combined. Using a programmable network interface, we are prototyping a system that exports a QP abstraction directly to applications and uses basic inter-network protocols over a switched network. Although still in the early stages of research, we have built a system that provides basic QP methods over a subset of UDP/IP. (Note: we chose to use the IPv6 standard as we believe it applies more to the next generation of networks we are studying.) Next, we plan to study reliable transport by implementing a subset of TCP in the adapter. Once this is done, we intend to examine various applications over this prototype communication architecture. Principal among these is distributed storage such as a user-level file system. The intent here is to better understand the inherent implications of the QP/inter-network combination.
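
As a rough illustration of what "basic QP methods over UDP/IP" might look like to an application, here is a small Python sketch. It is not the prototype's code or interface: the UdpQueuePair class and the open_udp helper are hypothetical, an ordinary kernel UDP socket on the loopback address stands in for the programmable network interface, and the helper falls back to IPv4 only so that the sketch still runs on hosts without IPv6.

```python
import socket
from collections import deque


def open_udp(prefer_ipv6=True):
    """Bind a UDP socket on loopback, preferring IPv6 (hypothetical helper)."""
    if prefer_ipv6 and socket.has_ipv6:
        sock = None
        try:
            sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
            sock.bind(("::1", 0))
            return sock
        except OSError:
            if sock is not None:
                sock.close()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    return sock


class UdpQueuePair:
    """Send and receive queues of buffer descriptors over one UDP socket."""

    def __init__(self, sock, timeout=1.0):
        self.sock = sock
        self.sock.settimeout(timeout)
        self.peer = None
        self.send_q = deque()   # (buffer, length) descriptors
        self.recv_q = deque()   # buffers posted to receive into
        self.completions = []

    @property
    def address(self):
        return self.sock.getsockname()

    def connect(self, peer_address):
        """Remember the remote QP's address; UDP itself stays connectionless."""
        self.peer = peer_address

    def post_send(self, buf, length):
        self.send_q.append((buf, length))

    def post_recv(self, buf):
        self.recv_q.append(buf)

    def process_sends(self):
        """Transmit each posted send as one datagram, straight from its buffer."""
        while self.send_q:
            buf, length = self.send_q.popleft()
            self.sock.sendto(memoryview(buf)[:length], self.peer)
            self.completions.append(("send", length))

    def process_recv(self):
        """Receive one datagram into the oldest posted buffer, if any."""
        if not self.recv_q:
            return False
        nbytes, _sender = self.sock.recvfrom_into(self.recv_q[0])
        self.recv_q.popleft()
        self.completions.append(("recv", nbytes))
        return True
```

A typical exchange would create two UdpQueuePair objects, have each call connect with the other's address, post a receive buffer on one side, post a send on the other, and then call process_sends followed by process_recv. Because this sketch rides on plain UDP, delivery is unreliable, which matches the article's point that reliable transport is a separate next step.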

Conclusion

Present day computer systems are on the verge of an architectural "about-face". Where once the goal was to put as many components (processors, memory, devices) as possible into a single box, there is now an impetus to do the reverse. Realizing this change requires a SAN architecture that can support the demands of IPC and I/O, management, fault-tolerance, and scalability. There are many possibilities here: some are revolutionary like Infiniband, while others are more evolutionary like the Internet. History indicates that the systems that succeed tend to be more evolutionary than revolutionary. Take, for example, Ethernet and the x86 processor architecture. Networks are certainly not immune to this trend. Our own hypothesis is that the QP interface contributed by the Infiniband efforts, combined with modern hardware-assisted inter-network protocols, will support the future demands of the SAN. This model essentially evolves inter-network concepts to be compatible with SAN applications. While the debate over the "ideal" design will continue for some time, our own hope is to shed more light on the subject.
