The global publication of record for High Performance Computing - LIVEwire Edition / November 18, 2003: Vol. 10, No. 1
GRIDS@SC: TerraGrid: UNLEASHING CLUSTERED COMPUTING

Terrascale recently announced TerraGrid: a revolutionary new I/O delivery platform for Linux clusters and heterogeneous distributed computing environments. TerraGrid eliminates the need for the custom-built distributed/clustered file systems currently required to provide a unified namespace, and it provides the highest-performing, most scalable and easiest-to-manage I/O delivery platform currently available. TerraGrid has been designed from the ground up to increase performance, deliver linear scalability and simplify the management of huge datasets, while enabling clients to leverage the vast body of open-source tools, utilities and applications currently available for Linux.

Implemented as an intelligent network driver, TerraGrid enables existing "standalone" open-source file systems to be deployed on thousands of cluster nodes while simultaneously accessing a unified namespace at tens of GB per second of bandwidth and sustaining millions of I/O operations per second. TerraGrid delivers block-level parallelism, thereby ensuring that all metadata and data accesses are striped across the installed pool of storage servers.

Because TerraGrid is compatible with existing Linux tools and utilities, it can be installed and managed by Linux administrators with little or no additional training. High availability is built in at the core of the design, provided that the existing Linux software RAID facilities are appropriately configured. In most failure cases, applications proceed normally to completion without disruption. Deployment is extremely flexible and employs low-cost commodity off-the-shelf hardware components. No specialized box products, adaptors or equipment are required for deployment.
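To picture what striping block accesses across a pool of storage servers means, here is a minimal sketch. It assumes a simple round-robin layout, which the article does not specify; the function names `locate_block` and `blocks_per_server` are hypothetical and are not part of TerraGrid.

```python
from typing import List, Tuple


def locate_block(logical_block: int, num_servers: int) -> Tuple[int, int]:
    """Map a logical block number to (server index, block offset on that server).

    Assumption: plain round-robin striping, one block per stripe unit.
    """
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_servers, logical_block // num_servers


def blocks_per_server(num_blocks: int, num_servers: int) -> List[int]:
    """Count how many of the first num_blocks logical blocks land on each server."""
    counts = [0] * num_servers
    for block in range(num_blocks):
        server, _ = locate_block(block, num_servers)
        counts[server] += 1
    return counts


if __name__ == "__main__":
    # Consecutive blocks go to different servers, so a sequential read
    # can be serviced by many servers in parallel.
    for block in range(6):
        print(block, locate_block(block, num_servers=4))
    # The per-server counts differ by at most one block.
    print(blocks_per_server(1000, 11))
```

Under this assumed layout, consecutive blocks of a file (and of the file system's own metadata) land on different servers, which is why a single sequential stream can draw on the whole server pool at once.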
Since TerraGrid is purely a software product, clients are free to work with their hardware vendor of choice, thereby avoiding being locked into proprietary architectures or paying non-commodity prices for commodity parts. This approach ensures that TerraGrid deployments have the lowest TCO of any storage delivery platform. As CPU horsepower continues to accelerate up the ramp defined by Moore's Law and network performance continues to grow in log-scale leaps, a TerraGrid investment delivers even higher performance and actually provides higher returns year after year.

TerraGrid provides parallel data paths between the compute nodes and I/O nodes in a Linux cluster and does away with the requirement for metadata controllers, thereby eliminating the inherent performance and capacity bottlenecks found in other vendors' solutions. The result is that even sub-entry TerraGrid deployments deliver multiple gigabytes/sec of sustained bandwidth, while competing vendors whose products cost a multiple of a TerraGrid license continue to quote performance numbers in gigabits/sec. Initial performance measurements show that each TerraGrid-enabled compute node can sustain 100 MBytes/sec of single-stream I/O, and that aggregate throughput scales linearly to hundreds of nodes until the network or the pool of I/O servers is completely saturated. The consequence of linear scaling is that TerraGrid, on the high end, delivers more than 100X the bandwidth of the alternatives.

TerraGrid also dramatically lowers the cost of managing data storage by supporting capacity growth from gigabytes to petabytes within a single, easily managed namespace without requiring any downtime on the client nodes. From a connectivity perspective, TerraGrid fully supports native file systems on Linux, NFS for Unix client access and CIFS for Windows client access.
This enables seamless growth of a TerraGrid installation within all user environments while consolidating TerraGrid-connected storage into a unified, fast and resilient namespace.

Architecture

The following figure provides an overview of a TerraGrid deployment:
It bears mention that, in a TerraGrid deployment, the same node can act as both a client and a server.

Architectural Benefits

Performance: TerraGrid eliminates the requirement for serial metadata controllers and provides block-level striping of all metadata and data accesses. Consequently, linearly scalable wire-speed bandwidth is delivered to all client nodes until either the network or the aggregated physical storage is completely saturated. In preliminary testing on a testbed consisting of 12 single-CPU clients and 11 single-CPU servers connected to a Gigabit Ethernet switch, it was possible to attain 1,214 MBytes per second of sustained throughput even though each client was generating only a single stream of I/O requests. This translates to over 100 MBytes/sec of sustained throughput per client (1,214 / 12 is roughly 101) over a link with a theoretical limit of 125 MBytes/sec, or about 81% of wire speed. The following figure provides a synopsis of performance measurements taken on a cluster with 12 servers:

Management: The architecture of TerraGrid provides opportunities to minimize or eliminate time-consuming management tasks. For instance, the core tasks associated with managing and optimizing data layout are eliminated. Because all clients use software striping, data is distributed evenly across I/O servers. Also, when additional physical storage is added to the storage system, it can be incorporated into the existing namespace by merely creating a new volume and dynamically expanding the file system to encompass this volume. Traditional Linux administration tools are used to manage TerraGrid, eliminating the need for specialized training of systems administrators.

Security: Storage has typically relied on private networks to guarantee the security of the system and authentication of the clients. Terrascale's TerraGrid-enhanced Linux file system supports and enforces the standard Linux/Unix security features for read, write and execute.
In addition, TerraGrid can deploy standards-based authentication and encryption, including Kerberos and IPSec. This comprehensive security gives you the confidence to use more easily accessible networks, such as Ethernet, for storage transactions.

Standards: iSCSI was approved as a standard in early 2003 and is in the process of becoming a mass-marketed technology. Terrascale is committed to driving the deployment of standards for high performance storage platforms. Currently we are tracking the iSCSI working group and interacting with industry vendors to ensure that open-source implementations of iSCSI are available and driven through an open standards process. Block-level protocols, and in particular iSCSI, are frequently hailed as the genesis of IP-based storage networking. While protocol developments are encouraging, they are only a piece of what is required to deliver the true benefits of cost-effective storage networking. A true IP SAN can deliver the benefits of shared storage-server consolidation, increased capacity utilization, increased performance, more efficient data protection, and high availability -- but much more is required than the base protocol. Although there are many switch and fabric vendors, applications have not been able to exploit fabric capabilities effectively because of the absence of fabric enablers that permit real-world applications to perform efficiently and scalably. Terrascale's TerraGrid product is the first fabric enabler that bridges the gap between network hardware capabilities and achievable I/O throughput -- it turns your commodity TCP/IP switch into a massively scalable I/O fabric and allows existing applications to instantly exploit the full power of the network without any modifications to the application or the network.

Flexibility: TerraGrid utilizes widely available TCP/IP networking technologies and commodity servers, and serves as an enabler that unleashes the full power of open source Linux.
Given that there are no proprietary bits and pieces required, a number of deployment scenarios are made possible, enabling the optimal configuration to be chosen for a given site. The three major deployment scenarios are:

Hierarchical Cluster: Consists of diskless client nodes and dedicated I/O server nodes. This configuration is best suited for larger clusters (>64 nodes), in which management complexity is reduced by separating I/O nodes from compute nodes. The compute nodes access a common namespace via open-source Linux file systems such as ext2. A subset of the "compute" nodes can run NFS/CIFS as the application, providing connectivity to non-Linux clients running Unix or Windows. Fault tolerance is achieved by configuring each compute node to run RAID5 across the I/O servers, thereby allowing continued access to data even if an entire I/O server fails or is deliberately taken offline for upgrades.

Flat Cluster: This deployment scenario is ideal for smaller clusters (<64 nodes) in which cost is of paramount importance but throughput and high availability are still required. In this scenario, each compute node runs the TerraGrid client and server -- locally attached storage (internal disk drives or external direct attached storage) is exported to all other nodes. The storage that is locally attached to each node becomes part of a global storage pool with a unified namespace. If software RAID5 is configured on each compute node, the failure of a single compute node will not bring down the entire cluster.

Scalable NAS: For environments in which a scalable NAS platform is desired, a variant of the flat cluster can be deployed -- "compute" nodes now function exclusively as NAS servers. In this scenario, clients can "plug and scale" NAS bandwidth by merely adding additional NAS servers with low-cost direct attached storage. All NAS clients will have access to the same unified namespace via NFS and CIFS protocols.

Summary Of TerraGrid Benefits
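The deployment scenarios above rely on RAID5 across I/O servers to survive the loss of one server. As a rough illustration of why single parity makes that possible, the sketch below builds one stripe from XOR parity and rebuilds a lost block. It is an assumption-laden toy, not the Linux software RAID implementation: real RAID5 rotates parity across devices and works on fixed-size chunks, and the names `xor_blocks`, `make_stripe` and `rebuild` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Append one parity block to the data blocks (one block per server)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a stripe in which at most one block (marked None) was lost."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost block")
    survivors = [block for block in stripe if block is not None]
    repaired = list(survivors) if not missing else list(stripe)
    if missing:
        # XOR of all surviving blocks equals the missing block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
    damaged: List[Optional[bytes]] = list(stripe)
    damaged[1] = None  # simulate one I/O server going offline
    print(rebuild(damaged) == stripe)
```

Because the parity block is the XOR of the data blocks, any one missing block equals the XOR of the survivors, so reads can continue while a server is down; losing two blocks of the same stripe is not recoverable with single parity.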