THE DATA WAREHOUSE ON S/390: PART 2
By Dan Graham, global strategy and operations executive,
Global Business Intelligence Solutions, IBM; and
Malcolm Nolan, marketing manager, Business Intelligence, S/390, IBM
In our previous installment, we put forward the thesis that, because about 75 percent of warehouse data resides in mainframes, those mainframes could better leverage return on investment by serving the warehouse. Here we discuss growth.
The S/390 Parallel Enterprise Server has remarkable scalability. You can add gigabytes of storage without having to add another engine - meaning you can add direct access storage devices (DASD) without instantly buying more processing power.
For clarity in this discussion, we call the whole machine the server. Typically it is composed of a number of servers, each of which may contain up to ten engines, or processors. In effect we are describing what has long been known as a symmetric multiprocessor (SMP). The S/390 is a proud descendant of the SMP pedigree. A customer can start with a single processor and add more. This means that a user can follow the demand curve rather than having to invest before growth takes place.
When you connect up to 32 of these ten-engine servers together - that is to say, multiple SMPs running OS/390 - the whole unit becomes an S/390 Parallel Sysplex cluster, which a user can configure dynamically to meet workload demands. In a Parallel Sysplex cluster, each "server" is a complete SMP running its own copy of the operating system. When we ran tests at IBM's Teraplex Center, we found that as we added more servers or direct access storage devices, our scalability was in the 95 to 96 percent range (where 100 percent represents perfect scalability). We have yet to reach a point of diminishing returns.
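To make a percentage like this concrete, here is a minimal sketch of one common way to express scalability: the observed speedup divided by the ideal (linear) speedup. The article does not say how the Teraplex figure was measured, so both the formula and the throughput numbers below are illustrative assumptions, not test results.

```python
def scaling_efficiency(base_units, base_throughput, scaled_units, scaled_throughput):
    """Return observed speedup as a fraction of ideal (linear) speedup."""
    ideal_speedup = scaled_units / base_units
    observed_speedup = scaled_throughput / base_throughput
    return observed_speedup / ideal_speedup


# Hypothetical figures (not Teraplex measurements): doubling from 4 to 8
# servers lifts throughput from 1,000 to 1,910 queries per hour.
efficiency = scaling_efficiency(4, 1000.0, 8, 1910.0)
print(f"Scaling efficiency: {efficiency:.1%}")
```

On this definition, a result of 100 percent would mean that doubling the hardware exactly doubled the work done.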
In 1996, ITG published a paper on scalability. With 50 to 99 users, ITG calculated that growing the capacity of S/390 servers cost about 1.5 times less than growing the capacity of centralized UNIX servers; with 1,000 users, the S/390 was over 3 times more cost-efficient.
Another advantage we gain is that partitioning on the S/390 is a very mature technology. Starting in 1984, it has been honed to a fine art. You can use nearly every utility, function or tool to manage one or more partitions. In the S/390 world you create a new partition, load it with new data, and within seconds the server has integrated the additional data into the system.
By contrast, the UNIX community discovered partitioning more recently, so it has not yet reached into every utility that a customer is likely to need if he or she goes the UNIX route.
One way to think of the partitioning challenge is to imagine the difference between a photograph and a holographic image. If you cut a corner off a photograph you have lost that piece of visual information. If you cut the same area off a holographic gel the remaining part retains the whole image: the only difference is that the complete picture is a bit fuzzier than it used to be.
What we mean is that if you need to do a repair job on (say) Partition 27, you can do it without disrupting the whole system. You have what we call "partition independence." In many UNIX systems you often have to perform an operation on the entire database or relational table, which means that, in order to clean up a small part of the database, you end up reprocessing the whole thing. Usually, the time to do this is prohibitive.
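The difference in effort can be sketched with a toy model. The code below is not DB2 or any UNIX database; it simply represents a table as a dictionary of partitions and counts how many rows a repair has to touch. The partition count, row count and the `fill_missing_amount` fix are all hypothetical.

```python
from typing import Callable, Dict, List

Row = dict
Table = Dict[int, List[Row]]  # partition number -> rows in that partition


def repair_partition(table: Table, partition_id: int, fix: Callable[[Row], Row]) -> int:
    """Apply fix to one partition only; return the number of rows touched."""
    table[partition_id] = [fix(row) for row in table[partition_id]]
    return len(table[partition_id])


def repair_whole_table(table: Table, fix: Callable[[Row], Row]) -> int:
    """Apply fix to every partition; return the number of rows touched."""
    return sum(repair_partition(table, pid, fix) for pid in list(table))


def build_table(partitions: int, rows_per_partition: int) -> Table:
    """Build a hypothetical table whose rows all have a missing amount."""
    return {
        pid: [{"id": pid * rows_per_partition + i, "amount": None}
              for i in range(rows_per_partition)]
        for pid in range(1, partitions + 1)
    }


def fill_missing_amount(row: Row) -> Row:
    """Hypothetical repair: replace a missing amount with zero."""
    return {**row, "amount": 0 if row["amount"] is None else row["amount"]}


table = build_table(partitions=40, rows_per_partition=1000)
print("Rows touched, Partition 27 only:", repair_partition(table, 27, fill_missing_amount))
print("Rows touched, whole table:", repair_whole_table(build_table(40, 1000), fill_missing_amount))
```

In this model the single-partition repair touches one fortieth of the rows and leaves the other 39 partitions exactly as they were.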
For customers whose systems face major demands from concurrent users, the way the S/390 servers handle those users is the best part of the story.
The best/worst case scenario goes like this: Let's say we have a parallel database engine for parallel data warehousing. This is wonderful for User 1, who can access every central processing unit (CPU) and I/O channel in the system. But the paradox is that User 2 already has trouble, since User 1 has commandeered the bells and whistles. User 3 barely gets a look in.
Some of our competitors have gone to great trouble to build and rebuild their priority subsystems in order to get resources to work on a share-efficient basis. The trouble is that they have not managed to achieve the ability to manage multiple users concurrently. So the "Query from Hell" ties up a host of resources while users with 30-second queries end up waiting 30 minutes.
S/390 can effectively manage large numbers of concurrent users without compromising response time or data access.
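The "Query from Hell" effect is easy to reproduce in a toy model. The sketch below is not a description of how OS/390 or any competitor schedules work; it is a single-processor simulation, with hypothetical query lengths and a hypothetical 10-second time slice, comparing run-to-completion scheduling with simple round-robin sharing.

```python
from collections import deque


def fifo_finish_times(queries):
    """Run each query to completion in arrival order."""
    clock, finished = 0, {}
    for name, seconds in queries:
        clock += seconds
        finished[name] = clock
    return finished


def round_robin_finish_times(queries, quantum):
    """Give each waiting query a time slice of `quantum` seconds in turn."""
    clock, finished = 0, {}
    waiting = deque(queries)
    while waiting:
        name, remaining = waiting.popleft()
        slice_used = min(quantum, remaining)
        clock += slice_used
        if remaining > slice_used:
            # Not done yet: go to the back of the line with the remainder.
            waiting.append((name, remaining - slice_used))
        else:
            finished[name] = clock
    return finished


# Hypothetical workload: one 30-minute query arrives ahead of two 30-second ones.
workload = [("query_from_hell", 1800), ("user_2", 30), ("user_3", 30)]
print("Run to completion:", fifo_finish_times(workload))
print("Shared in slices: ", round_robin_finish_times(workload, quantum=10))
```

Under run-to-completion, the 30-second queries finish only after the long query has run for its full half hour. With time slicing they finish within about a minute and a half, while the long query ends at the same moment in both cases.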
When it comes to server selection, customers pride themselves on being able to assess the total cost of ownership. We have both had customers tell us that the entry price of a UNIX machine is lower than that of a mainframe. It may be. But in terms of total ownership and operational costs, we think that the S/390 is competitive.
One customer said: "I have additional costs associated with the UNIX data warehouse that I don't have with a mainframe. I know, for example, that the disk drive of a mainframe has a higher reliability factor. With a UNIX warehouse, I have to create additional indexes, mirroring and duplicate copies, which can double my cost and time of management. The mainframe also gives the added benefit of being able to run data compression separate from the applications and from DB2. So, if I'm talking S/390, I can run a 300 GB warehouse with about 500 GB of disk. If I ran the equivalent warehouse on UNIX, I might need a terabyte or more to handle the same application. In short, while UNIX disks are less expensive, the total costs may be similar in the end."
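The customer's arithmetic can be worked through in a few lines. The sketch below uses only the figures quoted above, taking "a terabyte or more" as exactly 1,000 GB, which is an assumption. It computes how much disk each platform needs per gigabyte of warehouse data, and the price ratio at which the two disk bills would be equal. No actual disk prices are assumed.

```python
def break_even_price_ratio(s390_disk_gb, unix_disk_gb):
    """Return the UNIX-to-S/390 price-per-GB ratio at which disk costs are equal.

    Costs match when unix_price * unix_disk_gb == s390_price * s390_disk_gb,
    so unix_price / s390_price == s390_disk_gb / unix_disk_gb.
    """
    return s390_disk_gb / unix_disk_gb


warehouse_gb = 300    # size of the warehouse, from the customer's example
s390_disk_gb = 500    # disk needed on S/390, from the customer's example
unix_disk_gb = 1000   # "a terabyte or more", taken here as 1,000 GB

print(f"S/390 disk per GB of warehouse: {s390_disk_gb / warehouse_gb:.2f}")
print(f"UNIX disk per GB of warehouse:  {unix_disk_gb / warehouse_gb:.2f}")
print(f"Break-even price ratio:         {break_even_price_ratio(s390_disk_gb, unix_disk_gb):.2f}")
```

On these figures, UNIX disk would have to cost no more than half as much per gigabyte as mainframe disk just to match the storage bill, before any difference in management time is counted.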
Vendors and customers are always wrestling with this conundrum: Which system truly costs less overall? Well, let's yield some ground here: even if both systems cost the same, the S/390 would cost less in terms of system outages and downtime - don't forget S/390's leading-edge reliability - plus data compression and a reduced worry load.
Much earlier we talked about the historical segregation of the warehouse from the operations server. But now we have the technology to reunite the different functions; and some customers are beginning to explore the notion of running their warehouse and operational needs off the same server. Combine Workload Manager with Parallel Sysplex scaling capacity and you can bring analytics into the transaction path. We are already there.
With DB2 Version 5, you can split a query across the engines on several servers. By 1999, Version 6 will fully enable multimedia for text, audio and video. Any customer who is thinking about an enterprise-class server with full Web enablement should bear S/390 in mind. From the standpoint of infrastructure, anything an end user might need is available on the mainframe.