
BALANCING THE LOAD FOR SCALING UP TO BIG TASKS


There's a performance ceiling that individual servers struggle to pierce to meet the demands of an on-line world.

You can only stuff so many processors into a box before competition for the resources they share becomes severe.

Each new processor spends much of its time fighting for resources, and much management effort has to be devoted to sorting out the competition: tracking where all the data is and which processor to send it to. So much so that you typically can't get beyond four processors.

So enter clusters. Just connect lots of four-way processor boxes together in clever ways, get them to share the load and fail over to one another if one box fails, and make it look to a user as if it were one large server.

All this without users being aware that, if one server went down, others had taken over the job.

Theory is proving easier than practice, as the clever ways of connecting rely on Microsoft and Novell, which are having problems getting beyond two nodes in a cluster.

Moves are afoot to borrow some mainframe ideas and introduce switched dedicated communication channels between components and processors, instead of sharing a common bus: this will push Intel servers to eight processors a box.

There are some proprietary technologies which can push NT to 72 processors. UNIX can regularly beat up Microsoft and Novell in the number of processors it can support, and so has less need to cluster.

Another neat trick in getting these servers to scale up to very large tasks - especially for large web sites - is to load balance across several servers.

Special software, or specially modified network devices, can route requests to the least busy of identically-configured servers. Should a server go down, it can be ignored and the request routed to the next server.
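The routing rule described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the Server and LeastBusyBalancer names, the active_requests counter and the healthy flag are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record for one of several identically-configured servers.
    name: str
    active_requests: int = 0
    healthy: bool = True


class LeastBusyBalancer:
    """Send each request to the healthy server with the fewest active requests."""

    def __init__(self, servers):
        self.servers = list(servers)

    def route(self):
        # A server that has gone down is simply ignored.
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # On a tie, min() returns the earliest server in the list.
        chosen = min(candidates, key=lambda s: s.active_requests)
        chosen.active_requests += 1
        return chosen

    def finish(self, server):
        # Call when a request completes so the server counts as less busy.
        server.active_requests -= 1


pool = [Server("web1"), Server("web2"), Server("web3")]
balancer = LeastBusyBalancer(pool)

first = balancer.route()    # all idle, so the first server is picked
second = balancer.route()   # web1 is now busier, so the next idle one is picked
pool[2].healthy = False     # web3 goes down
third = balancer.route()    # web3 is skipped; web1 and web2 tie, web1 wins
```

A real load balancer would also need some way to detect that a server has failed, such as a periodic health check; here the flag is simply set by hand.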

All this appears puny compared with supercomputers, which are built from many processors bolted together in massively parallel processing (MPP) architectures.

Most MPP applications have been for problems which can be broken down into many separate, independent operations on vast quantities of data.

In data mining, there is a need to perform multiple searches of a static database. In artificial intelligence, there is the need to analyse multiple alternatives, as in a chess game.
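A small Python sketch shows why such problems suit parallel machines: each chunk of a static data set can be searched without reference to any other chunk, and the partial answers are combined at the end. The function names and the toy data are assumptions for illustration, and a thread pool stands in here for what would be separate processors on an MPP machine.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, target):
    """Search one chunk of the data; it needs nothing from any other chunk."""
    return sum(1 for record in chunk if record == target)


def split(records, parts):
    """Cut the records into roughly equal, contiguous chunks."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_count(records, target, workers=4):
    """Search every chunk independently, then add up the partial counts."""
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_matches, chunks, [target] * len(chunks))
        return sum(partial_counts)


# Toy "static database": 10,000 records holding the values 0 to 6.
records = [n % 7 for n in range(10_000)]
total = parallel_count(records, target=3)
```

Because no worker waits on another, adding workers does not add coordination overhead, which is the contention problem that limits shared-resource servers.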

MPP machines are very powerful, but also very expensive.

A recent 1 Tflop machine installed by IBM at a nuclear research lab cost pounds 12.5 million. (A Tflop is a trillion floating point operations per second, about 5,000 times as fast as your typical Pentium III PC.) The machine was an updated version of Deep Blue, the chess computer which outplayed Garry Kasparov.

IBM has built Blue Pacific with 5,856 processors and 3.8 Tflops of performance, has an order for a 10 Tflop machine and has a design on the drawing board for a 100 Tflop machine.

But cash-strapped computer scientists have an ingenious system known as Beowulf clustering that grew out of a NASA project.

Beowulf enables MPP systems to be assembled from Linux and cheap commodity Intel PCs and Ethernet.

The first Beowulf ran on 16 486DX4 processors in 1994. By 1997 a Beowulf cluster of 140 P6 processors was achieving 10 Gflops. That is a lot less than the IBM machines manage, but it cost only pounds 62,500.

There is even a Beowulf built from 128 discarded (that is, free) 486 machines.

Just recently oil firm Amerada Hess replaced a pounds 1.25 million IBM system with a pounds 81,250, 32-node 500MHz Pentium III Beowulf cluster to render 3-D images of the seabed from terabytes of data.

Amerada gets only 80 per cent of the performance of the IBM system it junked, but at that price it is well pleased.
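The price/performance arithmetic behind these comparisons can be worked through in Python. The sketch uses only the figures quoted in this article. The helper name is an assumption, and the comparison is rough, since the machines date from different years and run different workloads.

```python
def pounds_per_gflop(cost_pounds, gflops):
    """Cost of each Gflop of performance, in pounds."""
    return cost_pounds / gflops


# IBM's 1 Tflop machine: 1 Tflop = 1,000 Gflops, at pounds 12.5 million.
ibm = pounds_per_gflop(12_500_000, 1_000)

# The 1997 Beowulf: 10 Gflops at pounds 62,500.
beowulf = pounds_per_gflop(62_500, 10)

# Amerada Hess: 80 per cent of the performance at a fraction of the price.
cost_ratio = 81_250 / 1_250_000      # new cluster cost / old IBM system cost
performance_ratio = 0.80             # new performance / old performance
value_gain = performance_ratio / cost_ratio

print(f"IBM: pounds {ibm:,.0f} per Gflop")
print(f"Beowulf: pounds {beowulf:,.0f} per Gflop")
print(f"Amerada Hess pays {cost_ratio:.1%} of the old price")
print(f"Performance per pound improves about {value_gain:.1f}-fold")
```

On these figures the Beowulf delivers a Gflop for half what the IBM machine charges, and Amerada Hess gets roughly 12 times as much performance for each pound spent.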

Perhaps the most imaginative 'supercomputer' is the SETI@home screensaver, downloaded by over 500,000 users from http://setiathome.ssl.berkeley.edu. It grabs chunks of radio telescope data over the Internet, uses spare processor cycles on desktop PCs to analyse them in the search for extraterrestrial intelligence, and uploads the results.

Not many institutions have that many processor cycles at their disposal.

