What is a supercomputer? There’s no fixed definition for that one. At any given point of time, a supercomputer is one that can do more number crunching per second than most others. Note: maximum number crunching, not the fastest processor. In fact, most supercomputers are built from comparatively slower processors. And as the processing capability of computers in general increases, the bar for supercomputerdom keeps moving up. Many PCs today have more processing power than the record-breaking supercomputers of yesterday.
Incidentally, the IT industry does not call these machines supercomputers. They are better known as high-performance machines, and the area is known as HPC (High-Performance Computing).
How exactly do you create a supercomputer? There are three popular architectures. The earliest is vector processing, made famous by Cray. Then came parallel processing, which is today the most widely used method of building supercomputers. Beowulf, which uses commodity machines to build high-performance clusters, emerged out of this. The third architecture is grid computing; distributed computing, which uses idle CPU time, may be considered its forerunner. Let’s look at each of them in detail.
Vector supercomputers
The earliest supercomputers were vector supercomputers. These are machines optimized for applying arithmetic operations to large arrays (vectors) of data. The original Crays are the best-known vector supercomputers. Vector machines find application in engineering areas such as automobile design and crash testing, and in scientific problems such as weather prediction.
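To see the kind of work a vector machine is built for, consider the loop below. It is not code from any particular Cray; it is just a generic C sketch of a SAXPY-style array operation, the sort of loop that a vectorizing compiler (or vector hardware) turns into single instructions operating on many elements at a time.

#include <stddef.h>

/* SAXPY: y = a*x + y over whole arrays -- the classic vectorizable loop.
 * On a vector machine (or a modern CPU with SIMD units), the compiler can
 * issue one vector instruction per chunk of elements instead of one
 * scalar instruction per element. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

Weather models and crash simulations spend much of their time in loops of exactly this shape, which is why vector machines suit them so well.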
High-performance parallel processing
Instead of a single machine with supercomputing capabilities, what you have here is a group of machines (or processors) that split the computing load amongst themselves. Unlike vector supercomputers, parallel-processing machines can scale significantly. For example, ASCI White, the IBM supercomputer currently at the top of the supercomputing charts, has 8,000 processors!
In effect, this is a cluster of computers. Talking of clusters, most of us are familiar with failover clustering, where, if one server in a cluster fails, another takes over. Then there is load-balancing clustering, as in Web servers, where different requests are sent to different servers in the cluster. High-performance clustering is the third type: all the machines in the cluster operate simultaneously, with the objective of getting better number-crunching capability. Some machines in the cluster split the computational load, others handle data I/O, and yet others act as controllers. All of them are connected by multiple high-speed networks.
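The article does not tie high-performance clustering to any particular programming interface, but MPI (Message Passing Interface) is the most common way such clusters are programmed. The short C sketch below is a generic illustration, not taken from any specific system: every process in the cluster computes a partial result over its own slice of the problem, and the pieces are then combined on one node.

#include <stdio.h>
#include <mpi.h>

/* Minimal MPI sketch: N processes split one big summation among
 * themselves, then combine the partial results on rank 0. */
int main(int argc, char **argv)
{
    int rank, size;
    const long total = 100000000L;   /* size of the overall job */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process works only on its own slice of the range. */
    long chunk = total / size;
    long start = rank * chunk;
    long end   = (rank == size - 1) ? total : start + chunk;

    double partial = 0.0;
    for (long i = start; i < end; i++)
        partial += 1.0 / (i + 1);    /* stand-in for real number crunching */

    /* Combine the partial sums on rank 0. */
    double sum = 0.0;
    MPI_Reduce(&partial, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Result computed across %d processes: %f\n", size, sum);

    MPI_Finalize();
    return 0;
}

You would compile this with mpicc and launch it with mpirun or mpiexec across the machines listed in a host file (the exact flags vary between MPI implementations); the same program runs unchanged whether the cluster has two nodes or two thousand.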
HPCs (High-Performance Clusters) have overtaken vector supercomputers in peak performance capability, and most of the top performers in the list of supercomputers (see the CD for the top 500) today are high-performance clusters. Vector supercomputer vendors, however, claim that clusters cannot produce the sustained performance that their machines can.
Clusters with over a thousand processors are generally called MPPs (Massively Parallel Processors).
Traditionally, high-performance clusters have been deployed in research installations rather than in industry. Slowly, though, clusters are entering industrial use, most visibly in the movie industry.
Commercial-off-the-shelf clusters
One offshoot of high-performance clustering is the attempt to build such clusters out of COTS (Commercial Off The Shelf) parts. Basically, you try to create supercomputers out of plain-vanilla PCs, in an attempt to drive down the cost of the cluster and the cost per GFlop of computational power.
The Beowulf cluster is the result of this effort to create a 1-GFlops machine for under $50,000. The Beowulf project was started at NASA’s Goddard Space Flight Center, USA, by Donald Becker. In 1994, Becker and Thomas Sterling built the first Beowulf cluster with 16 nodes, using Intel 486-based PCs, at a cost of $40,000.
Today, Beowulf denotes a genre of high-performance clusters built out of commonly available parts, running Linux or Windows. One of the plus points of a Beowulf cluster is that almost anyone can build one, as we will show you later on.
Grid processing
Grid computing is the new buzzword in supercomputing. The beginnings of grid computing can be traced to the SETI@home project (see Searching for ET). The basic idea is this: all PCs are idle for significant amounts of time. Can you use the idle time across millions of PCs (and networks) to do useful work? The SETI@home project showed that you could indeed, and that in the process, you could beat the most powerful of supercomputers hollow in processing power.
Grid computing takes this idea one step further, from a ‘do it if you are interested’ activity to a way of harnessing idle computing power in large networks. An HPC has to be built at one location, with dedicated buildings and machines. A grid, on the other hand, connects existing networks across locations over the Internet. Grid computing is still very much in its infancy; the first grid functioned for a short time in 1995, connecting 17 sites across North America using high-speed links. The standards for grid computing are still evolving.
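To make the idea concrete, here is a hypothetical C sketch of the loop a SETI@home-style client runs on each PC. The function names (fetch_work_unit, crunch, send_result) are invented for illustration and stubbed out so the sketch is self-contained; a real client would fetch work units from a central server over the network and throttle itself to use only idle cycles.

#include <stdio.h>

/* Hypothetical work unit -- a real grid client would download this
 * from a central server. */
struct work_unit { int id; double data; };

/* Stub standing in for a network call that asks the server for work. */
static int fetch_work_unit(struct work_unit *wu)
{
    static int next = 0;
    if (next >= 3) return 0;             /* pretend the server has 3 units */
    wu->id = next++;
    wu->data = 1000.0 + wu->id;
    return 1;
}

/* Stand-in for the heavy computation done on the PC's idle cycles. */
static double crunch(const struct work_unit *wu)
{
    double r = 0.0;
    for (long i = 1; i < 1000000; i++)
        r += wu->data / i;
    return r;
}

/* Stub standing in for uploading the result back to the server. */
static void send_result(int id, double result)
{
    printf("work unit %d -> %f (would be uploaded to the server)\n", id, result);
}

int main(void)
{
    struct work_unit wu;
    /* Keep pulling work units, crunching them, and returning results --
     * the essence of SETI@home-style idle-cycle computing. */
    while (fetch_work_unit(&wu))
        send_result(wu.id, crunch(&wu));
    return 0;
}

The server’s only job is to hand out work units and collect results, which is what lets the scheme scale to millions of loosely connected PCs.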
Grid computing holds promise not only for research institutions, but also for corporates that have large networks and large quantities of number crunching to do. Theoretically, with grid computing, you can use your networked PCs to do your data mining, instead of installing huge servers specially for the purpose.
Krishna Kumar