What is a supercomputer? There’s no fixed definition for that one. At any given point of time, a supercomputer is one that can do more number crunching per second than most others. Note: maximum number crunching, not the fastest processor. In fact, most supercomputers are built from comparatively slower processors. And as the processing capability of computers in general increases, the bar for supercomputerdom keeps moving up. Many PCs today have more processing power than the record-breaking supercomputers of yesterday.
Incidentally, the IT industry does not call these machines supercomputers. They are better known as high-performance machines, and the area is known as HPC (High-Performance Computing).
How exactly do you create a supercomputer? There are three popular architectures. The earliest is vector processing, made famous by Cray. Then came parallel processing, which is today the most widely used method of building supercomputers. Beowulf, which uses commodity machines to build high-performance clusters, emerged out of this. The third architecture is grid computing; distributed computing, which uses idle CPU time, may be considered its forerunner. Let’s look at each of them in detail.
Vector supercomputers
The earliest supercomputers were vector supercomputers. These are machines optimized for applying arithmetic operations to large arrays (vectors) of data. The original Crays are the best-known vector supercomputers. Vector machines find application in engineering areas such as automobile design and crash testing, and in scientific problems such as weather prediction.
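To see the kind of work a vector machine is built for, consider the loop below. It is not code from any particular Cray; it is just a generic C sketch of a SAXPY-style array operation, the sort of loop that a vectorizing compiler (or vector hardware) turns into single instructions operating on many elements at a time.

#include <stddef.h>

/* SAXPY: y = a*x + y over whole arrays -- the classic vectorizable loop.
 * On a vector machine (or a modern CPU with SIMD units), the compiler can
 * issue one vector instruction per chunk of elements instead of one
 * scalar instruction per element. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

Weather models and crash simulations spend much of their time in loops of exactly this shape, which is why vector machines suit them so well.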
High-performance parallel processing
Instead of a single machine with supercomputing capabilities, what you have here is a group of machines (or processors) that split the computing load amongst themselves. Unlike vector supercomputers, parallel-processing machines can scale significantly. For example, ASCI White, the IBM supercomputer currently at the top of the supercomputing charts, has 8,000 processors!
In effect, this is a cluster of computers. Talking of clusters, most of us are familiar with failover clustering, where, if one server in a cluster fails, another takes over. Then there is load-balancing clustering, as in Web servers, where different requests are sent to different servers in the cluster. High-performance clustering is the third type: all the machines in the cluster operate simultaneously, with the objective of getting better number-crunching capability. Some machines in the cluster split the computational load, others handle data I/O, and yet others act as controllers. All of them are connected by multiple high-speed networks.
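The article does not tie high-performance clustering to any particular programming interface, but MPI (Message Passing Interface) is the most common way such clusters are programmed. The short C sketch below is a generic illustration, not taken from any specific system: every process in the cluster computes a partial result over its own slice of the problem, and the pieces are then combined on one node.

#include <stdio.h>
#include <mpi.h>

/* Minimal MPI sketch: N processes split one big summation among
 * themselves, then combine the partial results on rank 0. */
int main(int argc, char **argv)
{
    int rank, size;
    const long total = 100000000L;   /* size of the overall job */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process works only on its own slice of the range. */
    long chunk = total / size;
    long start = rank * chunk;
    long end   = (rank == size - 1) ? total : start + chunk;

    double partial = 0.0;
    for (long i = start; i < end; i++)
        partial += 1.0 / (i + 1);    /* stand-in for real number crunching */

    /* Combine the partial sums on rank 0. */
    double sum = 0.0;
    MPI_Reduce(&partial, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Result computed across %d processes: %f\n", size, sum);

    MPI_Finalize();
    return 0;
}

You would compile this with mpicc and launch it with mpirun or mpiexec across the machines listed in a host file (the exact flags vary between MPI implementations); the same program runs unchanged whether the cluster has two nodes or two thousand.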
HPCs (High-Performance Clusters) have overtaken vector supercomputers in peak performance capability, and most of the top performers in the list of supercomputers (see the CD for the top 500) today are high-performance clusters. Vector supercomputer vendors, however, claim that clusters cannot produce the sustained performance that their machines can.
Clusters with over a thousand processors are generally called MPPs (Massively Parallel Processors).
Traditionally, high-performance clusters have been deployed in research installations rather than in industry. Slowly, though, clusters are entering industrial use, most visibly in the movie industry.
Commercial-off-the-shelf clusters
One offshoot of high-performance clustering is the attempt to build such clusters out of COTS (Commercial Off The Shelf) parts. Basically, you try to create supercomputers out of plain-vanilla PCs, in an attempt to drive down the cost of the cluster and the cost per GFlop of computational power.
The Beowulf cluster is the result of this effort to create a 1-GFlops machine for under $50,000. The Beowulf project was started at NASA’s Goddard Space Flight Center, USA, by Donald Becker. In 1994, Becker and Thomas Sterling built the first Beowulf cluster with 16 nodes, using Intel 486-based PCs, at a cost of $40,000.
Today, Beowulf denotes a genre of high-performance clusters built out of commonly available parts, running Linux or Windows. One of the plus points of a Beowulf cluster is that almost anyone can build one, as we will show you later on.
Grid processing
Grid computing is the new buzzword in supercomputing. The beginnings of grid computing can be traced to the SETI@home project (see Searching for ET). The basic idea is this: all PCs are idle for significant amounts of time. Can you use the idle time across millions of PCs (and networks) to do useful work? The SETI@home project showed that you could indeed, and that in the process, you could beat the most powerful of supercomputers hollow in processing power.
Grid computing takes this idea one step further, from a ‘do it if you are interested’ activity to a way of harnessing idle computing power in large networks. An HPC has to be built at one location, with dedicated buildings and machines. A grid, on the other hand, connects existing networks across locations over the Internet. Grid computing is still very much in its infancy; the first grid functioned for a short time in 1995, connecting 17 sites across North America using high-speed links. The standards for grid computing are still evolving.
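To make the idea concrete, here is a hypothetical C sketch of the loop a SETI@home-style client runs on each PC. The function names (fetch_work_unit, crunch, send_result) are invented for illustration and stubbed out so the sketch is self-contained; a real client would fetch work units from a central server over the network and throttle itself to use only idle cycles.

#include <stdio.h>

/* Hypothetical work unit -- a real grid client would download this
 * from a central server. */
struct work_unit { int id; double data; };

/* Stub standing in for a network call that asks the server for work. */
static int fetch_work_unit(struct work_unit *wu)
{
    static int next = 0;
    if (next >= 3) return 0;             /* pretend the server has 3 units */
    wu->id = next++;
    wu->data = 1000.0 + wu->id;
    return 1;
}

/* Stand-in for the heavy computation done on the PC's idle cycles. */
static double crunch(const struct work_unit *wu)
{
    double r = 0.0;
    for (long i = 1; i < 1000000; i++)
        r += wu->data / i;
    return r;
}

/* Stub standing in for uploading the result back to the server. */
static void send_result(int id, double result)
{
    printf("work unit %d -> %f (would be uploaded to the server)\n", id, result);
}

int main(void)
{
    struct work_unit wu;
    /* Keep pulling work units, crunching them, and returning results --
     * the essence of SETI@home-style idle-cycle computing. */
    while (fetch_work_unit(&wu))
        send_result(wu.id, crunch(&wu));
    return 0;
}

The server’s only job is to hand out work units and collect results, which is what lets the scheme scale to millions of loosely connected PCs.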
Grid computing holds promise not only for research institutions, but also for corporates that have large networks and large quantities of number crunching to do. Theoretically, with grid computing, you can use your networked PCs to do your data mining, instead of installing huge servers specially for the purpose.
Krishna Kumar