The Tech behind Supercomputing

High-performance Computing with Beowulf Clusters

Designed to Fly

The Champs

Supercomputing is an interesting field which, unlike the turbulent PC scene, sees slow but steady improvement. Yet every year a new performance record is set. This year, IBM's ASCI White replaced Intel's ASCI Red as the fastest supercomputer. Another aspect of supercomputing is the cost of computing, and constant efforts are being made to bring it down. Beowulf clusters have been used to build supercomputers cheaply, and during the year a price-performance ratio of less than $1,000 per GFlops (giga, or one billion, floating-point operations per second) was achieved. Traditional supercomputers cost about $10,000 per GFlops, while Beowulf clusters have until now cost around $3,000 per GFlops.

ASCI White

Installed at the Lawrence Livermore National Laboratory (LLNL), US, this massive machine is part of the Advanced Strategic Computer Initiative (ASCI) of the US Department of Energy. It's used to test nuclear weapons without actually conducting explosions, and for other energy-related research. The machine has 512 nodes, each containing 16 Power3-III CPUs running at 375 MHz, for a total of 8,192 processors. It has 6.2 terabytes of memory and 160 terabytes of storage. ASCI White takes up floor space equivalent to two basketball courts, and weighs 106 tons. Its maximal performance, measured using the Linpack benchmark (see box "How to benchmark a supercomputer?"), is 4,938 GFlops.
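The Linpack benchmark mentioned above essentially times how fast a machine solves a dense system of linear equations. As a rough illustration only (this is not the real benchmark code; NumPy and the conventional 2/3·n³ + 2·n² operation count are assumptions here), a minimal Python sketch:

```python
import time
import numpy as np

def linpack_gflops(n=1000, seed=0):
    """Time the solution of a dense n-by-n linear system Ax = b,
    the core operation the Linpack benchmark measures."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    start = time.perf_counter()
    x = np.linalg.solve(A, b)      # LU factorization plus triangular solves
    elapsed = time.perf_counter() - start
    # Conventional Linpack operation count for solving Ax = b
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    residual = np.linalg.norm(A @ x - b)   # sanity check on the answer
    return flops / elapsed / 1e9, residual

gflops, residual = linpack_gflops()
print(f"{gflops:.2f} GFlops, residual {residual:.2e}")
```

Running this on a desktop PC gives a feel for how far even a fast workstation is from the thousands of GFlops quoted for the machines in this article.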

Is it built from the ground up? Not exactly. The machine is a scaled-up version of IBM's RS/6000 SP (Scalable Parallel) Power 3 server. Some of the fastest supercomputers in the world today use this architecture.


ASCI Red

Built last year, this system is installed at Sandia National Labs, Albuquerque, US, and is employed in research by the US Department of Energy. It clocks a maximal performance of 2,379 GFlops, making it the second-fastest supercomputer. This is a distributed-memory MIMD (multiple instruction, multiple data), message-passing machine. That is, each CPU in the machine has its own local memory and is connected to the other CPUs so that data can be exchanged between their respective memories. The machine executes several instruction streams in parallel, each operating on different but related data. ASCI Red has 9,632 PentiumPro processors, 594 GB of RAM, a total of 2 terabytes of storage space, and occupies 1,600 square feet of floor space.
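Real message-passing machines like ASCI Red are programmed with libraries such as MPI; as a toy stand-in, this Python sketch (using the standard multiprocessing module, purely for illustration) shows the distributed-memory idea: each process keeps its data in private memory and cooperates only by sending and receiving messages.

```python
from multiprocessing import Process, Pipe

def worker(conn, data):
    # Each "node" holds its own data in private memory...
    local_sum = sum(data)
    conn.send(local_sum)        # ...and shares results only via messages
    total = local_sum + conn.recv()
    conn.send(total)            # report the combined result back
    conn.close()

if __name__ == "__main__":
    a, b = Pipe()
    # Two instruction streams run in parallel on different data (MIMD)
    p = Process(target=worker, args=(b, [1, 2, 3]))
    p.start()
    other = a.recv()            # partial sum from the other node (6)
    a.send(sum([4, 5, 6]))      # send our own partial sum (15)
    print(a.recv())             # prints the combined total: 21
    p.join()
```

Scale the same pattern to thousands of CPUs and you have the programming model of a machine like ASCI Red, minus the custom interconnect.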

ASCI Blue-Pacific

The number three supercomputer is also installed at LLNL, and clocks a maximal performance of 2,144 GFlops. It's a hypercluster of 1,464 IBM SP uniform-memory-access SMP nodes, each with four IBM PowerPC 604e processors running at 332 MHz. The system has 2.6 terabytes of memory and 75 terabytes of storage space.

ASCI Blue Mountain

Positioned at number four, this system was built by SGI and is installed at the Los Alamos National Laboratory, US. It consists of 48 Silicon Graphics Origin 2000 shared-memory multi-processor systems. Each system has 128 processors running at 250 MHz, giving a total of 6,144 processors. Total memory is 1.5 terabytes, while total storage space is 76 terabytes. It has a measured maximal performance of 1,608 GFlops, and a peak performance of 3,072 GFlops.

Hitachi SR8000-F1/112

Installed at the Leibniz Rechenzentrum, a department of the Bavarian Academy of Sciences in Munich, Germany, this machine is used for academic research in areas like physics and geophysics, chemistry, astronomy, meteorology, engineering, and software engineering. It has a measured performance of 1,035 GFlops and a peak performance of 1,344 GFlops. This is a RISC-based distributed-memory multi-processor system, and can support both parallel and vector processing. The system has 112 processors, about 1 terabyte of main memory and 10 terabytes of storage space. The machine is said to be among the most powerful supercomputers in Europe.

Cray T3E 1200

Installed at the US Army HPC Research Center, Minneapolis, US, this machine clocks 892 GFlops, with a peak performance of 1,300.8 GFlops. It's used for research in defense technology. A distributed-memory MIMD system, it has 1,084 processors and 557 GB of memory, and is the largest Cray T3E system in the world.

For a list of the current top 500 supercomputers, visit the TOP500 site.

A parallel development has been the use of Beowulf clusters to build systems that are powerful but not costly. Two notable efforts were the Klat2 project at the University of Kentucky, US, and the Bunyip project in Australia. Bunyip, in fact, won the Gordon Bell prize for the best price-performance ratio achieved by a real supercomputing application.


Klat2

An acronym for Kentucky Linux Athlon Testbed 2, this was the second Beowulf cluster built at the University of Kentucky using Athlon processors. Beowulfs, in simple terms, are clusters of PCs configured to work together as a single supercomputer. They are built out of commodity hardware components, run free-software OSs like Linux or FreeBSD, and are interconnected by a high-speed network.

Klat2's configuration consisted of 64 nodes, plus two "hot spare" nodes. The latter, along with an additional switch layer, are used for fault tolerance and system-level I/O. Each node contained one 700 MHz AMD Athlon processor with a dual-fan heat sink, 128 MB of PC100 SDRAM, an FIC SD11 motherboard, four RealTek-based Fast Ethernet NICs, a floppy drive (for the boot floppy, as the nodes had no hard disks), and a 300 W power supply in a mid-tower case with an extra fan. In addition, the cluster had 10 Fast Ethernet 32-way switches (31 ports plus one uplink port each), more than 264 Cat5 Fast Ethernet cables, and ran Red Hat Linux 6 with an updated kernel. One distinguishing feature of the cluster is its interconnect: while most clusters use expensive gigabit networks to interconnect PCs, Klat2 used 100 Mbps Ethernet hardware in a new configuration called the Flat Neighborhood Network.
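The exact Klat2 wiring isn't reproduced here, but the defining property of a Flat Neighborhood Network can be sketched: each node's multiple NICs are wired into small switches ("neighborhoods") so that every pair of nodes shares at least one switch, giving single-switch-hop latency from cheap Fast Ethernet. A toy Python check, with hypothetical switch sets much smaller than Klat2's:

```python
from itertools import combinations

# Hypothetical wiring of 4 nodes, at most 2 NICs each, into 3 switches.
# FNN property: every pair of nodes shares at least one switch, so any
# two nodes are a single switch hop apart.
switches = [{0, 1, 2}, {0, 1, 3}, {2, 3}]

def is_flat(switches, nodes):
    """Check the single-hop property for every pair of nodes."""
    return all(any(a in s and b in s for s in switches)
               for a, b in combinations(nodes, 2))

def nic_count(switches, node):
    """NICs a node needs: one per switch it plugs into."""
    return sum(node in s for s in switches)

nodes = range(4)
print(is_flat(switches, nodes))                  # True: one hop everywhere
print([nic_count(switches, n) for n in nodes])   # [2, 2, 2, 2]
```

Finding such a wiring for 64 nodes with four NICs each and 31-port switches is a hard combinatorial problem; the Klat2 team used a genetic algorithm to search for it.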

The nodes contained no video cards, keyboards, or mice.

With this configuration, Klat2 clocked a maximal performance of 64 GFlops running 32-bit ScaLAPACK.


Bunyip

This project was sponsored by the Australian National University and other Australian organizations. The cluster had 96 dual-CPU machines divided into four groups of 24 machines (nodes) each. Each node had two PIII/550 MHz processors, an EPoX KP6-BS dual Slot 1 motherboard based on the Intel 440BX AGPset, 384 MB of PC100 SDRAM, a 13.6 GB 7,200-rpm EIDE hard drive with a 2 MB cache, a 10/100 Mbps Fast Ethernet NIC with ACPI based on the Intel 21143 chipset, two 10/100 Fast Ethernet PCI adapters, and a midi-tower case with a 250 W ATX power supply. The nodes contained no video cards, keyboards, floppy drives, or CD-ROM drives.

The two servers for the Beowulf cluster had the same hardware as the nodes, plus a video card, 17" monitor, keyboard, mouse, floppy drive, CD-ROM drive, and a gigabit Ethernet card. Bunyip ran Linux as its OS.

The machine clocked a maximal performance of 163 GFlops and a peak performance of 193 GFlops.

In summary, supercomputers are moving well in the two desirable directions—higher speeds and lower costs. Let’s watch how far they can push the envelope.

Pragya Madan
