Supercomputing is an interesting field which, unlike the turbulent PC scenario, sees slow but steady improvement. Even so, every year a new performance record is set. This year ASCI White from IBM replaced Intel’s ASCI Red as the fastest supercomputer. Another aspect of supercomputing is the cost of computing, and there have been constant efforts to bring this down. Beowulf clusters have been used to build supercomputers cheaply, and a price-performance ratio of less than $1,000 per GFlops (giga, or one billion, floating point operations per second) was reached during the year. Traditional supercomputers cost about $10,000 per GFlops, while Beowulf clusters have so far cost around $3,000 per GFlops.
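The price-performance ratio itself is simple arithmetic: divide the total cost of the system by its sustained GFlops. The short Python sketch below illustrates the calculation with hypothetical figures (the cost and speed used here are made up for illustration, not quoted from any of the machines covered in this article).

    # Price-performance ratio: total cost divided by sustained GFlops.
    # The figures below are hypothetical, purely to illustrate the arithmetic.
    def dollars_per_gflops(total_cost_usd, measured_gflops):
        return total_cost_usd / measured_gflops

    # A hypothetical $150,000 Beowulf cluster sustaining 160 GFlops
    print(dollars_per_gflops(150000, 160))   # 937.5, i.e. under $1,000 per GFlops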
ASCI White
Installed
at the Lawrence Livermore National Laboratory (LLNL), US, this massive machine
is part of the Accelerated Strategic Computing Initiative (ASCI) of the US
Department of Energy. It’s used to simulate nuclear weapons tests without
actually conducting explosions, and for other energy-related research. The machine has 512
nodes, each containing 16 Power3-III CPUs running at 375 MHz, for a total of
8,192 processors. It has 6.2 terabytes of memory, and 160 terabytes of storage.
ASCI White takes up floor space equivalent to two basketball courts, and weighs
106 tons. Its maximal performance, measured using the Linpack benchmark (see box
"How to benchmark a supercomputer?"), is 4,938 GFlops.
Is it built from the ground up? Not exactly. The machine is a
scaled-up version of IBM’s RS/6000 SP (Scalable Parallel) Power 3 server. Some
of the fastest supercomputers in the world today use this architecture.
ASCI Red
Built last year, this system is installed at Sandia National
Labs, Albuquerque, US, and is employed in research by the US Department of
Energy. It clocks a maximal performance of 2,379 GFlops, and is the
second-fastest supercomputer. This is a distributed memory MIMD (multiple
instruction, multiple data), message-passing machine. That is, each CPU in the
machine has its own memory, and is connected to other CPUs to enable data
exchange between their respective memories. The machine executes several
instruction streams in parallel, each operating on different but related data.
ASCI Red has 9,632 PentiumPro processors, 594 GB of RAM, a total of 2
terabytes of storage space, and occupies 1,600 square feet of floor space.
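To picture what "distributed memory, message-passing" means in practice, here is a minimal sketch using the mpi4py library: each process holds its own data and must send it explicitly to other processes. This is a generic illustration of the programming model, not ASCI Red’s own software.

    # Run with, for example: mpirun -n 4 python ring.py
    # Generic message-passing sketch (mpi4py): every process owns its own memory,
    # and data moves between processes only through explicit messages.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()        # this process's id
    size = comm.Get_size()        # total number of processes

    local_value = rank * rank     # data that exists only in this process's memory

    # Pass the value to the next process in a ring, receive from the previous one
    dest = (rank + 1) % size
    source = (rank - 1) % size
    received = comm.sendrecv(local_value, dest=dest, source=source)

    print("process %d of %d received %d from process %d" % (rank, size, received, source))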
ASCI Blue-Pacific
The number three supercomputer is also installed at LLNL, and
clocks a maximal performance of 2,144 GFlops. It’s a hypercluster of 1,464 IBM
SP uniform memory access SMP nodes, each with four IBM PowerPC 604e
processors running at 332 MHz. The system has 2.6 terabytes of memory and
75 terabytes of storage space.
ASCI Blue Mountain
Positioned at number four, this system was built by SGI and
is installed at the Los Alamos National Laboratory, US. It consists of 48
Silicon Graphics Origin 2000 shared memory multi-processor systems. Each system
has 128 processors running at 250 MHz, giving a total of 6,144 processors. Total memory is
1.5 terabytes, while total storage space is 76 terabytes. It has a measured
maximal performance of 1,608 GFlops, and a peak performance of 3,072 GFlops.
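The gap between measured and peak figures is usually expressed as efficiency, the fraction of the theoretical peak that Linpack actually sustains. Using the ASCI Blue Mountain numbers above:

    # Efficiency = measured (Linpack) performance / theoretical peak performance
    measured_gflops = 1608
    peak_gflops = 3072
    print("Efficiency: %.0f%%" % (100.0 * measured_gflops / peak_gflops))   # about 52%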
Hitachi SR8000-F1/112
Installed at the Leibniz Rechenzentrum, a department of the Bavarian Academy of
Sciences in Munich, Germany, this machine is used for
academic research in areas like physics and geophysics, chemistry, astronomy,
meteorology, engineering, and software engineering. It has a measured
performance of 1,035 GFlops and a peak performance of 1,344 GFlops. This is a
RISC-based distributed memory multi-processor system, and can support both
parallel and vector processing. The system has 112 processors, about 1 terabyte
of main memory and 10 terabytes of storage space. The machine is said to be
among the most powerful supercomputers in Europe.
Cray T3E 1200
Installed at the US Army HPC Research Center, Minneapolis,
US, this machine clocks a maximal performance of 892 GFlops and a peak performance of 1,300.8 GFlops.
It’s used for research in defense technology. It’s a distributed memory MIMD
system, and has 1,084 processors. The system has 557 GB of memory, and is the
largest Cray T3E system in the world.
For a list of the current top 500 supercomputers, visit www.top500.org
A parallel development has been to use Beowulf clusters to
build systems that are powerful but not costly. Two such notable efforts were
the Klat2 project at the University of Kentucky, US, and the Bunyip project in
Australia. The Bunyip, in fact, won the Gordon Bell prize for price-performance
ratio for a real supercomputing application.
Klat2
An acronym for Kentucky Linux Athlon Testbed 2, this was the
second Beowulf cluster built at the University of Kentucky using Athlon
processors. Beowulfs, in simple terms, are clusters of PCs that are configured
to work together as a single supercomputer. Beowulfs are built out of commodity
hardware components, run free-software OSs like Linux or FreeBSD, and are
interconnected by a high-speed network.
Klat2’s configuration consisted of 64 nodes, plus two "hot spare" nodes. The
latter, along with an additional switch layer, were used for fault tolerance and
system-level I/O. Each node contained one 700 MHz AMD Athlon processor with a
dual-fan heat sink, 128 MB of PC100 SDRAM, an FIC SD11 motherboard, four
RealTek-based Fast Ethernet NICs, a floppy drive (to boot from, as the nodes had
no hard disks), and a 300 W power supply in a mid-tower case with an extra fan.
In addition, the cluster had 10 Fast Ethernet 32-way switches (31 ports plus one
uplink port each), more than 264 Cat 5 Fast Ethernet cables, and ran Red Hat
Linux 6 with an updated kernel. One distinguishing feature of the cluster is
that while most clusters use high-performance gigabit networks to interconnect
the PCs, Klat2 used 100 Mbps Fast Ethernet hardware in a new configuration
called a Flat Neighborhood Network.
The nodes contained no video cards, keyboards, or mice.
With this configuration, Klat2 clocked a maximal performance of 64 GFlops
running 32-bit ScaLAPACK.
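The defining property of a Flat Neighborhood Network is that every pair of nodes shares at least one switch, so any two machines are only a single switch hop apart even though no single switch connects to all of them. The toy Python sketch below checks that property for a made-up four-node, three-switch assignment; it is only an illustration of the idea, not Klat2’s actual 64-node, 10-switch wiring.

    from itertools import combinations

    # Toy illustration of the Flat Neighborhood Network idea (not Klat2's real wiring):
    # each node's NICs plug into a set of switches chosen so that every pair of
    # nodes shares at least one switch, keeping all node-to-node paths to one hop.
    switches_of = {
        "node0": {1, 2},
        "node1": {1, 3},
        "node2": {2, 3},
        "node3": {1, 2},
    }

    def is_flat_neighborhood(assignment):
        """True if every pair of nodes shares at least one switch."""
        return all(assignment[a] & assignment[b]
                   for a, b in combinations(assignment, 2))

    print(is_flat_neighborhood(switches_of))   # True for this small assignment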
Bunyip
This project was sponsored by the Australian National
University and other Australian organizations. The cluster had 96 dual-CPU
machines divided into four groups of 24 machines (nodes) each. Each machine or
node had two PIII/550 MHz processors, an EpoX KP6-BS Dual Slot 1 motherboard
based on the Intel 440BX AGPset, 384 MB of PC100 SDRAM, a 13.6 GB 7,200-rpm EIDE
hard drive with 2 MB cache, a 10/100 Mbps Fast Ethernet NIC with ACPI based on
the Intel 211432 chipset, two 10/100 Fast Ethernet PCI adapters, and a mid-tower
case with a 250 W ATX power supply. The nodes contained no video cards,
keyboards, floppy drives, or CD-ROM drives.
The two servers for the Beowulf cluster used the same hardware as the nodes,
plus a video card, a 17-inch monitor, keyboard, mouse, floppy drive, CD-ROM
drive, and a gigabit Ethernet card. Bunyip
used Linux as its OS.
The machine clocked a maximal performance of 163 GFlops and a
peak performance of 193 GFlops.
In summary, supercomputers are moving well in the two
desirable directions: higher speeds and lower costs. Let’s watch how far they
can push the envelope.
Pragya Madan