
Supercomputers


Supercomputing is an interesting field which, unlike the turbulent PC scenario, sees slow but steady improvement. Even so, a new performance record is set every year. This year, ASCI White from IBM replaced Intel’s ASCI Red as the fastest supercomputer. Another aspect of supercomputing is the cost of computing, and there have been constant efforts to bring it down. Beowulf clusters have been used to build supercomputers cheaply, and during the year a price-performance ratio of less than $1,000 per GFlops (giga, or one billion, floating-point operations per second) was reached. Traditional supercomputers cost about $10,000 per GFlops, while Beowulf clusters till now have cost around $3,000 per GFlops.
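To get a feel for what these ratios mean, here’s a quick back-of-the-envelope comparison in Python. The dollar figures are the ones quoted above; the 100 GFlops target machine is hypothetical, chosen only for illustration.

    # Price-performance comparison using the figures quoted in the article.
    # The 100 GFlops target is hypothetical, for illustration only.
    cost_per_gflops = {
        "traditional supercomputer": 10000,  # ~$10,000 per GFlops
        "typical Beowulf cluster": 3000,     # ~$3,000 per GFlops
        "this year's record Beowulf": 1000,  # under $1,000 per GFlops
    }

    gflops_needed = 100  # hypothetical target machine

    for kind, dollars in cost_per_gflops.items():
        print(f"{kind}: about ${dollars * gflops_needed:,} "
              f"for {gflops_needed} GFlops")

At these rates, the same 100 GFlops of computing drops from about a million dollars to about a hundred thousand.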


ASCI White

Installed at the Lawrence Livermore National Laboratory (LLNL), US, this massive machine is part of the Accelerated Strategic Computing Initiative (ASCI) of the US Department of Energy. It’s used to test nuclear weapons without actually conducting explosions, and for other energy-related research. The machine has 512 nodes, each containing 16 Power3-II CPUs running at 375 MHz, for a total of 8,192 processors. It has 6.2 terabytes of memory and 160 terabytes of storage. ASCI White takes up floor space equivalent to two basketball courts, and weighs 106 tons. Its maximal performance, measured using the Linpack benchmark (see box "How to benchmark a supercomputer?"), is 4,938 GFlops.

Is it built from the ground up? Not exactly. The machine is a scaled-up version of IBM’s RS/6000 SP (Scalable Parallel) Power3 server. Some of the fastest supercomputers in the world today use this architecture.


ASCI Red

Built last year, this system is installed at Sandia National Laboratories, Albuquerque, US, and is employed in research by the US Department of Energy. It clocks a maximal performance of 2,379 GFlops, making it the second-fastest supercomputer. This is a distributed-memory MIMD (multiple instruction, multiple data), message-passing machine. That is, each CPU in the machine has its own memory, and is connected to the other CPUs so that data can be exchanged between their respective memories; the machine executes several instruction streams in parallel on different, related pieces of data. ASCI Red has 9,632 Pentium Pro processors, 594 GB of RAM, a total of 2 terabytes of storage space, and occupies 1,600 square feet of floor space.
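The message-passing style is easy to sketch in code. The fragment below uses the mpi4py library purely for illustration (an assumption on our part; ASCI Red has its own MPI implementation): each process owns its data privately and must exchange it explicitly with its neighbors.

    # Minimal message-passing sketch using mpi4py (illustration only).
    # Each process holds a private value and passes it around a ring:
    # no memory is shared, exactly as in a distributed-memory MIMD machine.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()  # this process's id
    size = comm.Get_size()  # total number of processes

    local_value = rank * rank  # data private to this process

    dest = (rank + 1) % size    # neighbor to send to
    source = (rank - 1) % size  # neighbor to receive from
    received = comm.sendrecv(local_value, dest=dest, source=source)

    print(f"rank {rank}/{size} sent {local_value}, received {received}")

Launched under an MPI runtime (for example, mpirun -np 4 python ring.py with Open MPI), every process runs the same program against its own private memory, and all data movement happens through explicit sends and receives.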

ASCI Blue-Pacific


The number three supercomputer is also installed at LLNL, and clocks a maximal performance of 2,144 GFlops. It’s a hypercluster of 1,464 IBM SP uniform-memory-access SMP nodes, each with four IBM PowerPC 604e processors running at 332 MHz. The system has 2.6 terabytes of memory and 75 terabytes of storage space.

ASCI Blue Mountain

Positioned at number four, this system was built by SGI and is installed at the Los Alamos National Laboratory, US. It consists of 48 Silicon Graphics Origin 2000 shared-memory multiprocessor systems, each with 128 processors running at 250 MHz, giving a total of 6,144 processors. Total memory is 1.5 terabytes, while total storage space is 76 terabytes. It has a measured maximal performance of 1,608 GFlops and a peak performance of 3,072 GFlops.
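Shared memory is the opposite programming model to ASCI Red’s message passing: within each Origin 2000 system, every processor can read and write the same memory directly, with no explicit sends or receives. Here’s a minimal sketch of the idea, using Python’s multiprocessing module as a stand-in for real SMP hardware (an analogy only, not SGI’s actual programming interface).

    # Minimal shared-memory sketch: all workers write into one shared
    # array directly instead of exchanging messages (illustration only).
    import multiprocessing as mp

    def worker(shared, index):
        # Every worker sees the same memory; no send/receive is needed.
        shared[index] = index * index

    if __name__ == "__main__":
        shared = mp.Array("i", 8)  # one array visible to all workers
        procs = [mp.Process(target=worker, args=(shared, i))
                 for i in range(8)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(list(shared))  # [0, 1, 4, 9, 16, 25, 36, 49]

The convenience comes at a price: shared-memory machines are harder to scale to thousands of processors, which is why Blue Mountain combines 48 such systems into one larger machine.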


Hitachi SR8000-F1/112

Installed at the Leibniz Rechenzentrum, a department of the Bavarian Academy of Sciences in Munich, Germany, this machine is used for academic research in areas like physics, geophysics, chemistry, astronomy, meteorology, engineering, and software engineering. It has a measured performance of 1,035 GFlops and a peak performance of 1,344 GFlops. This is a RISC-based, distributed-memory multiprocessor system that supports both parallel and vector processing. The system has 112 processors, about 1 terabyte of main memory, and 10 terabytes of storage space. The machine is said to be among the most powerful supercomputers in Europe.
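Vector processing, roughly speaking, means issuing one instruction over a whole array of values instead of looping over them one element at a time. The sketch below approximates the difference with NumPy (an analogy only; the SR8000 implements this style of execution in hardware).

    # Scalar loop versus a single vectorized operation (illustration only).
    import time
    import numpy as np

    n = 1_000_000
    a = np.arange(n, dtype=np.float64)
    b = np.arange(n, dtype=np.float64)

    # Scalar style: one element per loop iteration.
    start = time.perf_counter()
    c_scalar = np.empty_like(a)
    for i in range(n):
        c_scalar[i] = a[i] * b[i]
    scalar_time = time.perf_counter() - start

    # Vector style: the whole multiplication expressed (and run) at once.
    start = time.perf_counter()
    c_vector = a * b
    vector_time = time.perf_counter() - start

    assert np.allclose(c_scalar, c_vector)
    print(f"vector form ran {scalar_time / vector_time:.0f}x faster")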

Cray T3E 1200


Installed at the US Army HPC Research Center, Minneapolis, US, this machine does 892 GFlops, and has a peak performance of 1,300.8 GFlops. It’s used for research in defense technology. It’s a distributed-memory MIMD system with 1,084 processors and 557 GB of memory, and is the largest Cray T3E system in the world.

For a list of the current top 500 supercomputers, visit www.top500.org

A parallel development has been the use of Beowulf clusters to build systems that are powerful but not costly. Two notable efforts were the Klat2 project at the University of Kentucky, US, and the Bunyip project in Australia. Bunyip, in fact, won the Gordon Bell prize for price-performance for a real supercomputing application.


Klat2

An acronym for Kentucky Linux Athlon Testbed 2, this was the second Beowulf cluster built at the University of Kentucky using Athlon processors. Beowulfs, in simple terms, are clusters of PCs configured to work together as a single supercomputer. They are built out of commodity hardware components, run free-software OSs like Linux or FreeBSD, and are interconnected by a high-speed network.

Klat2’s configuration consisted of 64 nodes, plus two "hot spare" nodes; the latter, along with an additional switch layer, were used for fault tolerance and system-level I/O. Each node contained one 700 MHz AMD Athlon processor with a dual-fan heat sink, 128 MB of PC100 SDRAM, an FIC SD11 motherboard, four RealTek-based Fast Ethernet NICs, a floppy drive (for the boot floppy, as the nodes had no hard disks), a 300 W power supply, and a mid-tower case with an extra fan. The nodes contained no video cards, keyboards, or mice. In addition, the cluster had 10 Fast Ethernet 32-way switches (31 ports plus one uplink port each), more than 264 Cat5 Fast Ethernet cables, and ran Red Hat Linux 6 with an updated kernel. One distinguishing feature of the cluster is that while most clusters use high-performance gigabit networks to interconnect their PCs, Klat2 used 100 Mbps Ethernet hardware in a new configuration called the Flat Neighborhood Network, sketched below.
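The Flat Neighborhood idea is simple to state: give every node several cheap NICs, plug each NIC into a different switch, and choose the wiring so that every pair of nodes shares at least one switch. Any two nodes are then a single switch hop apart without needing one big, expensive switch. The checker below verifies that property for a toy wiring of 4 nodes, 2 NICs each, and 3 switches (a made-up example, not Klat2’s actual 64-node layout).

    # Toy Flat Neighborhood Network check: every pair of nodes must share
    # at least one switch, so any two nodes are one switch hop apart.
    # node -> set of switches its NICs are plugged into (hypothetical)
    wiring = {
        0: {"A", "B"},
        1: {"A", "C"},
        2: {"B", "C"},
        3: {"A", "B"},
    }

    def is_flat_neighborhood(wiring):
        nodes = list(wiring)
        return all(
            wiring[u] & wiring[v]  # do nodes u and v share a switch?
            for i, u in enumerate(nodes)
            for v in nodes[i + 1:]
        )

    print(is_flat_neighborhood(wiring))  # True: every pair shares a switch

Finding such a wiring for 64 nodes with four NICs each is a much harder combinatorial problem, which is part of what made Klat2’s network design notable.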

With this configuration, Klat2 clocked a maximal performance of 64 GFlops on a 32-bit ScaLAPACK benchmark.

Bunyip

This project was sponsored by the Australian National University and other Australian organizations. The cluster had 96 dual-CPU machines divided into four groups of 24 machines (nodes) each. Each node had two PIII/550 MHz processors, an EpoX KP6-BS dual Slot 1 motherboard based on the Intel 440BX AGPset, 384 MB of PC100 SDRAM, a 13.6 GB 7,200-rpm EIDE hard drive with 2 MB cache, a 10/100 Mbps Fast Ethernet NIC with ACPI based on the Intel 211432 chipset, two 10/100 Fast Ethernet PCI adapters, and a mid-tower case with a 250 W ATX power supply. The nodes contained no video cards, keyboards, floppy drives, or CD-ROM drives.

The two servers for the Beowulf cluster had the same hardware as the nodes, plus a video card, a 17" monitor, a keyboard, a mouse, a floppy drive, a CD-ROM drive, and a gigabit Ethernet card. Bunyip used Linux as its OS.

The machine clocked a maximal performance of 163 GFlops and a peak performance of 193 GFlops.

In summary, supercomputers are moving well in the two desirable directions: higher speeds and lower costs. Let’s watch how far they can push the envelope.

Pragya Madan
