Fault-Tolerant Computing

PCQ Bureau
“The on-board (space) shuttle software runs on two pairs of primary computers, with one pair being in control as long as the simultaneous computations on both agree with each other, with control passing to the other pair in the case of a mismatch. All four primary computers run identical programs. To prevent catastrophic failures in which both pairs fail to perform (for example, if the software were wrong), the shuttle has a fifth computer that is programmed with different code by different programmers from a different company, but using the same specifications and the same compiler (HAL/S). Cutover to the backup computer would have to be done manually by the astronauts.”

P. G. Neumann in “Computer-Related Risks”, published by Addison-Wesley, 1995

This is an example of fault tolerance at its best. Most business systems, however, are not as mission-critical or life-critical as the space shuttle, so in most cases they don’t go to such extremes. Fault tolerance can be implemented in software, in hardware, or, as in the case of the space shuttle, in both. Here, we’ll talk about fault-tolerant hardware.
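Before turning to hardware, the shuttle arrangement quoted above can be sketched in a few lines of Python. This is only an illustrative model, not the actual flight software: the function name `shuttle_compute`, the `manual_cutover` flag, and the idea of representing each computer as a plain function are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for one computer: a function from input to output.
Computer = Callable[[int], int]


def shuttle_compute(
    x: int,
    pairs: Sequence[Tuple[Computer, Computer]],
    backup: Computer,
    manual_cutover: bool = False,
) -> Tuple[str, int]:
    """Return (source, result) for input x under a pair-and-backup scheme."""
    # The backup runs independently written code and is selected only by
    # a human decision, modelled here as a flag set by the caller.
    if manual_cutover:
        return "backup", backup(x)

    # Try each primary pair in order; a pair stays in control only while
    # its two computers agree with each other.
    for index, (first, second) in enumerate(pairs):
        a, b = first(x), second(x)
        if a == b:
            return f"pair {index}", a

    # Every pair disagreed internally: nothing trustworthy to return.
    raise RuntimeError("all primary pairs disagree; manual cutover required")


if __name__ == "__main__":
    primary = lambda x: x * 2        # identical program on all primaries
    faulty = lambda x: x * 2 + 1     # a primary with a simulated fault
    backup = lambda x: x + x         # same specification, different code

    print(shuttle_compute(21, [(primary, faulty), (primary, primary)], backup))
    print(shuttle_compute(21, [(primary, primary)], backup, manual_cutover=True))
```

Note that comparing results within a pair catches a fault in one computer, but not an error shared by all four identical programs, since they would all agree on the same wrong answer. That is the case the independently written backup is meant to cover, which is why the sketch leaves the cutover decision to the caller.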

Traditionally, three hardware companies are associated with fault-tolerant computing: Tandem, Stratus, and Sequent. Tandem was acquired by Compaq in 1997. Its machines are typically used in financial institutions like stock exchanges, with the Tandem Himalaya being the most commonly used in Indian stock exchanges. A Himalaya installation consists of many units clustered together, each with multiple CPUs and each implementing fault tolerance right down to the CPU level.

Incidentally, how much would it cost to own a fault-tolerant system? A basic Tandem Himalaya system (we couldn’t get details about Sequent or Stratus) will set you back by only a crore-and-a-half, including storage, OS, databases, etc, but excluding applications. These machines run NSK (NonStop Kernel), NT, or even SCO Unix. As for the processor, you have a choice of Intel or Alpha. The machines used to run on MIPS processors but are now moving to Alpha, and the Intel line is a recent addition. On top of the purchase price, maintenance and upgrades could cost you a bomb every year. In fact, I’m told that these companies make more money out of old installations than out of new sites each year.

And finally, when they install one at your place, they don’t go by a box count as is done with normal servers. They count the number of processors installed. That is, if you were to ask them how many servers they’ve installed in India, and if they were to answer (which is unlikely), the answer would be given not in servers or sites, but in processors.

Want to buy one?
