“The on-board (space) shuttle software runs on two
pairs of primary computers, with one pair being in control as long as the simultaneous
computations on both agree with each other, with control passing to the other pair in the
case of a mismatch. All four primary computers run identical programs. To prevent
catastrophic failures in which both pairs fail to perform (for example, if the software
were wrong), the shuttle has a fifth computer that is programmed with different code by
different programmers from a different company, but using the same specifications and the
same compiler (HAL/S). Cutover to the backup computer would have to be done manually by
the astronauts.”
Peter G. Neumann in “Computer-Related
Risks”, published by Addison-Wesley, 1995
This is an example of fault tolerance at its
best. But then, most business systems are not as mission-critical or life-critical as the
space shuttle, so in most cases they don’t go to such extremes. Fault tolerance can be
implemented in software, in hardware, or, as in the case of the space shuttle, in both.
Here, we’ll talk about fault-tolerant hardware.
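To make the shuttle arrangement quoted above concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not the shuttle's actual logic: the names Pair, select_output, and crew_cutover are hypothetical, and we assume that each "computer" is just a function and that agreement means exactly equal results.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Program = Callable[[float], float]  # stand-in for one computer's software


@dataclass
class Pair:
    """Two primary computers whose outputs are compared (hypothetical model)."""
    name: str
    a: Program
    b: Program

    def compute(self, x: float) -> Optional[float]:
        # The pair is trusted only while both computations agree.
        ra, rb = self.a(x), self.b(x)
        return ra if ra == rb else None


def select_output(x: float, pairs: List[Pair], backup: Program,
                  crew_cutover: bool) -> Tuple[str, float]:
    """Pick the controlling result for one computation step."""
    # Assumption: a manual cutover by the crew always wins, because a bug
    # common to all identical primaries cannot be caught by comparison.
    if crew_cutover:
        return "backup", backup(x)
    # Otherwise control passes to the first pair whose computers agree.
    for pair in pairs:
        result = pair.compute(x)
        if result is not None:
            return pair.name, result
    raise RuntimeError("no primary pair agrees and no manual cutover")


def primary(x: float) -> float:
    return 2.0 * x          # the identical program on all four primaries


def faulty(x: float) -> float:
    return 2.0 * x + 1.0    # simulated fault in one computer


def backup(x: float) -> float:
    return x + x            # independently written to the same specification


pairs = [Pair("pair-1", primary, faulty), Pair("pair-2", primary, primary)]
print(select_output(3.0, pairs, backup, crew_cutover=False))
```

In this run, the first pair disagrees with itself, so control passes to the second pair. Note that the comparison only catches faults that make the two computers in a pair differ, which is why the independently written backup and the manual cutover exist at all.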
Traditionally, three hardware companies are
associated with fault-tolerant computing: Tandem, Stratus, and Sequent. Tandem was
acquired by Compaq in 1997. Its machines are typically used in financial institutions such
as stock exchanges, with the Tandem Himalaya being the most commonly used in Indian stock
exchanges. A Himalaya installation consists of many units clustered together, with each
unit implementing fault tolerance right down to the CPU level by using multiple CPUs.
Incidentally, how much would it cost to own a
fault-tolerant system? A basic Tandem Himalaya system (we couldn’t get details about
Sequent or Stratus) will set you back by only a crore-and-a-half, including storage, OS,
databases, etc, but excluding applications. These machines run NSK (NonStop Kernel), NT,
or even SCO Unix. As for the processor, you have a choice of Intel or Alpha. The machines
used to run on MIPS processors, but are now moving to Alpha; the Intel line is a recent
addition. And maintenance and upgrades could cost you a bomb every year. In fact, we’re
told that these companies make more money each year out of old installations than out of
new sites.
And finally, when they install one at your place, they don’t go by a box count, as is
done with normal servers. They count the number of processors installed. That is, if you
were to ask them how many servers they’ve installed in India, and if they were to answer
(which is unlikely), the answer would be given not in servers or sites, but in
processors.
Want to buy one?