Clustering means linking several servers together, through
special hardware and software, so that they appear as a single system to the
clients accessing them. The servers share the load amongst themselves, so if one
goes down, the remaining servers take up its share. This is useful for
applications that demand near-100 percent uptime, such as Web sites.
Clustering helps keep your services from going offline: even
if a service running on one server fails, it is resumed on another server.
Administering a cluster is also easier than administering multiple separate
servers, since you manage a single entity.
Moreover, as the load increases on a cluster, you can scale
it up by adding more processors or computers.
Clustering configurations
Cluster hardware configurations vary depending on the
technology and the operating system used, and come in three flavors:
Shared Disk: This approach uses central I/O devices
accessible to all computers within the cluster, which rely on a common bus for
disk access. Because all nodes can write to the disks simultaneously, data
integrity is difficult to maintain, so clustering software is required to keep
the data coherent.
Shared disk clusters provide high system availability: even
if one node goes down, the others are not affected. On the downside, the shared
hardware is an inherent bottleneck that can affect performance. The shared disk
approach is typically used by products such as Oracle's database clustering and
by AIX.
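The coherence problem that clustering software solves on shared disks can be pictured with a toy sketch. Here a plain threading.Lock stands in for a distributed lock manager, and all names are hypothetical; real cluster software coordinates locks across machines, not threads.

```python
import threading

class SharedDisk:
    """Toy stand-in for a disk array that all nodes in the cluster can reach."""
    def __init__(self):
        self.blocks = {}
        # Stands in for a distributed lock manager; without it, two nodes
        # could interleave writes to the same block -- the integrity problem
        # the shared disk approach must solve.
        self.lock = threading.Lock()

    def write(self, node, block, data):
        with self.lock:
            self.blocks[block] = (node, data)

disk = SharedDisk()
threads = [threading.Thread(target=disk.write, args=(f"node{i}", "b1", i))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(disk.blocks["b1"])  # last writer wins (which node is nondeterministic)
```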
Shared Nothing: In these clusters there is no central data
storage. All nodes work independently with their own disks, but each can take
over another node's disks if the node handling them ceases to function. The
nodes typically use a shared SCSI connection. This is not to be confused with
the shared disk approach, since here multiple nodes never access the same disks
concurrently. Shared nothing cluster solutions include MSCS (Microsoft Cluster
Server) for Windows NT/2000.
Mirrored Disk: Mirroring replicates all the data from a
primary storage device to a secondary one for availability purposes.
Replication occurs while the primary system is online. If a failure occurs, the
fail-over process (explained later) transfers control to the secondary system,
though some applications can lose a small amount of data during the switch. The
advantage of mirroring is that a disk failure neither brings down your network
nor, in normal operation, loses data. It may not be economical, however,
because of the redundant disks.
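The mirroring idea can be sketched in a few lines. This is only an illustration under assumed names (MirroredDisk, fail_over are invented here), showing synchronous replication while the primary is online and the secondary taking over after a failure.

```python
class MirroredDisk:
    """Toy sketch of disk mirroring: every write is replicated from the
    primary to the secondary store while the primary is online."""
    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.primary_up = True

    def write(self, block, data):
        if self.primary_up:
            self.primary[block] = data
            self.secondary[block] = data  # replicate while primary is online
        else:
            self.secondary[block] = data  # after fail-over, secondary serves I/O

    def read(self, block):
        store = self.primary if self.primary_up else self.secondary
        return store[block]

    def fail_over(self):
        self.primary_up = False  # control transfers to the secondary system

d = MirroredDisk()
d.write("b1", "payroll")
d.fail_over()
print(d.read("b1"))  # "payroll" -- data is still available after the failure
```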
Terminology and concepts
Members of a cluster are referred to as nodes. The Cluster
Service is a collection of software on each node that manages all
cluster-specific activity. A Resource is an item managed by the Cluster Service.
Resources may include physical hardware devices such as disk drives and network
cards, or logical items such as logical disk volumes, TCP/IP addresses, entire
applications, and databases. A resource is said to be online when it’s
providing its service on a node. A group is a collection of resources to be
managed as a single unit. Operations performed on a group affect all resources
contained in it.
A group can be owned by only one node at a time; you can't
have resources within a group owned by multiple nodes simultaneously. If a
particular node fails, its groups can be failed over, or moved, to another node
as atomic units. Each group has a cluster-wide policy specifying which node it
prefers to run on, and which node it should move to in case of failure.
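The resource/group model above can be sketched as follows. The class and resource names are hypothetical, not the Cluster Service API; the point is that a group has at most one owning node and that an operation on a group affects every resource in it.

```python
class Resource:
    """An item managed by the Cluster Service: a disk, an IP address,
    an application, and so on."""
    def __init__(self, name):
        self.name = name
        self.online = False

class Group:
    """A collection of resources managed as a single unit."""
    def __init__(self, name, resources):
        self.name = name
        self.resources = resources
        self.owner = None  # at most one node owns a group at any time

    def bring_online(self, node):
        self.owner = node
        for r in self.resources:
            r.online = True  # the operation affects all resources in the group

sql = Group("SQL", [Resource("disk D:"), Resource("10.0.0.5"), Resource("sqlservr")])
sql.bring_online("NODE1")
print(sql.owner, all(r.online for r in sql.resources))  # NODE1 True
```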
If a node fails, a fail-over process starts automatically and
redistributes its workload to the other nodes in the cluster; the
implementation differs between operating systems. When the node recovers from
the failure, a fail-back process ensures that it gets its load back.
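Fail-over and fail-back can be sketched as two small functions. The data layout and names here are invented for illustration: each group records its owner and a preference list standing in for the cluster-wide policy described above.

```python
def fail_over(groups, failed_node, survivors):
    """Move each group owned by the failed node, as an atomic unit, to the
    first surviving node on the group's preference list."""
    for g in groups:
        if g["owner"] == failed_node:
            g["owner"] = next(n for n in g["policy"] if n in survivors)

def fail_back(groups, recovered_node):
    """When a node recovers, return to it the groups that prefer it."""
    for g in groups:
        if g["policy"][0] == recovered_node:
            g["owner"] = recovered_node

groups = [{"name": "web", "owner": "A", "policy": ["A", "B"]},
          {"name": "db",  "owner": "B", "policy": ["B", "A"]}]
fail_over(groups, "A", survivors=["B"])
print([g["owner"] for g in groups])  # ['B', 'B'] -- B carries A's load
fail_back(groups, "A")
print([g["owner"] for g in groups])  # ['A', 'B'] -- A gets its load back
```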
Clustering in Windows 2000
In the Windows 2000 family, clustering is supported by the
Advanced Server and Datacenter Server editions. There are two flavors of
clustering: MSCS (Microsoft Cluster Service) and NLB (Network Load Balancing).
The first provides fail-over support for applications such as databases,
messaging systems, and file/print services, while the second distributes load
amongst nodes. MSCS can handle two-node clusters in Advanced Server and
four-node clusters in Datacenter Server; NLB can go up to 32 nodes in either.
MSCS uses software "heartbeats" to detect failed
applications or servers. When a node fails, its "shared nothing"
clustering architecture automatically transfers ownership of resources from the
failed node to a surviving node. If an individual application fails (and not a
node), MSCS will typically try to restart it on the same node; if that also
fails, it moves the application's resources to the other node.
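That restart-then-move policy can be sketched as a short loop. This is not MSCS code; the function names and the restart threshold are assumptions made for illustration.

```python
def recover(app, restart_limit=3):
    """Sketch of the recovery policy described above: when the application
    (not the node) fails, try restarting it locally a few times before
    moving its resources to another node. restart_limit is an assumed,
    hypothetical threshold."""
    for attempt in range(restart_limit):
        if app.restart():              # try again on the same node
            return "restarted locally"
    return "moved to surviving node"   # local restarts exhausted: fail over

class FlakyApp:
    """Test double: fails the first `fails` restart attempts, then succeeds."""
    def __init__(self, fails):
        self.fails = fails
    def restart(self):
        if self.fails > 0:
            self.fails -= 1
            return False
        return True

print(recover(FlakyApp(fails=1)))  # restarted locally
print(recover(FlakyApp(fails=5)))  # moved to surviving node
```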
NLB, as the name suggests, balances incoming traffic across
clusters of up to 32 nodes. One advantage of this setup is that you can add
servers as demand grows.
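The idea of spreading incoming traffic across the farm can be illustrated with a simple hash scheme. Real NLB uses its own filtering algorithm on each node; this toy sketch, with invented names, only shows how a client can be mapped deterministically to one node out of many.

```python
import hashlib

def pick_node(client_ip, nodes):
    """Hash the client address to choose one node, so the same client
    consistently lands on the same server (illustration only)."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = [f"web{i:02d}" for i in range(1, 9)]  # an 8-node farm; NLB allows up to 32
for ip in ("10.0.0.7", "10.0.0.8", "10.0.0.7"):
    print(ip, "->", pick_node(ip, nodes))
# the same client address always maps to the same node
```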
Both clustering technologies can be used in conjunction for
higher availability. Take a large Internet site, for example: you could have a
Web server farm with Network Load Balancing as the front end, while the back
end, say the database application, is handled by the Cluster Service.
Clustering under NetWare
Novell introduced NCS (NetWare Cluster Services) for NetWare
5 last fall. The service lets you create clusters of up to 32 nodes using the
shared disk architecture. It requires NetWare Support Pack 4 or higher, and you
can't mix NetWare 5.x versions in a cluster. All nodes must be configured with
TCP/IP and be on the same subnet. Each server needs at least 64 MB of RAM and
must be part of the same NDS tree. In addition, each server must have at least
one local (not shared) disk device to use as volume SYS, and the NDS tree must
be replicated on at least two servers in the cluster. The latest release of the
Cluster Services adds fail-over for DHCP servers, which was not present
earlier.
Anuj Jain