Understanding Clustering

PCQ Bureau

You cannot keep on building more and more processors into the same box. The design gets immensely complicated. Clustering comes to the rescue. 

Many processors, many machines



How about knitting together several uniprocessor or multiprocessor machines so that the processors in each can execute one of the subtasks? The physical networking medium can be anything from 10/100 Mbit Ethernet to Gigabit Ethernet or even a high-speed fiber-optic channel. Harnessing more processors then becomes as simple as adding a new machine to the network.

This type of setup is called a computational cluster, because the primary objective of the cluster is processing, or computing.

A cluster provides a totally different environment for the execution of an application, hence different concepts and different libraries or APIs also come into play. 

In the case of SMP, we had a single machine in front of us, the only platform on which we would run our resource-hungry apps. But in the case of clusters, we have many machines. So where do we deploy our applications?

One of the machines in the cluster is designated as the server and the others as clients, or nodes. The server is the machine on which we will deploy (and develop) all our applications. As discussed in the previous section, the application will consist of well-defined and ideally independent subtasks. These subtasks will be dispatched by the server to the individual nodes for processing. Finally, the processing results will be aggregated at the server, and the result will be displayed (or stored) there. Is it as easy as it sounds?
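The dispatch-and-aggregate pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads on one machine stand in for the cluster nodes, and the function names (`split_into_subtasks`, `node_work`, `run_on_cluster`) are hypothetical, not part of any cluster toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(numbers, n_nodes):
    """Server side: cut the input into n_nodes independent chunks."""
    return [numbers[i::n_nodes] for i in range(n_nodes)]


def node_work(chunk):
    """The subtask one node would run: here, a sum of squares."""
    return sum(x * x for x in chunk)


def run_on_cluster(numbers, n_nodes=4):
    """Dispatch the subtasks, then aggregate the partial results."""
    subtasks = split_into_subtasks(numbers, n_nodes)
    # Assumption: each worker thread stands in for one node. On a real
    # cluster the chunk would travel over the network to another machine.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(node_work, subtasks))
    # Aggregation happens back on the server.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_on_cluster(list(range(1, 11))))  # sum of squares of 1..10 = 385
```

The subtasks here share no data, which is the ideal case. The harder cases, where nodes are busy or subtasks must talk to each other, are what the rest of this article deals with.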

In the case of SMP, the application and its subtasks run on the same machine, under the supervision of the single copy of the OS running on that machine. Hence, all resource and process management is done by the OS on that machine, including scheduling the application's subtasks alongside other running applications. But in clusters we don’t have a single OS governing the entire system. Each node runs its own OS. Hence, before dispatching the subtasks to the nodes, we must take care of the availability of resources (such as memory) and the scheduling of the subtasks amongst those already running on the nodes.

One free, open-source solution for this is PBS (Portable Batch System) combined with the PBS-Maui scheduler. PBS consists of a server component running on the cluster’s server machine and a client component, a resource monitor called pbs_mom, which runs on each node. When an application is to be executed, the PBS server calls the PBS-Maui scheduler to schedule the subtasks. The scheduler can schedule a subtask for dispatch to a node only if that node has the resources to run it. Hence the PBS-Maui scheduler contacts the pbs_mom running on each node. If the pbs_mom running on a node, say n1, informs the scheduler that resources are available to run the subtask, the scheduler reports back to the PBS server with a positive response. The PBS server then dispatches the job directly to the pbs_mom running on n1. The scheduler may also impose resource restrictions, such as which resources the subtasks may use on the nodes, how much of them, and in what way.
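The handshake between the server, the scheduler and the resource monitors can be modeled with a toy simulation. The sketch below is not PBS code and does not use any PBS API. The classes `ResourceMonitor`, `Scheduler` and `Server` are hypothetical stand-ins, and memory is assumed to be the only resource that matters.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceMonitor:
    """Hypothetical stand-in for the pbs_mom running on one node."""
    name: str
    free_memory_mb: int
    running: list = field(default_factory=list)

    def can_run(self, needed_mb):
        # Answer the scheduler's question: is there room for this subtask?
        return self.free_memory_mb >= needed_mb

    def run(self, job_name, needed_mb):
        # Accept the job and reserve its memory.
        self.free_memory_mb -= needed_mb
        self.running.append(job_name)


class Scheduler:
    """Hypothetical stand-in for the scheduler: asks each monitor in turn."""

    def __init__(self, monitors):
        self.monitors = monitors

    def pick_node(self, needed_mb):
        for mom in self.monitors:
            if mom.can_run(needed_mb):
                return mom
        return None  # no node can take the subtask right now


class Server:
    """Hypothetical stand-in for the PBS server: dispatches or queues jobs."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.queue = []

    def submit(self, job_name, needed_mb):
        mom = self.scheduler.pick_node(needed_mb)
        if mom is None:
            self.queue.append((job_name, needed_mb))  # wait for resources
            return None
        mom.run(job_name, needed_mb)  # dispatch directly to that node
        return mom.name


if __name__ == "__main__":
    n1 = ResourceMonitor("n1", free_memory_mb=512)
    n2 = ResourceMonitor("n2", free_memory_mb=2048)
    server = Server(Scheduler([n1, n2]))
    print(server.submit("subtask-a", 1024))  # n1 is too small, so n2 takes it
```

This toy scheduler takes the first node that fits. A real scheduler weighs far more than that, including priorities and the resource restrictions mentioned above.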

In SMP, we talked about inter-thread communication. Similarly, the subtasks running on different nodes may need to communicate. This kind of intercommunication takes place through the passing of messages. A standard specification for message passing is MPI (Message Passing Interface). LAM and MPICH are implementations of the MPI specification. That is, both provide an API, in the form of libraries adhering to the MPI specification, for developing programs whose subtasks communicate through messages. Apart from MPI, PVM (Parallel Virtual Machine) is another specification, as well as an implementation, for message passing.
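To get a feel for message passing, here is a minimal sketch in which two subtasks exchange data purely through messages. It does not use MPI, LAM, MPICH or PVM. The `Mailboxes` class is a hypothetical in-process stand-in that borrows only the idea of numbered ranks with send and receive operations, and threads play the part of the nodes.

```python
import queue
import threading


class Mailboxes:
    """Hypothetical stand-in for a message-passing layer: one inbox per rank."""

    def __init__(self, size):
        self.inboxes = [queue.Queue() for _ in range(size)]

    def send(self, dest, message):
        # Deliver a message to the inbox of the destination rank.
        self.inboxes[dest].put(message)

    def recv(self, rank):
        # Block until a message arrives for this rank (give up after 5 s).
        return self.inboxes[rank].get(timeout=5)


def ping_pong(values):
    """Rank 0 sends data to rank 1, which doubles it and sends it back."""
    comm = Mailboxes(size=2)
    result = {}

    def rank0():
        comm.send(1, values)            # hand the work to rank 1
        result["reply"] = comm.recv(0)  # wait for the answer

    def rank1():
        data = comm.recv(1)             # wait for work from rank 0
        comm.send(0, [2 * v for v in data])

    threads = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["reply"]


if __name__ == "__main__":
    print(ping_pong([1, 2, 3]))  # [2, 4, 6]
```

The two subtasks share no variables. Everything one needs from the other arrives as a message, which is what makes the same style of program workable across separate machines.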

One point that needs to be mentioned is that the nodes in a cluster can have similar (homogeneous) or different (heterogeneous) software environments. By software environment we primarily mean the OS running on the nodes. The above-mentioned APIs for cluster computing are usually available with source code, so they can be compiled and installed on other operating systems. Thus, in a heterogeneous cluster, a PBS server running on a Linux server can communicate with a pbs_mom running on a Windows node. Obviously, having a single operating environment across the server and the nodes eases the setup process.

Shekhar Govindarajan