
Now You Can Afford a Supercomputer in Your Enterprise

PCQ Bureau



From universities to the enterprise, from the high end to the mid level, machines that deliver very high computing power are no longer limited to a lucky few. Ordinary hardware can now be combined into clusters that give the same power, and enterprises can use these technologies to improve the productivity of their applications manifold. Starting this month, we will take you through how to set up these clusters and how to run useful applications on them.

Three and a half years back, in the May 2002 issue, we talked for the first time about how to build your own supercomputer. Since then, much has changed in the supercomputing/clustering space. Supercomputing, like a studious child that has passed its childhood in colleges and universities, has now grown up and is ready to face the corporate world.


Till some time back, supercomputers were affordable only by research facilities, universities or government departments like defense and space research, due to their cost as well as the complexities involved in setting up and running these systems. One would go so far as to guess that it was more the latter than the cost that kept many enterprises away from using supercomputers to meet their computing requirements.

But that is now in the past. Today, with easy-to-use clustering technologies and software along with low-cost, commodity hardware, creating a supercomputer (or one that is almost there) is very easy.

The technology and the implementation have become so simple that some people have networked two or three home PCs and notebooks into simple clusters just so they can rip their music faster. Obviously, building a high-performance cluster is not rocket science! And using one boosts productivity immensely (not to count the bragging rights).


So, starting with this issue, we are dedicating some of our pages to take you on a journey to understand where and how such systems can be used in the enterprise. We will tell you how to build one and how to put it to good use.

Let's start off on this journey by understanding the various types of clusters.

Clusters?



You must have noticed that we have used the terms cluster (rather, high-performance cluster) and supercomputer synonymously. Why? Before answering that, to bring in a sense of perspective, let us look briefly at the evolution of supercomputers. A supercomputer is loosely defined as one of the fastest computers available.


Remember that today's fastest ones may be left far behind tomorrow.

The earliest supercomputers were built of scalar processors. Scalar processors are those that process only one item at a time, as against vector processors, which can process multiple items in parallel. Most CPUs traditionally do scalar processing, while GPUs are vector processors; modern CPUs include some vector capabilities. The second generation of supercomputers was again monolithic machines, but built using vector processors. The third generation is where the monolithic architecture gave way to the current parallel design, with a large supercomputer being built of many smaller units. In the beginning of this cycle, even these building blocks were specialized, high-performance units. It is only recently that cheap, off-the-shelf building blocks started getting used to make high-performance machines, or high-performance clusters.

So, what is a cluster?



A cluster is a set of computers that are interconnected (networked) to perform as one. High-performance computing is just one of the things that a cluster can do. You can have failover clusters, load-balancing clusters, and so on, using the same commodity hardware.

Advertisment
Here, the server is fully loaded, while the cluster is under only a 10% load (simultaneously converting 75 WAV files to OGG). The performance difference: the cluster takes about 50% less time.


Here the cluster was loaded to full capacity and the same load was applied to the server (zipping and tarring 55 MB files in batches of 100, 150 and so on, up to 300; the cluster reached full load at 250 files). The cluster gives about 6 times better performance.

In this series, the term cluster will mean an HPC (High Performance Cluster) and not a failsafe or other cluster, unless specified explicitly.

Within high-performance clusters, you can have two types of setups.


SSI-based clusters



No, we are not recommending clusters to small-scale industries, at least not yet. SSI (Single System Image) is a clustering technology which can take a number of nodes (say n) on a network and make them work like a single virtual machine with 'n' processors. The best thing about SSI is that it doesn't require any specific modifications to your applications to run on the new virtual machine. But because of this, it has some drawbacks too. SSI works very well when you run many tasks simultaneously on the virtual machine, for instance, converting hundreds of media files from one format to another. In such a situation, the SSI cluster will migrate the tasks evenly to all the machines available in the cluster and complete the job significantly faster than a single machine would.
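For instance, a media-conversion batch on an openMosix cluster needs nothing more than an ordinary shell loop that launches one process per file (a minimal sketch, assuming the oggenc encoder is installed and the WAV files sit in the current directory):

# Convert every WAV file to OGG, one process per file;
# openMosix migrates these processes to idle nodes on its own
for f in *.wav; do
    oggenc "$f" -o "${f%.wav}.ogg" &
done
wait    # return only when all the conversions have finished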

On the other hand, if you deploy a single job or thread which requires a large amount of number crunching, then an SSI cluster will not give you the performance improvement. This is because it cannot divide a single task into multiple threads and spread them across the nodes of the cluster.

An example of clustering middleware for SSI is OpenMosix.


The charm of SSI-based clusters is that you can deploy any standard (Linux) application on the cluster, without any modification to the application. So, for enterprises which want to migrate an existing application (mainly batch-processing applications) onto a cluster, but don't want to invest in re-creating their applications with the PVM/MPI libraries, or are using a third-party application where they do not have access to the code, SSI-based clusters are the best solution.

Talking of Linux running these clusters, SSI-based clusters offer another benefit. There are some live distributions available which can convert an existing network into an SSI cluster almost on the fly; once the application has run, you can reboot back into the original environment and continue working as before. In this case you do not even have to invest in any additional hardware to build your cluster. Your existing network is already the cluster.

This kind of live CD based approach is best for those who need huge computing power temporarily, and particularly for testing whether your applications will benefit from being deployed on a cluster.

There are also some ongoing projects that let you create a cluster out of a heterogeneous environment. We will explore one such project later in this article.

What are the typical applications you can put an SSI cluster to? You could speed up your batch-processing jobs, for example backups (basically tarring and zipping a large number of files). Or you could create a Web content filtering cluster using software like Dansguardian, or even a mail virus scanning cluster using AMAVIS.
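A backup job of this sort can be sketched the same way (the directory paths here are only illustrative):

# Tar and gzip each project directory as a separate process;
# on the SSI cluster, these jobs migrate to whichever nodes are idle
for dir in /data/projects/*/; do
    name=$(basename "$dir")
    tar -czf "/backup/$name.tar.gz" "$dir" &
done
wait    # all archives written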

PCQCluster shopping cart

How much does it cost to set up a high-performance cluster? The answer depends on the number of nodes you want to deploy. Here is what it cost us to deploy our 20 node cluster. The costs do not include cooling and power. Also, depending on the make of rack used, your costs could go up by another half a lakh or so for this setup.

One monitor is always connected to the cluster manager machine, and the other is used for troubleshooting.

We are not using a KVM switch, mostly to keep the costs down. What we do instead is use Rdesktop and SSH on Linux (Rdesktop for Linux to Windows and SSH for Linux to Linux). We use the Remote Desktop client on Windows for Windows to Windows, and PuTTY for Windows to Linux management. Doing away with the KVM switch does make us go on occasional trips to the cluster to physically connect the monitor and keyboard for troubleshooting.

Note that we are not using this cluster in this month's article. We will see how to deploy this cluster next month.

Item       Configuration                                    Number   Unit Cost   Total
Nodes      Standard PC, 2.4GHz P IV processor, 40GB IDE     20       12,000      240,000
           hard disk, 256MB RAM and CD-drive
Switch     24 port, gigabit                                 1        25,000      25,000
Monitor    14" color                                        1        4,500       4,500
Keyboards  101-key standard                                 2        200         400
Sub Total                                                                        269,900

Option I: Low Cost
Angel rack                                                  2        2,500       5,000
Power strips, 15 amp                                        5        150         750
Ethernet cabling, 50 m                                      1        500         500
Sub Total                                                                        6,250

Option II: High Cost
Server racks, installed                                     2        30,000      60,000

PVM clusters



PVM (Parallel Virtual Machine) is the other type of clustering technology. The biggest difference from SSI is that here you need to recompile or build the application you want to run on the cluster with PVM/MPI support. This means that you cannot run an ordinary existing application as-is on this kind of cluster.

The commonly used clustering middleware is OSCAR.
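To give a feel for what 'built with PVM/MPI support' means, here is a sketch of how such an application is typically compiled and launched with an MPICH-style toolchain; the program, its source file and the 'machines' host list are illustrative:

# Compile the source against the MPI library
mpicc -O2 -o genome_map genome_map.c

# Launch 20 cooperating processes across the nodes listed in 'machines'
mpirun -np 20 -machinefile machines ./genome_map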

The benefit of using such a cluster is that if you run a single application which needs huge number-crunching capability on a PVM cluster, the application will itself take care of thread management and job migration between the nodes.

What would you use a PVM cluster for? Scientific applications, for one, are best suited to PVM clusters. If you want to build a cluster which can do genome mapping, for example, then PVM is your best answer. Similarly, data modeling and forecasting jobs are best run on this kind of cluster.

Specialized clusters



These are special types of clustering software designed to do some very specific tasks. For example, the Deep Blue supercomputer was designed just to play chess, and Deep Crack was designed to do only DES cracking.

In a similar fashion, we have software and even live CDs which can do specific tasks rapidly. One example is Cinelerra, a video-rendering application that can run on clusters to execute renders really quickly. Similarly, ChaOS can be deployed for checking security levels (for password cracking).

Economics first



Before we set out to build the cluster, it is important to know how much it would cost us. The true answer is: it depends. It depends on your budget and the effort you are willing to put in. If you are low on the budget front, but have a decent network and are willing to put in the effort, you can have a working cluster for next to nothing. On the other hand, if the concept is proven to give you good benefits, then budget should be the least of your limitations.

As an indicator, let us look at how much we spent on the 20 node HPC we built for this story. We built this cluster using an array of 20 entry-level P4 PCs. Of course, we did away with luxuries like extra graphics cards, keyboards and monitors. The cluster cost us under Rs 3 lakh, without racks and cooling, which is very much comparable to a decent dual-processor server in the market, and it will give many times the comparable performance.

Software required

For Windows machines
coLinux EXE file - http://prdownloads.sf.net/colinux/coLinux-0.6.2.exe
File system image - http://prdownloads.sf.net/colinux/colinux_minimal_fedora_core_1.zip
Kernel - http://www.minet.uni-jena.de/~gentryx/harpy.tgz
WinPCap - http://www.winpcap.org/install/bin/WinPcap_3_1.exe
Userland tools - http://prdownloads.sourceforge.net/openmosix/openmosix-tools-0.2.4-1.i386.rpm

For Linux machines
Kernel - http://prdownloads.sf.net/openmosix/openmosix-kernel-2.4.26-openmosix1.i686.rpm
Openmosixview - http://www.openmosixview.com/download/openmosixview-1.5-redhat90.i386.rpm (only for one machine)

We also did some comparisons of this cluster (using 18 nodes) with an IBM eServer with dual Xeon processors and 1GB of RAM (see PCQuest, October 2005), and the results we got were astonishing. For benchmarking, we ran a large number of backup tasks at the same time and measured the time taken to complete the process on both the server and the cluster.

The cluster makes more sense from a productivity perspective as well as an economics perspective. The most obvious negatives of the cluster we created are the space it consumed (twenty boxes against one), the power consumed and the heat generated. Once our cluster started getting used regularly, we realized that enough heat was being generated to raise the temperature of the Labs to such an extent that we had to add extra air conditioning in summer and could do away with heating during winter!

What if you do not have a budget to start building a cluster? Remember we said that you can start with your existing network and no budget in hand. Let's see how you can do that.

Let's suppose that you have around 50 PCs, say with Windows installed, sitting on a network, and that your office hours are 9 am to 6 pm. So the systems are not in use for about 15 hrs per day. You can convert this setup into a cluster by night. The only thing missing here is software which can convert this network into a cluster when needed. There are quite a few tool-sets which can do the job.

Let us say you want to build an SSI-based cluster which can run on your heterogeneous (Linux and Windows) network to speed up your backup jobs after office hours. What we are going to create is an OpenMosix-based cluster running natively on the Linux machines and over a virtualization layer on the Windows machines. You can use this kind of setup during normal working hours as well, because it uses your idle CPU time. The only problem is that network load will go up and performance could degrade. So it is advisable that you do not do this during working hours, or, alternatively, that your network has enough headroom to accommodate the load.

A heterogeneous cluster



As we just discussed, we are going to run OpenMosix natively on the Linux machines and on a virtualization layer on the Windows ones. The challenge is to select the right virtualization tool for this purpose. In this case we need a virtualization tool which can run on Windows with a minimal footprint, so that it leaves the maximum resources of the node for the cluster. We chose coLinux.

To top it all

Like everything else that PCQuest does, this is a hands-on implementation story, where we first implement what we are talking about. To do this story, we set up a twenty node cluster at Cybermedia Labs. Talking of bragging rights, this makes us the only magazine in South Asia (and possibly the world) to have its own dedicated, full-time, high-performance cluster.

Why coLinux?



coLinux (www.colinux.org), or Cooperative Linux, is a port of the Linux kernel that allows it to run cooperatively alongside another operating system. It actually runs as a Windows kernel driver. Unlike most virtualization software, which does full machine virtualization and uses up a good amount of system resources, this small piece of software allows the Linux kernel to run natively (as a Windows kernel driver). So there is no "bridge" between the host kernel and the guest kernel. That's why the Linux guest OS can run at relatively near-native speed.

Let's get on with the job



The first step is to install coLinux on all the Windows machines. For this, download the EXE file from 'http://prdownloads.sourceforge.net/colinux/coLinux-0.6.2.exe' and install it to C:\colinux on all your Windows machines. You will need some more software to get going: a file system image for the virtual machine. You have two options here, either the Fedora Core 1 file system image or the Debian image. We used the FC1 file system and it worked pretty well for us. You can download it from 'http://prdownloads.sourceforge.net/colinux/colinux_minimal_fedora_core_1.zip'.

This zip file will yield some bz2 files. So you must first download this file on a Linux machine, unzip and unbzip it, and then move the result to the Windows 'c:\colinux' folder. Unzipping and unbzipping this file will generate a 2GB image file; you need this much free space on each machine in the cluster.
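On the Linux machine, the preparation boils down to something like this (the exact name of the bz2 file inside the zip may differ):

# Unpack the downloaded archive
unzip colinux_minimal_fedora_core_1.zip

# Decompress the file system image; this yields the 2GB file
bunzip2 fc1_2GB_root.bz2

# Now move fc1_2GB_root to c:\colinux on each Windows machine,
# over a network share, FTP or whatever transfer mechanism you have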

Now you have to get a custom kernel for your coLinux which has the OpenMosix patch.

For the geeks, the only way is to build one yourself: first patch the kernel-2.4.26 source with the coLinux patch, then patch it with the OM patch, and then build it.

But that will take a huge amount of time and cause a huge amount of heartburn when things do not work exactly as advertised. So the smart way is to download a pre-compiled kernel with the coLinux and openMosix patches from 'http://www.minet.uni-jena.de/~gentryx/harpy.tgz'. On your Linux machine, you can extract it by running the following command:

# tar -zxvf harpy.tgz

After extracting, you will get a kernel file called 'vmlinux'. Now copy this file over the existing vmlinux file at 'c:\colinux'.

You now have to install WinPCap. This is required for the network driver in coLinux to work properly; you are creating a bridged network between coLinux and the host Windows. Download and install WinPCap from http://www.winpcap.org/install/bin/WinPcap_3_1.exe.

Now you have to modify the default.colinux.xml file. This file has all the settings for coLinux. You have to make changes to reflect the following.

  1. The path to the FC1 filesystem image file on your machine
  2. The path to the new kernel
  3. Change the network type from 'tap' to 'bridged'
  4. Give your network card type

If you have followed the steps mentioned above exactly, then the file in the box "coLinux configuration file" will work perfectly for you with just one change: the name of the network card.

For this, check the model number of the LAN card installed on the Windows machine and replace the name="RTL8139" value in the last-but-one line with your model name. You can identify the model name in Windows by right-clicking on the LAN connection icon and checking its properties.

Now run coLinux with the following command in Windows:

C:\colinux> colinux-daemon.exe -c default.colinux.xml

And coLinux will start running on the Windows box. Now you need to install the openMosix tools (userland tools) on top of coLinux.

For this, download the OpenMosix userland tools from 'http://prdownloads.sourceforge.net/openmosix/openmosix-tools-0.2.4-1.i386.rpm' and install them with the following command:

# rpm -ivh openmosix-tools-0.2.4-1.i386.rpm

Note that we are using an older version of the tool set because the patched kernel we are downloading is not compatible with the newer versions of the tools. If you want to use the newer version, you will have to compile your own patched kernel.

Now modify /etc/mosix.map and enter the list of all the nodes that are going to be in the cluster, with their IP addresses. You can use their names if you have a DNS server or hosts file that OpenMosix can recognize; to be safe, use IP addresses.

coLinux configuration file

Your default.colinux.xml should look something like this (adjust the image path, memory size and card name to your setup):

<?xml version="1.0" encoding="UTF-8"?>
<colinux>
    <!-- The FC1 file system image; point this at wherever you copied it -->
    <block_device index="0" path="\DosDevices\c:\coLinux\fc1_2GB_root"
        enabled="true" />
    <!-- Boot parameter: the root file system is the first coLinux block device -->
    <bootparams>root=/dev/cobd0</bootparams>
    <!-- The openMosix-patched kernel you copied into c:\colinux -->
    <image path="vmlinux" />
    <!-- RAM handed over to the Linux node, in MB -->
    <memory size="128" />
    <!-- Bridged networking over WinPCap; replace RTL8139 with your card's name -->
    <network index="0" type="bridged" name="RTL8139" />
</colinux>

If your network has 40 nodes in the subnet 192.168.0.0 and the IP addresses are in the range 1 to 40, then the format of mosix.map is like this:

1    192.168.0.1    40

If there is another set of 10 machines on the same subnet that you want to add to the cluster, but with IP addresses starting at 50 (you do not want to add the intervening 10 machines because they are notebooks), then you add another line to the file saying:

41    192.168.0.50    10

Remember that coLinux doesn't support openMosix auto-discovery, so this file is a must. Now start OpenMosix with the following command:

# service openmosix start

Your Windows node is now cluster ready.
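To check that the node has really joined, the openMosix userland tools offer a couple of quick tests (run these inside coLinux or on any Linux node of the cluster):

# Report whether this node is up and participating in the cluster
mosctl status

# Text-mode monitor showing the load on every node in the cluster
mosmon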

For the Linux machines, you have to install kernel 2.4.26-openmosix from 'http://prdownloads.sf.net/openmosix/openmosix-kernel-2.4.26-openmosix1.i686.rpm' and reboot the machine into this kernel.

Now, if you have any Linux machine running the same version of the OM kernel (kernel 2.4.26-openmosix) with openmosixview installed on it, you will be able to see this Windows node in the cluster list and use it as part of the cluster.

One Linux machine (running the X Window GUI) should have openMosixView installed from 'http://www.openmosixview.com/download/openmosixview-1.5-redhat90.i386.rpm', so that you can monitor the nodes in your cluster. With this, your heterogeneous, SSI-based cluster-by-night is up and running. In the next issue we will see how much this cluster can speed up your routine batch jobs. We will also see how to deploy our full-time, dedicated cluster.


By Anindya Roy, Krishna Kumar with help from Vijay Chauhan
