
Now You Can Afford a Supercomputer in Your Enterprise

PCQ Bureau



From universities to the enterprise, from the high end to the mid level, machines that deliver very high computing power are no longer limited to a lucky few. Ordinary hardware can now be combined into clusters that give the same power, and enterprises can use these technologies to improve the productivity of their applications manifold. Starting this month, we will take you through how to set up these clusters and how to run useful applications on them.

Three and a half years back, in the May 2002 issue, we talked for the first time about how to build your own supercomputer. Since then, much has changed in the supercomputing/clustering space. Supercomputing, like a studious child that has passed its childhood in colleges and universities, has now grown up and is ready to face the corporate world.


Till some time back, supercomputers were affordable only by research facilities, universities or government departments like defense and space research, due to their cost as well as the complexities involved in setting up and running these systems. One would go so far as to guess that it was more the latter than the cost that kept many enterprises away from using supercomputers to meet their computing requirements.

But that is now in the past. Today, with easy-to-use clustering technologies and software along with low-cost, commodity hardware, creating a supercomputer (or one that is almost there) is very easy.

The technology and the implementation have become so simple that some people have networked two or three home PCs and notebooks into simple clusters just so they can rip their music faster. Obviously, building a high-performance cluster is not rocket science! And using one boosts productivity immensely (not to count the bragging rights).


So, starting with this issue, we are dedicating some of our pages to take you on a journey to understand where and how such systems can be used in the enterprise. We will tell you how to build one and how to put it to good use.

Let's start off on this journey by understanding the various types of clusters.

Clusters?



You must have noticed that we have used the terms cluster (rather, high-performance cluster) and supercomputer synonymously. Why? Before answering that, to bring in a sense of perspective, let us look briefly at the evolution of supercomputers. A supercomputer is loosely defined as one of the fastest computers available.


Remember that today's fastest ones may be left far behind tomorrow.

The earliest supercomputers were built of scalar processors. Scalar processors are those that process only one item at a time, as against vector processors, which can process multiple items in parallel. Most CPUs traditionally do scalar processing, while GPUs are vector processors; modern CPUs include some vector capabilities. The second generation of supercomputers was again monolithic machines, but built using vector processors. The third generation is where the monolithic architecture gave way to the current parallel design, with a large supercomputer being built of many smaller units. In the beginning of this cycle, even these building blocks were specialized, high-performance units. It is only recently that cheap, off-the-shelf building blocks started getting used to make high-performance machines, or high-performance clusters.

So, what is a cluster?



A cluster is a set of computers that are interconnected (networked) to perform as one. High-performance computing is just one of the things that a cluster can do. You can have failover clusters, load-balancing clusters, and so on, using the same commodity hardware.

Advertisment
Here, the server is fully loaded, while the cluster is under only a 10% load (simultaneously converting 75 WAV files to OGG). The performance difference: the cluster takes about 50% less time.


Here the cluster was loaded to full capacity and the same load was applied to the server (zipping and tarring 55 MB files in batches of 100, 150 and so on, up to 300; the cluster reached full load at 250 files). The cluster gives about 6 times better performance.

In this series, the term cluster will mean an HPC (High Performance Cluster) and not a failsafe or other cluster, unless specified explicitly.

Within high-performance clusters, you can have two types of setups.


SSI-based clusters



No, we are not recommending clusters to small-scale industries, at least not yet. SSI (Single System Image) is a clustering technology which can take a number of nodes (say n) on a network and make them work like a single virtual machine with 'n' processors. The best thing about SSI is that it doesn't require any specific modifications to your applications to run on the new virtual machine. But because of this, it has some drawbacks too. SSI works very well when you run many tasks simultaneously on the virtual machine, for instance, converting hundreds of media files from one format to another. In such a situation, the SSI cluster will migrate the tasks evenly to all the machines available in the cluster and complete the job significantly faster than a single machine would.
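For instance, a media-conversion batch on an openMosix cluster needs nothing more than an ordinary shell loop that launches one process per file (a minimal sketch, assuming the oggenc encoder is installed and the WAV files sit in the current directory):

# Convert every WAV file to OGG, one process per file;
# openMosix migrates these processes to idle nodes on its own
for f in *.wav; do
    oggenc "$f" -o "${f%.wav}.ogg" &
done
wait    # return only when all the conversions have finished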

On the other hand, if you deploy a single job or thread which requires a large amount of number crunching, then an SSI cluster will not give you the performance improvement. This is because it cannot divide a single task into multiple threads and spread them across the nodes of the cluster.

An example of clustering middleware for SSI is OpenMosix.


The charm of SSI-based clusters is that you can deploy any standard (Linux) application on the cluster, without any modification to the application. So, for enterprises which want to migrate an existing application (mainly batch-processing applications) onto a cluster, but don't want to invest in re-creating their applications with the PVM/MPI libraries, or are using a third-party application where they do not have access to the code, SSI-based clusters are the best solution.

Talking of Linux running these clusters, SSI-based clusters offer another benefit. There are some live distributions available which can convert an existing network into an SSI cluster almost on the fly; once the application has run, you can reboot back into the original environment and continue working as before. In this case you do not even have to invest in any additional hardware to build your cluster. Your existing network is already the cluster.

This kind of live CD based approach is best for those who need huge computing power temporarily, and particularly for testing whether your applications will benefit from being deployed on a cluster.

There are also some ongoing projects that let you create a cluster out of a heterogeneous environment. We will explore one such project later in this article.

What are the typical applications you can put an SSI cluster to? You could speed up your batch-processing jobs, for example backups (basically tarring and zipping a large number of files). Or you could create a Web content filtering cluster using software like Dansguardian, or even a mail virus scanning cluster using AMAVIS.
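A backup job of this sort can be sketched the same way (the directory paths here are only illustrative):

# Tar and gzip each project directory as a separate process;
# on the SSI cluster, these jobs migrate to whichever nodes are idle
for dir in /data/projects/*/; do
    name=$(basename "$dir")
    tar -czf "/backup/$name.tar.gz" "$dir" &
done
wait    # all archives written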

PCQCluster shopping cart

How much does it cost to set up a high-performance cluster? The answer depends on the number of nodes you want to deploy. Here is what it cost us to deploy our 20 node cluster. The costs do not include cooling and power. Also, depending on the make of rack used, your costs could go up by another half a lakh or so for this setup.

One monitor is always connected to the cluster manager machine, and the other is used for troubleshooting.

We are not using a KVM switch, mostly to keep the costs down. What we do instead is use Rdesktop and SSH on Linux (Rdesktop for Linux to Windows and SSH for Linux to Linux). We use the Remote Desktop client on Windows for Windows to Windows, and PuTTY for Windows to Linux management. Doing away with the KVM switch does make us go on occasional trips to the cluster to physically connect the monitor and keyboard for troubleshooting.

Note that we are not using this cluster in this month's article. We will see how to deploy this cluster next month.

Item       Configuration                                    Number   Unit Cost   Total
Nodes      Standard PC, 2.4GHz P IV processor, 40GB IDE     20       12,000      240,000
           hard disk, 256MB RAM and CD-drive
Switch     24 port, gigabit                                 1        25,000      25,000
Monitor    14" color                                        1        4,500       4,500
Keyboards  101-key standard                                 2        200         400
Sub Total                                                                        269,900

Option I: Low Cost
Angel rack                                                  2        2,500       5,000
Power strips, 15 amp                                        5        150         750
Ethernet cabling, 50 m                                      1        500         500
Sub Total                                                                        6,250

Option II: High Cost
Server racks, installed                                     2        30,000      60,000

PVM clusters



PVM (Parallel Virtual Machine) is the other type of clustering technology. The biggest difference from SSI is that here you need to recompile or build the application you want to run on the cluster with PVM/MPI support. This means that you cannot run an ordinary existing application as-is on this kind of cluster.

The commonly used clustering middleware is OSCAR.
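To give a feel for what 'built with PVM/MPI support' means, here is a sketch of how such an application is typically compiled and launched with an MPICH-style toolchain; the program, its source file and the 'machines' host list are illustrative:

# Compile the source against the MPI library
mpicc -O2 -o genome_map genome_map.c

# Launch 20 cooperating processes across the nodes listed in 'machines'
mpirun -np 20 -machinefile machines ./genome_map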

The benefit of using such a cluster is that if you run a single application which needs huge number-crunching capability on a PVM cluster, the application will itself take care of thread management and job migration between the nodes.

What would you use a PVM cluster for? Scientific applications, for one, are best suited to PVM clusters. If you want to build a cluster which can do genome mapping, for example, then PVM is your best answer. Similarly, data modeling and forecasting jobs are best run on this kind of cluster.

Specialized clusters



These are special types of clustering software designed to do some very specific tasks. For example, the Deep Blue supercomputer was designed just to play chess, and Deep Crack was designed to do only DES cracking.

In a similar fashion, we have software and even live CDs which can do specific tasks rapidly. One example is Cinelerra, a video-rendering application that can run on clusters to execute renders really quickly. Similarly, ChaOS can be deployed for checking security levels (for password cracking).

Economics first



Before we set out to build the cluster, it is important to know how much it would cost us. The true answer is: it depends. It depends on your budget and the effort you are willing to put in. If you are low on the budget front, but have a decent network and are willing to put in the effort, you can have a working cluster for next to nothing. On the other hand, if the concept is proven to give you good benefits, then budget should be the least of your limitations.

As an indicator, let us look at how much we spent on the 20 node HPC we built for this story. We built this cluster using an array of 20 entry-level P4 PCs. Of course, we did away with luxuries like extra graphics cards, keyboards and monitors. The cluster cost us under Rs 3 lakh, without racks and cooling, which is very much comparable to a decent dual-processor server in the market, and it will give many times the comparable performance.

Software required

For Windows machines
coLinux EXE file - http://prdownloads.sf.net/colinux/coLinux-0.6.2.exe
File system image - http://prdownloads.sf.net/colinux/colinux_minimal_fedora_core_1.zip
Kernel - http://www.minet.uni-jena.de/~gentryx/harpy.tgz
WinPCap - http://www.winpcap.org/install/bin/WinPcap_3_1.exe
Userland tools - http://prdownloads.sourceforge.net/openmosix/openmosix-tools-0.2.4-1.i386.rpm

For Linux machines
Kernel - http://prdownloads.sf.net/openmosix/openmosix-kernel-2.4.26-openmosix1.i686.rpm
Openmosixview - http://www.openmosixview.com/download/openmosixview-1.5-redhat90.i386.rpm (only for one machine)

We also did some comparisons of this cluster (using 18 nodes) with an IBM eServer with dual Xeon processors and 1GB of RAM (see PCQuest, October 2005), and the results we got were astonishing. For benchmarking, we ran a large number of backup tasks at the same time and measured the time taken to complete the process on both the server and the cluster.

The cluster makes more sense from a productivity perspective as well as an economics perspective. The most obvious negatives of the cluster we created are the space it consumed (twenty boxes against one), the power consumed and the heat generated. Once our cluster started getting used regularly, we realized that enough heat was being generated to raise the temperature of the Labs to such an extent that we had to add extra air conditioning in summer and could do away with heating during winter!

What if you do not have a budget to start building a cluster? Remember we said that you can start with your existing network and no budget in hand. Let's see how you can do that.

Let's suppose that you have around 50 PCs, say with Windows installed, sitting on a network, and that your office hours are 9 am to 6 pm. So the systems are not in use for about 15 hrs per day. You can convert this setup into a cluster by night. The only thing missing here is software which can convert this network into a cluster when needed. There are quite a few tool-sets which can do the job.

Let us say you want to build an SSI-based cluster which can run on your heterogeneous (Linux and Windows) network to speed up your backup jobs after office hours. What we are going to create is an OpenMosix-based cluster running natively on the Linux machines and over a virtualization layer on the Windows machines. You can use this kind of setup during normal working hours as well, because it uses your idle CPU time. The only problem is that network load will go up and performance could degrade. So it is advisable that you do not do this during working hours, or, alternatively, that your network has enough headroom to accommodate the load.

A heterogeneous cluster



As we just discussed, we are going to run OpenMosix natively on the Linux machines and on a virtualization layer on the Windows ones. The challenge is to select the right virtualization tool for this purpose. In this case we need a virtualization tool which can run on Windows with a minimal footprint, so that it leaves the maximum resources of the node for the cluster. We chose coLinux.

To top it all

Like everything else that PCQuest does, this is a hands-on implementation story, where we first implement what we are talking about. To do this story, we set up a twenty node cluster at Cybermedia Labs. Talking of bragging rights, this makes us the only magazine in South Asia (and possibly the world) to have its own dedicated, full-time, high-performance cluster.

Why coLinux?



coLinux (www.colinux.org), or Cooperative Linux, is a port of the Linux kernel that allows it to run cooperatively alongside another operating system. It actually runs as a Windows kernel driver. Unlike most virtualization software, which does full machine virtualization and uses up a good amount of system resources, this small piece of software allows the Linux kernel to run natively (as a Windows kernel driver). So there is no "bridge" between the host kernel and the guest kernel. That's why the Linux guest OS can run at relatively near-native speed.

Let's get on with the job



The first step is to install coLinux on all the Windows machines. For this, download the EXE file from 'http://prdownloads.sourceforge.net/colinux/coLinux-0.6.2.exe' and install it to C:\colinux on all your Windows machines. You will need some more software to get going: a file system image for the virtual machine. You have two options here, either the Fedora Core 1 file system image or the Debian image. We used the FC1 file system and it worked pretty well for us. You can download it from 'http://prdownloads.sourceforge.net/colinux/colinux_minimal_fedora_core_1.zip'.

This zip file will yield some bz2 files. So you must first download this file on a Linux machine, unzip and unbzip it, and then move the result to the Windows 'c:\colinux' folder. Unzipping and unbzipping this file will generate a 2GB image file; you need this much free space on each machine in the cluster.
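On the Linux machine, the preparation boils down to something like this (the exact name of the bz2 file inside the zip may differ):

# Unpack the downloaded archive
unzip colinux_minimal_fedora_core_1.zip

# Decompress the file system image; this yields the 2GB file
bunzip2 fc1_2GB_root.bz2

# Now move fc1_2GB_root to c:\colinux on each Windows machine,
# over a network share, FTP or whatever transfer mechanism you have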

Now you have to get a custom kernel for your coLinux which has the OpenMosix patch.

For the geeks, the only way is to build one yourself: first patch the kernel-2.4.26 source with the coLinux patch, then patch it with the OM patch, and then build it.

But that will take a huge amount of time and cause a huge amount of heartburn when things do not work exactly as advertised. So the smart way is to download a pre-compiled kernel with the coLinux and openMosix patches from 'http://www.minet.uni-jena.de/~gentryx/harpy.tgz'. On your Linux machine, you can extract it by running the following command:

# tar -zxvf harpy.tgz

After extracting, you will get a kernel file called 'vmlinux'. Now copy this file over the existing vmlinux file at 'c:\colinux'.

You now have to install WinPCap. This is required for the network driver in coLinux to work properly; you are creating a bridged network between coLinux and the host Windows. Download and install WinPCap from http://www.winpcap.org/install/bin/WinPcap_3_1.exe.

Now you have to modify the default.colinux.xml file. This file has all the settings for coLinux. You have to make changes to reflect the following.

  1. The path to the FC1 filesystem image file on your machine
  2. The path to the new kernel
  3. Change the network type from 'tap' to 'bridged'
  4. Give your network card type

If you have followed the steps mentioned above exactly, then the file in the box "coLinux configuration file" will work perfectly for you with just one change: the name of the network card.

For this, check the model number of the LAN card installed on the Windows machine and replace the name="RTL8139" value in the last-but-one line with your model name. You can identify the model name in Windows by right-clicking on the LAN connection icon and checking its properties.

Now run coLinux with the following command in Windows:

C:\colinux> colinux-daemon.exe -c default.colinux.xml

And coLinux will start running on the Windows box. Now you need to install the openMosix tools (userland tools) on top of coLinux.

For this, download the OpenMosix userland tools from 'http://prdownloads.sourceforge.net/openmosix/openmosix-tools-0.2.4-1.i386.rpm' and install them with the following command:

# rpm -ivh openmosix-tools-0.2.4-1.i386.rpm

Note that we are using an older version of the tool set because the patched kernel we are downloading is not compatible with the newer versions of the tools. If you want to use the newer version, you will have to compile your own patched kernel.

Now modify /etc/mosix.map and enter the list of all the nodes that are going to be in the cluster, with their IP addresses. You can use their names if you have a DNS server or hosts file that OpenMosix can recognize; to be safe, use IP addresses.

coLinux configuration file

Your default.colinux.xml should look something like this (adjust the image path, memory size and card name to your setup):

<?xml version="1.0" encoding="UTF-8"?>
<colinux>
    <!-- The FC1 file system image; point this at wherever you copied it -->
    <block_device index="0" path="\DosDevices\c:\coLinux\fc1_2GB_root"
        enabled="true" />
    <!-- Boot parameter: the root file system is the first coLinux block device -->
    <bootparams>root=/dev/cobd0</bootparams>
    <!-- The openMosix-patched kernel you copied into c:\colinux -->
    <image path="vmlinux" />
    <!-- RAM handed over to the Linux node, in MB -->
    <memory size="128" />
    <!-- Bridged networking over WinPCap; replace RTL8139 with your card's name -->
    <network index="0" type="bridged" name="RTL8139" />
</colinux>

If your network has 40 nodes in the subnet 192.168.0.0 and the IP addresses are in the range 1 to 40, then the format of mosix.map is like this:

1    192.168.0.1    40

If there is another set of 10 machines on the same subnet that you want to add to the cluster, but with IP addresses starting at 50 (you do not want to add the intervening 10 machines because they are notebooks), then you add another line to the file saying:

41    192.168.0.50    10

Remember that coLinux doesn't support openMosix auto-discovery, so this file is a must. Now start OpenMosix with the following command:

# service openmosix start

Your Windows node is now cluster ready.
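To check that the node has really joined, the openMosix userland tools offer a couple of quick tests (run these inside coLinux or on any Linux node of the cluster):

# Report whether this node is up and participating in the cluster
mosctl status

# Text-mode monitor showing the load on every node in the cluster
mosmon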

For the Linux machines, you have to install kernel 2.4.26-openmosix from 'http://prdownloads.sf.net/openmosix/openmosix-kernel-2.4.26-openmosix1.i686.rpm' and reboot the machine into this kernel.

Now, if you have any Linux machine running the same version of the OM kernel (kernel 2.4.26-openmosix) with openmosixview installed on it, you will be able to see this Windows node in the cluster list and use it as part of the cluster.

One Linux machine (running the X Window GUI) should have openMosixView installed from 'http://www.openmosixview.com/download/openmosixview-1.5-redhat90.i386.rpm', so that you can monitor the nodes in your cluster. With this, your heterogeneous, SSI-based cluster-by-night is up and running. In the next issue we will see how much this cluster can speed up your routine batch jobs. We will also see how to deploy our full-time, dedicated cluster.


By Anindya Roy, Krishna Kumar with help from Vijay Chauhan
