Sometimes, we go a step further and mix and match existing technologies. The result is often a path-breaking idea, also known as 'innovation'.
This story is exactly one such innovation. Last month, we promised that we would discuss a different HPC cluster/supercomputer technology, and how to implement it, every month. This time, we take that forward and do something new with the cluster we built last month: we create a NAS box using one set of standard PCs from our cluster setup, and reserve another set of PCs for running stress tests on the first. The goal was to evaluate whether it is possible to build such a NAS box, and to check the performance of our cluster. The results we got were impressive.
The idea we had was to aggregate the free disk space of nine standard P4 machines running as a cluster. This aggregation gave us the computing power of all nine P4 processors, but the storage of only six of them, because three of the PCs were diskless nodes running a Linux distro.
The setup
For the HPC layer of this setup, we used the OpenMosix clustering software. This time, however, the OM (OpenMosix) cluster was modified from last month's: we configured MFS (Mosix File System) on top of it (see below for more on what MFS is). We then used OpenAFS (the open-source Andrew File System) to aggregate the disk space of the six machines; it provides a distributed file system layer on top of the cluster.
You have to install PCQLinux 2004 or 2005 and configure OM
on it. For more details on configuring and installing OpenMosix, read our last
month's story or visit http://openmosix.sf.net.
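Before layering anything on top, it is worth verifying that the cluster is actually up. One quick check, assuming the openmosix-tools userland package is installed:
# mosmon
mosmon shows the load on every node; if all your nodes appear, the cluster is communicating properly.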
MFS
References and download sites
http://openmosix.sf.net: The OpenMosix project homepage, for the latest information and downloads for OM
http://www.openafs.org/doc/index.htm: OpenAFS documentation
http://tinyurl.com/7p5w6
https://www.pcquest.com/content/enterprise
https://www.pcquest.com/content/linux/
If you think that MFS has something to do with NFS, you are wrong. MFS is basically a 'cluster-wide' file system. For the last few months, we have been talking about OpenMosix a lot. If you have used the ClusterKnoppix live distro, you must have noticed that when you run a data-crunching process, it sometimes gets processed on a completely different system in the cluster.
Have you ever wondered how OM is able to read data from one machine and do the crunching on another? This is achieved through MFS. When you install OpenMosix, a kernel patch is also installed on the machine, and this patch provides an MFS extension to the Linux file system.
When you run OM, a new file system gets mounted at '/mfs'. Under this directory, the individual file systems of all the cluster nodes appear as '/mfs/NODE_NUMBER'. So let's say you copy a file into the /tmp directory on node 5; you will then be able to access that file from any of the nodes through /mfs/5/tmp.
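To make that concrete (results.dat is a hypothetical file, and we assume a node numbered 5 exists in your openMosix map), on node 5 run:
# cp results.dat /tmp/
And then, on any other node in the cluster:
# ls -l /mfs/5/tmp/results.dat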
For MFS to work properly, you must also have DFSA enabled on your cluster's nodes.
DFSA
This stands for Direct File System Access, and it has to be enabled on all the cluster machines for the whole thing to work. When we installed OpenMosix on top of PCQLinux 2004, DFSA was enabled by default. You can check whether it is enabled by running the following command:
# cat /proc/hpc/admin/version
DFSA allows a cluster node to run direct I/O operations on any share mounted using MFS. Without DFSA, MFS would be nothing more than yet another network file system.
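If /mfs is not already mounted on your nodes, the openMosix HOWTO suggests an /etc/fstab entry along these lines, where the dfsa=1 option switches DFSA on for the mount:
mfs_mnt /mfs mfs dfsa=1 0 0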
Installing OpenAFS
To install OpenAFS, you can either recompile your kernel with OpenAFS support, or download and install the OpenAFS RPM from ftp://rpmfind.net/linux/dag/redhat/9/e/i386/dag/RPMS/openafs-1.2.10-0.dag.rh90.i386.rpm. Run the following command to install it:
# rpm -ivh openafs-1.2.10-0.dag.rh90.i386.rpm
Keep in mind that OpenAFS must be installed after booting the system into the OpenMosix kernel. Otherwise, its kernel support gets set up for the default PCQLinux 2004 kernel, and when you boot into the OM kernel, you won't be able to use OpenAFS.
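A quick way to confirm which kernel you are running before installing the RPM (the exact version string will vary with your openMosix build):
# uname -r
If the output does not mention openMosix, reboot into the OM kernel before proceeding.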
With 10 nodes and 6 disks in our cluster, we managed to reach a throughput of 452.5 Mbps using the NetBench I/O benchmark
Configuring the partitions
The OpenAFS file server must have at least one partition or logical volume dedicated to storing AFS volumes. Each server partition is mounted at a directory named /vicepxx, where 'xx' is one or two lowercase letters. These /vicepxx directories must reside in the file server's root directory.
Now, create a directory called /vicepxx for each AFS server partition you are configuring (there must be at least one). To do so, run the following command, repeating it for each partition:
# mkdir /vicepxx
For each directory you just created, add a line in the following format to the /etc/fstab file (the entire entry must appear on a single line, and /dev/disk stands for your actual device):
/dev/disk /vicepxx ext2 defaults 0 2
Next, you need to create a file system on each partition to be mounted at a /vicepxx directory. For this, run the following command:
# mkfs -v /dev/disk
Mount all the partitions by running the following command. It will, of course, pick up its parameters from the fstab entries added above.
# mount -a
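Putting the above together for a single, hypothetical partition (/dev/hdb1 here; substitute your actual device, and double-check it, because mkfs wipes whatever is on the partition):
# mkdir /vicepa
# echo "/dev/hdb1 /vicepa ext2 defaults 0 2" >> /etc/fstab
# mkfs -v /dev/hdb1
# mount -a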
After you are done with this, you have to start the AFS server process. You do that by running the bosserver command; BOS stands for Basic OverSeer (BOS) Server. When running this command, include the '-noauth' flag to disable authorization checking. The command looks like this:
# /usr/afs/bin/bosserver -noauth &
The flag is needed because you have not yet configured your cell's AFS authentication and authorization mechanisms, so the BOS Server cannot perform the authorization checks it does during normal operation. In this no-authorization mode, it does not verify the identity or privilege of the issuer of a bos command, and so performs any operation for anyone. As it initializes for the first time, the BOS Server creates the following directories and files:
/usr/afs/db
/usr/afs/etc/CellServDB
/usr/afs/etc/ThisCell
/usr/afs/local
/usr/afs/logs
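To confirm that the server process came up as expected, you can list what it just created and query it, still bypassing authorization since no cell credentials exist yet (this assumes you run the commands on the file server itself):
# ls -ld /usr/afs/db /usr/afs/etc /usr/afs/local /usr/afs/logs
# /usr/afs/bin/bos status localhost -noauth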
The BOS Server then sets the owner of these directories and files to the local superuser (root) and sets mode bits that limit the ability to write (and, in some cases, read) them. This environment is fine for testing, but in a production scenario we do not recommend you continue using it this way: you should set up proper PAM authentication for your server.
We did this only to test the performance of OpenAFS running over a cluster. If you really want to use OpenAFS in a production environment, ensure the security of the AFS file system: go through the documentation at http://www.openafs.org/doc/index.htm carefully and follow the PAM authentication path appropriate to your network, because anything dealing with data sharing over the network carries risk.
When we loaded our storage cluster with a task that took 80% CPU, and ran NetBench, the cluster could still manage a maximum throughput of 427 Mbps
Tests and Results
After configuring both OM and OpenAFS, the NAS cluster is ready for use. To test it, we set up 11 machines (10 load nodes plus one controller) for generating load, and ran the same suite of tests we use for benchmarking standard NAS boxes and servers in our regular reviews: NetBench with 1, 5, 10, 20, 30 and 40 engines.
We noted the throughput given by this cluster. With 10 load nodes and 6 disks in the cluster, we averaged nearly two load clients per hard drive. The results we got were pretty good: a maximum throughput of 452.5 Mbps, produced with 30 clients running simultaneously. For this test, the cluster ran on a heterogeneous network with a mix of Gigabit and 100 Mbps LAN cards, while the load machines had a dedicated 1 Gbps network. So if all the cluster's cards were upgraded to 1 Gbps, the performance could increase even more.
When we upped the load to 40 clients, the throughput dropped slightly, to 436 Mbps. If we compare these results against our recent server shootout, this cluster would stand second among those servers.
The key point here is not just the performance and throughput of the cluster. The total CPU and RAM usage of the cluster during the test was merely 10%. This means that despite the heavy data transfer, the cluster can still work as an HPC system, crunching huge amounts of data.
To test this further, we ran a data-crunching task alongside NetBench: converting 100 WAV files to OGG. This consumed 80% of the cluster's CPU for around 16 minutes while NetBench was running, and the NetBench result was almost identical, at 427 Mbps with 30 clients. This is pretty good.
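For the record, a data-crunching load of this sort can be generated with a few lines of shell. This is a sketch rather than our exact script; it assumes oggenc (from vorbis-tools) is installed and that the current directory holds the WAV files:

for f in *.wav; do
    oggenc "$f" &   # one encode per process, so OM can migrate jobs to idle nodes
done
wait                # block until every encode has finished

Because each file is encoded by a separate process, openMosix can transparently spread the 100 jobs across all nine nodes.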
Next month
We have already started working on the next part of this series (using a
reader suggestion), and that is on building your own customized OpenFiller
(custom NAS OS) using OpenMosix. Till then, do keep those suggestions coming, so
that we can make our stories more meaningful.
Anindya Roy with help from Vijay Chauhan