Sometimes, we go a step further and mix and match existing technologies. The result is often a path-breaking idea, also known as 'innovation'.
This story is exactly one such innovation. Last month, we promised that we would discuss a different HPC cluster/supercomputer technology, and how to implement it, every month. This time, we take that forward and do something new with the cluster we built last month: we create a NAS box using one set of standard PCs from our cluster setup, and reserve another set of PCs for running stress tests on the first. The goal was to evaluate whether it is possible to build such a NAS box, and to check the performance of our cluster. The results we got were impressive.
The idea we had was to aggregate the free disk space of nine standard P4 machines running as a cluster. This aggregation gave us the computing power of all nine P4 processors, but the storage of only six of them, because three of the PCs were diskless nodes running a Linux distro.
The setup
For the HPC layer of this setup, we used the OpenMosix clustering software. This time, however, the OM (OpenMosix) cluster was modified from last month's: we configured MFS (Mosix File System) on top of it (see below for more on what MFS is). We then used OpenAFS (the open-source Andrew File System) to aggregate the disk space of the six machines; it provides a distributed file system layer on top of the cluster.
You have to install PCQLinux 2004 or 2005 and configure OM
on it. For more details on configuring and installing OpenMosix, read our last
month's story or visit http://openmosix.sf.net.
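Before layering anything on top, it is worth verifying that the cluster is actually up. One quick check, assuming the openmosix-tools userland package is installed:
# mosmon
mosmon shows the load on every node; if all your nodes appear, the cluster is communicating properly.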
MFS
References and download sites
http://openmosix.sf.net: The OpenMosix project homepage, for the latest information and downloads for OM
http://www.openafs.org/doc/index.htm: OpenAFS documentation
http://tinyurl.com/7p5w6
https://www.pcquest.com/content/enterprise
https://www.pcquest.com/content/linux/
If you think that MFS has something to do with NFS, you are wrong. MFS is basically a 'cluster-wide' file system. For the last few months, we have been talking about OpenMosix a lot. If you have used the ClusterKnoppix live distro, you must have noticed that when you run a data-crunching process, it sometimes gets processed on a completely different system in the cluster.
Have you ever wondered how OM is able to read data from one machine and do the crunching on another? This is achieved through MFS. When you install OpenMosix, a kernel patch is also installed on the machine, and this patch provides an MFS extension to the Linux file system.
When you run OM, a new file system gets mounted at '/mfs'. Under this directory, the individual file systems of all the cluster nodes appear as '/mfs/NODE_NUMBER'. So let's say you copy a file into the /tmp directory on node 5; you will then be able to access that file from any of the nodes through /mfs/5/tmp.
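To make that concrete (results.dat is a hypothetical file, and we assume a node numbered 5 exists in your openMosix map), on node 5 run:
# cp results.dat /tmp/
And then, on any other node in the cluster:
# ls -l /mfs/5/tmp/results.dat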
For MFS to work properly, you must also have DFSA enabled on your cluster's nodes.
DFSA
This stands for Direct File System Access, and it has to be enabled on all the cluster machines for the whole thing to work. When we installed OpenMosix on top of PCQLinux 2004, DFSA was enabled by default. You can check whether it is enabled by running the following command:
# cat /proc/hpc/admin/version
DFSA allows a cluster node to run direct I/O operations on any share mounted using MFS. Without DFSA, MFS would be nothing more than yet another network file system.
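If /mfs is not already mounted on your nodes, the openMosix HOWTO suggests an /etc/fstab entry along these lines, where the dfsa=1 option switches DFSA on for the mount:
mfs_mnt /mfs mfs dfsa=1 0 0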
Installing OpenAFS
To install OpenAFS, you can either recompile your kernel with OpenAFS support, or download and install the OpenAFS RPM from ftp://rpmfind.net/linux/dag/redhat/9/e/i386/dag/RPMS/openafs-1.2.10-0.dag.rh90.i386.rpm. Run the following command to install it:
# rpm -ivh openafs-1.2.10-0.dag.rh90.i386.rpm
Keep in mind that OpenAFS must be installed after booting the system into the OpenMosix kernel. Otherwise, its kernel support gets set up for the default PCQLinux 2004 kernel, and when you boot into the OM kernel, you won't be able to use OpenAFS.
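A quick way to confirm which kernel you are running before installing the RPM (the exact version string will vary with your openMosix build):
# uname -r
If the output does not mention openMosix, reboot into the OM kernel before proceeding.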
With 10 nodes and 6 disks in our cluster, we managed to reach a throughput of 452.5 Mbps using the NetBench I/O benchmark
Configuring the partitions
The OpenAFS file server must have at least one partition or logical volume dedicated to storing AFS volumes. Each server partition is mounted at a directory named /vicepxx, where 'xx' is one or two lowercase letters. These /vicepxx directories must reside in the file server's root directory.
Now, create a directory called /vicepxx for each AFS server partition you are configuring (there must be at least one). To do so, run the following command, repeating it for each partition:
# mkdir /vicepxx
For each directory you just created, add a line in the following format to the /etc/fstab file (the entire entry must appear on a single line, and /dev/disk stands for your actual device):
/dev/disk /vicepxx ext2 defaults 0 2
Next, you need to create a file system on each partition to be mounted at a /vicepxx directory. For this, run the following command:
# mkfs -v /dev/disk
Mount all the partitions by running the following command. It will, of course, pick up its parameters from the fstab entries added above.
# mount -a
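Putting the above together for a single, hypothetical partition (/dev/hdb1 here; substitute your actual device, and double-check it, because mkfs wipes whatever is on the partition):
# mkdir /vicepa
# echo "/dev/hdb1 /vicepa ext2 defaults 0 2" >> /etc/fstab
# mkfs -v /dev/hdb1
# mount -a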
After you are done with this, you have to start the AFS server process. You do that by running the bosserver command; BOS stands for Basic OverSeer (BOS) Server. When running this command, include the '-noauth' flag to disable authorization checking. The command looks like this:
# /usr/afs/bin/bosserver -noauth &
The flag is needed because you have not yet configured your cell's AFS authentication and authorization mechanisms, so the BOS Server cannot perform the authorization checks it does during normal operation. In this no-authorization mode, it does not verify the identity or privilege of the issuer of a bos command, and so performs any operation for anyone. As it initializes for the first time, the BOS Server creates the following directories and files:
/usr/afs/db
/usr/afs/etc/CellServDB
/usr/afs/etc/ThisCell
/usr/afs/local
/usr/afs/logs
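To confirm that the server process came up as expected, you can list what it just created and query it, still bypassing authorization since no cell credentials exist yet (this assumes you run the commands on the file server itself):
# ls -ld /usr/afs/db /usr/afs/etc /usr/afs/local /usr/afs/logs
# /usr/afs/bin/bos status localhost -noauth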
The BOS Server then sets the owner of these directories and files to the local superuser (root) and sets mode bits that limit the ability to write (and, in some cases, read) them. This environment is fine for testing, but in a production scenario we do not recommend you continue using it this way: you should set up proper PAM authentication for your server.
We did this only to test the performance of OpenAFS running over a cluster. If you really want to use OpenAFS in a production environment, ensure the security of the AFS file system: go through the documentation at http://www.openafs.org/doc/index.htm carefully and follow the PAM authentication path appropriate to your network, because anything dealing with data sharing over the network carries risk.
When we loaded our storage cluster with a task that took 80% CPU, and ran NetBench, the cluster could still manage a maximum throughput of 427 Mbps
Tests and Results
After configuring both OM and OpenAFS, the NAS cluster is ready for use. To test it, we set up 11 machines (10 load nodes plus one controller) for generating load, and ran the same suite of tests we use for benchmarking standard NAS boxes and servers in our regular reviews: NetBench with 1, 5, 10, 20, 30 and 40 engines.
We noted the throughput given by this cluster. With 10 load nodes and 6 disks in the cluster, we averaged nearly two load clients per hard drive. The results we got were pretty good: a maximum throughput of 452.5 Mbps, produced with 30 clients running simultaneously. For this test, the cluster ran on a heterogeneous network with a mix of Gigabit and 100 Mbps LAN cards, while the load machines had a dedicated 1 Gbps network. So if all the cluster's cards were upgraded to 1 Gbps, the performance could increase even more.
When we upped the load to 40 clients, the throughput dropped slightly, to 436 Mbps. If we compare these results against our recent server shootout, this cluster would stand second among those servers.
The key point here is not just the performance and throughput of the cluster. The total CPU and RAM usage of the cluster during the test was merely 10%. This means that despite the heavy data transfer, the cluster can still work as an HPC system, crunching huge amounts of data.
To test this further, we ran a data-crunching task alongside NetBench: converting 100 WAV files to OGG. This consumed 80% of the cluster's CPU for around 16 minutes while NetBench was running, and the NetBench result was almost identical, at 427 Mbps with 30 clients. This is pretty good.
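For the record, a data-crunching load of this sort can be generated with a few lines of shell. This is a sketch rather than our exact script; it assumes oggenc (from vorbis-tools) is installed and that the current directory holds the WAV files:

for f in *.wav; do
    oggenc "$f" &   # one encode per process, so OM can migrate jobs to idle nodes
done
wait                # block until every encode has finished

Because each file is encoded by a separate process, openMosix can transparently spread the 100 jobs across all nine nodes.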
Next month
We have already started working on the next part of this series (using a
reader suggestion), and that is on building your own customized OpenFiller
(custom NAS OS) using OpenMosix. Till then, do keep those suggestions coming, so
that we can make our stories more meaningful.
Anindya Roy with help from Vijay Chauhan