We first talked about Microsoft's Compute Cluster Suite in April last year. A lot has changed in the world of HPC since then, and the Cluster Suite has changed along with it. At that time, we had just three 64-bit machines in our labs and we used all of them to create the MS Compute Cluster. The interface was so difficult that we couldn't even create a test MPI (Message Passing Interface) job and submit it to the cluster properly. But of course, that was the first public beta of MS CCS, and it was a bit too much to expect full-blown functionality from it.
Today we have the new Compute Cluster Suite SP1, and twenty 64-bit machines at our disposal. So this time, we decided to build a much bigger cluster using MSCCS SP1: 15 compute nodes, each with a dual-core CPU, plus one head node, and then test it with some standard industry benchmarks.
We'll first talk about how to build such a cluster, then discuss how to port High Performance Linpack (HPL) to Windows, and finally run it on all the nodes in a distributed manner to see what kind of performance it delivers.
Linpack is a benchmark that measures floating-point performance (in FLOPS) and comes in different variants. One such variant is HPL, or High Performance Linpack. It is an industry-standard benchmark for measuring the performance of supercomputers and is used by top500.org for ranking the world's 500 fastest supercomputers.
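Under the hood, HPL solves a dense N x N system of linear equations. The GFlops figure it reports is essentially the operation count of that solve, roughly (2/3)N^3 floating-point operations, divided by the wall-clock time the run took. That is why, as we'll see later, the problem size N is the single biggest tuning knob.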
While installing Microsoft Compute Cluster Pack, you will see this screen. Select the first option to use a node as the Head node
The Setup
MS CCS doesn't run on 32-bit architecture: both the Head node and the Compute nodes need x64 machines, though the client utilities can be installed on a 32-bit machine. The 15 nodes that we used for setting up MSCCS each had an Intel Core 2 Duo 1.8 GHz processor and 512 MB RAM. For the head node, we took a dual Xeon machine with 1 GB RAM. The 15 nodes were meant to process the computing jobs, whereas the Head node managed the jobs and the cluster as a whole.
To interconnect the cluster, we used a Gigabit Ethernet network. All nodes, of course, had Gigabit Ethernet cards and were PXE boot enabled. PXE-enabled cards are used for installing an OS remotely and come in handy when using the Windows remote deployment server (RIS) to do a bulk installation of OSes on multiple machines. All nodes were headless and connected to an IP KVM for centralized management.
Installing the Head Node
The first thing you need, of course, is a copy of MS Windows Server 2003 Compute Cluster Edition; you can download a 180-day trial version from http://tinyurl.com/3ysqz5. For this download you will need a Microsoft .NET Passport.
Install it on the machine you want to use as the Head node. The same OS can be used for the Head node as well as the Compute nodes. After installing the OS on the Head node, create an isolated domain for the cluster.
If you already have another domain controller on this network, you can make the head node an additional domain controller. We created an isolated domain controller for our setup by running the dcpromo command and following its wizard. Just make sure that while creating the domain you also install and create a local DNS server on the Head node. This will help you when you deploy MS CCS.
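Once the dcpromo wizard finishes and the machine reboots, it's worth sanity-checking the new domain and DNS before going further. A quick check from the Head node's command prompt (hpccluster.local is a hypothetical domain name here; substitute whatever you chose in the wizard):

    rem should resolve against the local DNS server created by dcpromo
    nslookup headnode.hpccluster.local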
Now, install the DHCP server on this machine so that the remote deployment server can work properly. (Configuring the DHCP server is beyond the scope of this article; we are assuming that you know how to configure basic services like DHCP on Windows Server 2003.) One word of caution: if you are planning to provide an Internet connection to your cluster (which is a good idea, as you will get regular updates and downloads easily), then configure it using Windows Internet Connection Sharing (ICS) and not Remote Access Server (RAS). I am not sure about the reason, but MSCCS recommends ICS over RAS, and we too ran into trouble when we tried to run it with RAS.
Once you are done configuring all the necessary services, i.e. ADS, DNS, DHCP and ICS, download the latest x64 version of Compute Cluster Pack SP1 from http://tinyurl.com/2rjwt4.
When the installation starts, a wizard pops up which is pretty much self-explanatory. All you have to do is select the 'Create a new compute cluster' option and follow the wizard to install all the components required to make the machine a Head node.
Go to Program Files and you will find a new menu with two applications: Compute Cluster Job Manager (used for submitting and managing cluster jobs) and Compute Cluster Administrator (used to configure the cluster and its nodes).
This is the 'admin' window of Compute Cluster Pack. All the installation and management tasks happen from this single interface
Configuring the Cluster
This process involves three major tasks:
- Configuring the network topology
- Installing and adding nodes
- User management
Since ours is a test cluster, we won't put much emphasis on the user management part; rather, we will focus on configuring the network and the nodes.
Configure the Network
To configure the cluster, go to Program Files > Microsoft Compute Cluster Pack and start the Compute Cluster Administrator. Under the 'To do List' pane, select the 'Configure Compute Cluster Topology' option.
This will open up a wizard. From the drop-down menu, select 'Compute Nodes Isolated on Private Network' and proceed to the next step.
Next, the wizard will ask you to select, one by one, the network cards connected to the public and the private network. Select the right options and click on Finish. After this, disable the firewall; this is fine considering that ours is a test setup.
For this, click on the 'Manage Windows Firewall Settings' option, which will open the standard Firewall manager window, and disable the firewall from there. Remember, if you are building this on a production network, choose your security policy options accordingly.
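If you prefer the command prompt, the same thing can be done through the netsh firewall context that Windows Server 2003 SP1 ships with; a minimal sketch:

    rem run on the Head node (and later on each compute node, if needed)
    netsh firewall set opmode mode=DISABLE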
This is the place where you configure the network topology of your cluster. The next likely option will be 'Compute node loaded'
Installing Nodes
Click on the 'Install RIS' link and install the Remote Installation Services (RIS) server. Then click on the 'Manage Image' option, which will open up a wizard. In the next step, select the 'Add a new Image' option and click on Finish. This will start the standard RIS wizard, which will ask for the folder where it should create the RIS root directory.
Make sure that you place this folder on a partition other than the system partition, else you won't be able to install Windows 2003. Give the folder a name such as RemoteInstall and proceed. Next, the wizard will ask for the location of the CD whose image is to be created for remote installation. Place the Windows Server 2003 Compute Cluster Edition CD in the CD drive of the Head node, specify the drive letter in the wizard, and click on 'Next' until the wizard completes and the image building process starts. This process takes around 10 to 15 minutes.
Once it is done, your RIS is ready, and you can turn on and boot all your Compute nodes over the network to start an unattended remote installation. This process is quite simple, so we won't discuss its details.
In the 'Compute Cluster Administrator' window, you can check the status of the nodes. To check the exact resource utilization of any node, use the System Monitor option
Adding Nodes
So far, only the OS has been installed on the Compute nodes. To make the whole setup work properly, you have to install a few more components. For this, go to each node one by one and run the Compute Cluster Pack installer, this time choosing the option to join an existing compute cluster instead of creating a new one. This will install all the required components, though in some cases it might also need to download some updates from the Internet during installation, so make sure you have the connection handy.
Once this is done, you can add the nodes to the Head node. First, join all nodes to the cluster domain and reboot them. Now go to the Head node and open the Compute Cluster Administrator. From the 'To do list', select the Add Node option, which will open up a wizard. It will ask for the kind of deployment that you want; select 'Manual Deployment' and click on 'Next'. In the next step, type in the FQDN of each node one by one and add them using the Add button, then close the wizard by clicking on Finish. The FQDN will be something like Node00x, where x is the number of the node.
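Before adding them, it's worth confirming that each node's FQDN actually resolves from the Head node; again, hpccluster.local is our hypothetical domain name:

    rem repeat for node002 through node015
    ping -n 1 node001.hpccluster.local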
In MSCCS you can execute a task directly through the command prompt by running the mpiexec command. To submit a task, you have to go through the Task Properties window
Porting Linpack for Windows Compute Cluster
Here, we will see how one can port (recompile) the Linpack source on Windows and run it in the Windows Compute Cluster environment. We did exactly that to benchmark the Microsoft Compute Cluster we had just created, and trust me, it was far from child's play: Linpack is mostly used for testing Linux-based clusters, and porting it to run on MSCCS took some doing. Below, we will see how, with the help of some tools and libraries, you can recompile the HPL source files for your Windows architecture and run them on top of MSCCS.
Prerequisites
The list of prerequisite SDKs and libraries is quite long, but the first thing you need is MS Visual Studio 2005. Install it on one of the nodes of your MS Compute Cluster. The compiler should be installed on one of the nodes because that ensures you are compiling the application for the right hardware architecture, which in turn gives better performance.
After this, download both AMD's Core Math Library (ACML) and Intel's Math Kernel Library (MKL). Download and install the file called 'acml3.5.0-64-win64' from http://tinyurl.com/2k6tny, and download and install Intel's MKL from http://tinyurl.com/2p9m8f.
Now install the MS Compute Cluster SDK from http://tinyurl.com/3yjyg9; just make sure that you download and install the 64-bit version. With that, the installation is done, but for Linpack to work properly you'll have to perform some nasty tricks. This is because the makefile that we are going to use for compiling Linpack has a lot of hardcoded path names.
To begin with, create a folder called 'scratch' at the root of C: on the node where you have installed all the above-mentioned components. Then go to the folders where you installed ACML and MKL; by default they will be under Program Files if you did not give any other path. Go to the AMD folder first and rename the acml3.5.x folder to acml3.0.0. Similarly, go to Intel's folder and rename the 9.1.x folder to 8.0.1. With that, the hacking part is done and we are ready to work on the actual files.
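The same hack from the command prompt; a sketch assuming the default install locations (the exact version folder names will depend on what you downloaded, so check them first):

    rem create the scratch folder, then rename the library folders to the
    rem versions the makefile expects
    mkdir C:\scratch
    ren "C:\Program Files\AMD\acml3.5.0" acml3.0.0
    ren "C:\Program Files\Intel\MKL\9.1" 8.0.1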
Applications built using Visual Studio with the Manifest option enabled can't be run using MS CCS. Therefore, disable that option before you compile Linpack
Compiling Linpack
Now download the latest version of HPL from http://tinyurl.com/2mopw8 and unzip it so that the hpl folder ends up under C:\scratch.
On Linux, Linpack is compiled with the make command, but the makefiles that ship with it are written for various Linux distros, not for Windows. So you need to grab a makefile for Windows; and to make the task easier still, a .vcproj file for Linpack lets you compile it directly in VS 2005. You can download all the required components from our forum at http://forums.pcquest.com/forum/viewtopic.php?t=6154&highlight=.
Go to this link and download the xphl_port.zip file. Unzip it under the C:\scratch\hpl folder and copy HPL_timer_walltime.c to the C:\scratch\hpl\testing\timer folder. A file with the same name will already be sitting in that folder, so replace the old one with the new one while copying.
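From the command prompt, that copy looks like this (assuming the zip extracted the file directly under C:\scratch\hpl):

    rem /Y overwrites the existing timer source without prompting
    copy /Y C:\scratch\hpl\HPL_timer_walltime.c C:\scratch\hpl\testing\timer\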
Double-click on the xhpl.vcproj file to open it as a VC++ project in VS 2005. You have to build the project, but one more thing is required before that. While compiling an exe, VS 2005 embeds a manifest file inside it, which is not recognized by the mpiexec command that you will eventually use to run Linpack. So you have to tell VS 2005 not to embed the manifest while compiling. To do so, go to the Property pages of the xhpl project, click on Manifest Tool > Input and Output, and change the value of 'Embed Manifest' from Yes to No. Now close this window, go to the Build menu, and click on the Build Project option to compile Linpack. The exe will be created at C:\scratch\hpl\bin\64\xhpl.exe.
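If you'd rather script the build than click through the IDE, the vcbuild tool that ships with VS 2005 can compile the same project; a sketch, run from a Visual Studio 2005 command prompt (the configuration name 'Release|x64' is our assumption; use whatever the project defines):

    rem vcbuild is VS 2005's command-line project builder
    vcbuild C:\scratch\hpl\xhpl.vcproj "Release|x64"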
Once you have submitted the job, you can then view the status of the job under the Job Monitor window
Running XHPL
To run XHPL, you have to use the Compute Cluster Job Manager. Go to Program Files > Microsoft Compute Cluster Pack, open it, and go to the File > Submit Job menu. This will open up a window. Here, provide a descriptive job name and go to the Processors tab. Then select the number of processors from your cluster that you want to process the job. Remember, the number you provide should be equal to the number of cores, not the number of physical processors.
Now go to the Tasks tab and, in the Command Line field, type in the command you want to run. For an MPI process (which Linpack is), the command will be something like 'mpiexec xhpl.exe'. To add the task, click on the Add button; tasks that have been added get listed in the task list.
Select the task and click on the Edit button. Here, provide the working directory and the input and output file names. The working directory is essentially the shared location where xhpl.exe sits, so it should be a UNC path to that share on the Head node, something like \\<headnode>\<share>. The output file can be any file where you want Linpack's output to go; by default it is hpl.out. The input file is of course the HPL.dat file. Provide these values and submit the job for execution.
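The same job can also be submitted from the command prompt with the Compute Cluster Pack's job command. A minimal sketch, assuming 32 cores and the share and file names used above (these are the flags we believe the CCS job command accepts; check 'job submit /?' on your setup):

    job submit /numprocessors:32 /workdir:\\<headnode>\<share> /stdin:HPL.dat /stdout:hpl.out mpiexec xhpl.exe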
This will start the xhpl process on all the nodes. If it fails, you will have to modify the HPL.dat file in the bin folder. This is the file where you set all the runtime settings for xhpl, and it is also where you tune XHPL for performance. Tuning XHPL is a tedious job, and it is not possible for me to cover it in these two pages.
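Still, to give you a flavor of what HPL.dat looks like, here is an illustrative excerpt (the values are hypothetical starting points, not tuned ones). The important knobs are the problem size N (pick it so the matrix fills most of the cluster's combined RAM; a common rule of thumb is N close to the square root of 80 per cent of total memory in bytes divided by 8), the block size NB, and the P x Q process grid, whose product must equal the number of MPI processes, 32 in our case:

    HPL.out      output file name (if any)
    8            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    25000        Ns
    1            # of NBs
    128          NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    4            Ps
    8            Qs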
As I write this article, I am still trying to figure out how to get the best performance out of our cluster by tuning XHPL. So far, I have achieved some 46 GFlops, but there is still a long way to go. Once I am done with this tuning, I will talk about how to tune XHPL in detail next month. Till then, you can refer to the article hosted at http://tinyurl.com/23q98y.