
Clustering with Condor

PCQ Bureau

17 Sep 2003 05:48 IST


This month we demonstrate how to set up a basic HTC, or High Throughput Cluster. We will continue in the next issue with how to compile your own jobs and submit them to the HTC pool. HTC differs from HPC (High Performance Clustering) in the timescale it measures: HTC is concerned with how many floating-point operations per month or per year you can extract from your computing environment, whereas HPC is concerned with how many such operations the environment can provide per second. To set up an HTC, we used software called Condor. The best thing about Condor is that it is easy to use in a heterogeneous network of Linux, Solaris, UNIX and even Windows (NT, 2000 and XP) machines.

To test it, we used a network of four PCs: one running PCQLinux 7.1, one running PCQLinux 8.0 and two running Win XP Home Edition. We made the PCQLinux 7.1 machine the Central Manager for Condor, because Condor doesn't officially support RedHat 8.0. The reason is that some commands, such as condor_compile (which creates a relinked executable for submission to the Standard universe), don't run on version 8.0 or above. You can still have RedHat 8 machines on your network, and they will be able to submit and process Condor jobs.

Condor Jobs

Condor has some pre-defined environments, called universes, in which jobs are submitted. Each universe has its own attributes and dependencies. For example, you can't run a Standard job on a Windows machine; there you have to use a Vanilla job instead. There are seven universes, namely Standard, Vanilla, PVM, MPI, Globus, Java and Scheduler. This month we'll discuss Standard and Vanilla.

HTC or HPC?

Excerpts from HPCwire’s interview with Miron Livny (head of the Condor Project), in which Livny explains the basic difference between HPC and HTC.

HPCwire: What criteria determine whether HPC or HTC is more appropriate?

LIVNY: HPC must be used for decision-support (person-in-the-loop) or applications under sharp time-constraint, such as weather modeling. However, those doing sensitivity analyses, parametric studies or simulations to establish statistical confidence need HTC. We use HTC for neural-network training, Monte Carlo statistics, and a very wide variety of simulations, including computer hardware, scheduling policies, and communication protocols, annealing, even combustion-engine simulations, where 100 or even 1000 jobs are submitted to explore the entire parameter space.

Excerpted from www.cs.wisc.edu/condor/HPCwire.1

Standard Universe: In the Standard universe, jobs get checkpointing and remote system calls. These make the jobs easier for Condor to handle across the pool. To build a job for the Standard universe, you first have to recompile your source code with the condor_compile command. To use condor_compile, just run your normal compiler command with condor_compile added at the beginning, like this:

#condor_compile cc test.c -o test.out

Vanilla Universe: Jobs in this universe cannot be checkpointed. As a result, if something goes wrong in the middle of a run, the only options are to suspend the job or to restart it on another machine. Shell scripts and Windows batch files are examples of this type of job.
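As a rough illustration of the kind of shell script that could run as a Vanilla job, the sketch below sums the squares of the integers from 1 up to a limit. The function name sum_squares and the default limit of 10 are hypothetical, chosen only for this example; nothing in the script is specific to Condor.

```shell
#!/bin/sh
# Hypothetical Vanilla-universe job: a plain shell script with no Condor-specific code.

# sum_squares N prints 1^2 + 2^2 + ... + N^2.
sum_squares() {
    limit=$1
    total=0
    i=1
    while [ "$i" -le "$limit" ]; do
        total=$((total + i * i))
        i=$((i + 1))
    done
    echo "$total"
}

# Use the first argument as the limit, or 10 if none is given (an arbitrary default).
sum_squares "${1:-10}"
```

Because there is no checkpointing, a script like this simply starts again from 1 if the job is restarted on another machine.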

You can get more details about these universes from http://www.cs.wisc.edu/condor/manual/v6.4/Index.html

Install Condor's Central Manager in Linux

We're now going to set up a Condor pool. For this, download the Windows and Linux versions of a stable Condor release from www.cs.wisc.edu/condor/downloads/. You first have to create a Central Manager for Condor. This can be any machine (Linux or Windows), but it must be installed before the others, and it gets a full installation of the software. We prefer a PCQLinux 7.1 or any RedHat 7.x machine for this role. To install Condor on PCQLinux 7.1, first unzip and untar the downloaded tarball like this:

#tar -zxvf condor-6.4.7-linux-x86-glibc22-dynamic.tar.gz

This will create a folder called condor-6.4. Now create a user called "condor", because the installation won't continue without it.
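Before running the installer, it can help to check whether the account is already there. The sketch below assumes the account is named condor in lower case (matching the /home/condor directory used later in this article) and that the standard useradd tool is available; the helper name user_exists is hypothetical. It only reports what needs doing and does not create the account itself.

```shell
#!/bin/sh
# user_exists NAME succeeds if the account NAME exists on this machine.
user_exists() {
    id "$1" >/dev/null 2>&1
}

if user_exists condor; then
    echo "user condor already exists"
else
    # Creating an account needs root; useradd -m also creates the home directory.
    echo "user condor is missing; as root, run: useradd -m condor"
fi
```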

Now go to that folder and run the shell script called condor-install. The installation needs a fully qualified domain name, so make sure yours is set properly before you start. When you run the script, it asks several questions about your network and system. Hit Enter for the first two questions to accept their default values. The third question asks whether your system has a file server installed. This is because Condor supports NFS and can use it for a centralized setup in which the release files are installed once and shared by every machine. So if the Condor user's home directory is shared and mounted on all the machines under the same name, you can go for a network installation; otherwise type "no" and hit Enter. Now keep hitting Enter until you reach Step 8, which asks for the name of the Central Manager machine. As this is the first machine, accept the default value, which should be the name of the local machine. On the other machines, you have to provide the full name of the Central Manager here. Then keep hitting Enter until the installation is over.
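Since the installer depends on a fully qualified domain name, a quick check before starting may save a failed run. The sketch below is only a rough test, assuming that a name with a dot and text on both sides of it counts as fully qualified; the helper name is_fqdn is hypothetical, and hostname -f is assumed to be available on your Linux machine.

```shell
#!/bin/sh
# is_fqdn NAME succeeds if NAME has at least one dot with text on both sides.
is_fqdn() {
    case "$1" in
        ?*.?*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Ask for the full name first, fall back to the short name, then to a placeholder.
name=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)

if is_fqdn "$name"; then
    echo "hostname looks fully qualified: $name"
else
    echo "hostname '$name' is not fully qualified; fix it before installing"
fi
```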

Install Condor Agent in Windows

To install Condor on a Windows machine, just run the .exe file called condor-6.4.7-winnt40-x86.exe. The installation is graphical and quite easy to follow, and the steps are similar to those for Linux. Just answer the questions and Condor will be installed.

Configure Condor

To configure Condor, open the file called condor_config, found in the /home/condor directory in Linux and in c:\condor in Windows, and read the first two parts carefully, as they contain important information related to your network. If you don't have a fully qualified domain name for each machine, you have to give IP addresses for HOSTALLOW_READ and HOSTALLOW_WRITE in Part-2 of the file. To make things easy, you can use wild cards here.
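As a sketch of what the wild-card entries might look like, the snippet below prints the two settings with the same pattern for read and write access. The subnet 192.168.1.* is a hypothetical private network and the helper name hostallow_lines is made up for this example; check the comments in your own condor_config for the exact pattern syntax your version accepts before copying the lines in.

```shell
#!/bin/sh
# hostallow_lines PATTERN prints the two Part-2 settings,
# using the same pattern for read and write access.
hostallow_lines() {
    printf 'HOSTALLOW_READ = %s\n' "$1"
    printf 'HOSTALLOW_WRITE = %s\n' "$1"
}

# 192.168.1.* is a hypothetical subnet; substitute your own network.
hostallow_lines '192.168.1.*'
```

Using one pattern for both settings is the simplest choice for a small test pool; a real network may call for a narrower write pattern than read pattern.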

Run Condor

You can start Condor in Linux by running condor_master like this:

#/usr/local/condor/sbin/condor_master
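One rough way to confirm on Linux that condor_master has come up is to count the running processes whose names contain condor_. The sketch below assumes the Condor daemons all carry that prefix, as condor_master does; the helper name count_condor_daemons is hypothetical.

```shell
#!/bin/sh
# count_condor_daemons reads a process listing on standard input and
# prints how many lines mention a condor_ program.
count_condor_daemons() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'condor_' || true
}

running=$(ps -e 2>/dev/null | count_condor_daemons)
echo "condor processes running: $running"
```

A count of zero shortly after starting condor_master suggests the daemon exited, in which case the hostname and condor_config settings are the first things to recheck.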

To start Condor in Windows, either go to Control Panel > Services and start the Condor service, or run the following command at the command prompt:

:\>net start condor

After starting Condor on all the machines, you can view the status of the pool by issuing the condor_status command in Linux like this:

#condor_status

And in Windows like this:

:\>\Condor\bin\condor_status

In the next issue, we will cover running and analyzing the jobs in Condor.

Anindya Roy
