Remember Condor? No, we are not talking about the movie Operation Condor starring Jackie Chan. We mean the High Throughput Computing (HTC) software that we covered last month and promised to say more about this month.
Last month, we talked about the various universes that Condor uses and focussed on the Standard and Vanilla universes. These are the most common universes and they run on commodity hardware. The other universes are MPI (Message Passing Interface), PVM (Parallel Virtual Machine) and Globus. These are tied to the underlying hardware architecture, so to use them you need a good understanding of both the universe and the architecture. In this article, we’ll talk about how to submit different types of jobs in the Standard and Vanilla universes.
Creating Standard jobs
To test creating and submitting a Standard job to Linux machines, we will first write the famous Hello World program in C and then see how to run it over a Condor pool. To do so, first log in to Linux as the Condor user and fire up any text editor. Then type the following code and save it as test.c.
#include <stdio.h>
int main(void)
{
    printf("Hello World!\n");
    return 0;
}
Instead of writing your own code, you can also use the sample programs provided with Condor. You can find them in a folder called Example, inside the folder where you unpacked the Condor tarball. Now you just have to compile and re-link the code so that it can be check-pointed and submitted remotely to other machines. To do so, run the following command:
#condor_compile gcc test.c -o test.o
This will compile the code using the gcc compiler and re-link it with the Condor libraries. You will receive some warnings while the code is being re-linked. This is common and the warnings can be ignored. The re-linking is done by the condor_compile command, which works with C, C++ and FORTRAN compilers as well as with the ld linker. You can also use condor_compile with the make command if you have a proper Makefile for compiling your code.
The resulting file will be called test.o, which is ready to be submitted over the Condor pool. You can check whether the file is properly re-linked simply by running it on your terminal like this:
#./test.o
If you get an output like the one below, the code is properly compiled.
Condor: Notice: Will checkpoint to program_name.ckpt
Condor: Notice: Remote system calls disabled.
It’s now time to submit your job. For this, you first have to create a .sub file that holds all the information needed to submit the job. Save it as test.sub. A simple sub file looks like this:
##############################################
#Submit description file for test program
##############################################
Executable = test.o
Universe   = Standard
Output     = test.out
Log        = test.log
Error      = test.err
Queue
You can also write more descriptive sub files that state system requirements as well. With this information, the jobs will only be transferred to systems that meet the configuration you have specified. For instance, suppose you add a single line to the above sub file like this:
Requirements = Memory >= 100 && OpSys == "LINUX" && Arch == "INTEL"
Then the job will be transferred only to machines that run Linux, have an Intel architecture and have 100 MB or more of RAM.
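As a rough sketch of how the pieces fit together, the following shell commands write a complete test.sub that includes the Requirements line. Creating the file with a heredoc is only a convenience assumed here; typing the same lines into a text editor works just as well.

```shell
# Write a complete submit description file, including the Requirements line.
# Using a heredoc is only a convenience; a text editor works equally well.
cat > test.sub <<'EOF'
##############################################
# Submit description file for test program
##############################################
Executable   = test.o
Universe     = Standard
Output       = test.out
Log          = test.log
Error        = test.err
Requirements = Memory >= 100 && OpSys == "LINUX" && Arch == "INTEL"
Queue
EOF

# Show the file so it can be checked before submitting.
cat test.sub
```

Note that the Requirements line has to appear before Queue, because Queue is the point at which the job described so far is placed in the queue.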
Creating Vanilla jobs
Vanilla jobs are basically jobs that don’t have any check-pointing. Examples of Vanilla jobs are Windows executables, batch files and Linux shell scripts. You can run any of these simply by creating a .sub file: just change the Universe parameter in the above example from Standard to Vanilla and you are ready. You can’t re-link these files, so you don’t need to run the condor_compile command. In fact, one of the drawbacks of Condor is that it still doesn’t support condor_compile for Windows executables and can therefore only run them in the Vanilla universe.
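Here is a minimal sketch of a Vanilla job built from a Linux shell script. The file names hello.sh and hello.sub are made up for this example, and it assumes that the machines in your pool can see the same files, for instance through a shared file system.

```shell
# Create a small shell script to act as the Vanilla job.
# The names hello.sh and hello.sub are hypothetical, chosen for this example.
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello World!"
EOF
chmod +x hello.sh

# Write the matching submit description file. Compared with the Standard
# example, only the Executable and Universe lines (and file names) change.
cat > hello.sub <<'EOF'
Executable = hello.sh
Universe   = Vanilla
Output     = hello.out
Log        = hello.log
Error      = hello.err
Queue
EOF

# Run the script locally first to confirm it works before submitting it.
./hello.sh
```

Since there is no re-linking step for a Vanilla job, running the script by hand like this is the simplest check you can do before handing it to Condor.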
Submitting and analyzing Jobs
Submitting the jobs is very simple. Just run the condor_submit command with the name of your sub file:
#condor_submit test.sub
To check the status of the jobs submitted, run the condor_q command:
#condor_q
and it will show an output like the one shown in figure 1.
Figure 1: The output of condor_q
Here you can see the jobs submitted from the machine where you ran the command. To see the jobs on all the machines, run the command like this:
#condor_q -global
Here you will see all the jobs and their status. The ST column tells you whether a job is running, idle or held: R stands for Running, I for Idle and H for Held. If you find any error or unexpected behavior here, you can analyze the jobs further by running the command
#condor_q -analyze
It will show you why your job is idle or not running, so that you can investigate further. To check which machine is running which process, run the command
#condor_status -run
and it will show a screen like the one shown in figure 2.
Figure 2: The output of condor_status
Here it shows which machine is processing each job (Name column), which machine submitted that job (ClientMachine column) and the architecture it is running on.
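If you find yourself repeating these commands, you could wrap them in a small shell function. The sketch below is only an illustration: the function name submit_and_check is made up, and it assumes that Condor's command-line tools are installed and on your PATH.

```shell
# Submit a job and then run the checks described above, in order.
# submit_and_check is a hypothetical helper name, not part of Condor.
submit_and_check() {
    sub_file="$1"

    # Stop early if Condor's tools are not available on this machine.
    if ! command -v condor_submit >/dev/null 2>&1; then
        echo "condor_submit not found; is Condor installed?" >&2
        return 1
    fi

    condor_submit "$sub_file" || return 1   # hand the job to Condor
    condor_q                                # jobs submitted from this machine
    condor_q -global                        # jobs on all the machines
    condor_q -analyze                       # why a job is idle or not running
    condor_status -run                      # which machine runs which job
}
```

You would then call it as submit_and_check test.sub. A job often stays Idle for a short while after submission, so you may want to run the condor_q commands again by hand a little later.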
That covers submitting jobs in the Standard and Vanilla universes. Another very popular universe, called Globus, lets you create your own computer grid. We’ve covered it in a separate article, which you’ll find in this month’s cover story.
Anindya Roy