Engineering Network Services - CSU

How to Submit a Job to the Cluster


The compute grid is not open to everyone. To gain access, send an email to Shaila Parashar. Once the email is received, you will be added to the access list and will be able to submit jobs.

The cluster is designed to run both serial and parallel jobs. The following sections explain how to write a job and submit it, with examples to illustrate the process.

Submitting Serial Jobs

Steps involved in submitting a serial job:

  • Write a batch script/program containing the commands that need to be executed
  • Log in to shippo.engr.colostate.edu using your engineering account
  • Run the command source /space/sge/default/common/settings.csh
  • Submit the program using the qsub command
  • Check the status of the program using qstat
  • Check the output files. Each file name is composed of the job script name, a dot, an "o" for the stdout file or an "e" for the stderr file, and finally the unique job id.
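The naming rule in the last step can be sketched as a small shell function. The function name sge_output_files is hypothetical, and the sketch assumes the default naming described above, with no qsub options that rename the job or redirect its output.

```shell
#!/bin/sh
# Hypothetical helper: print the default stdout and stderr file names
# for a job, given the job script name and the job id.
sge_output_files() {
    script_name=$1
    job_id=$2
    echo "${script_name}.o${job_id}"   # stdout file
    echo "${script_name}.e${job_id}"   # stderr file
}

# Example: job script simple.sh, job id 330
sge_output_files simple.sh 330
```

For the values shown, this prints simple.sh.o330 and simple.sh.e330, one per line.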

Example 1: Simple serial job

The following is a very simple script that prints the date and time, and the hostname of the machine where the job is running. The script is called simple.sh.

You can create this script in your U: drive on any computer. To view its contents, type:

% more simple.sh

#!/bin/sh
#This is a simple example of a Sun Grid Engine batch script
#
# Print date and time
date
# Sleep for 20 seconds
sleep 20
# Print date and time again
date
# Print the hostname
hostname
# End of script file


Log in to shippo.engr.colostate.edu. Then run the following command (this needs to be run only once per session):

shippo% source /space/sge/default/common/settings.csh

Then submit the program as follows:

shippo% qsub -cwd simple.sh

The -cwd option is used so that the output files are saved in the directory in which the program resides. If the -cwd option is not used, the output is stored in the U: drive.

Once the job is submitted, you will get the following response:

Your job job# (simple.sh) has been submitted

where job# is the id number assigned to the job by the grid software.
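If you want to capture the job id in a script, for example to locate the output files later, a minimal sketch follows. It assumes the response has the form shown above, with the job id as the third word. The function name job_id_from_response is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the job id from the qsub response line.
# Assumes the form "Your job <id> (<script>) has been submitted".
job_id_from_response() {
    echo "$1" | awk '$1 == "Your" && $2 == "job" { print $3 }'
}

# Example with a response for job 330
job_id_from_response 'Your job 330 (simple.sh) has been submitted'
```

In practice the argument would be the text printed by qsub. If the response does not start with "Your job", the function prints nothing.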

To check the status of your job, type:

shippo% qstat -f

Sample output of the qstat command is as follows:

low.q@eng-blade6.engr.colostat BIP 0/2/8 1.72 lx24-amd64
---------------------------------------------------------------------------------
low.q@fox.engr.colostate.edu BIP 0/2/16 0.84 lx24-amd64
330 2.59783 simple.sh idname r 10/19/2010 12:00:05 1
---------------------------------------------------------------------------------
low.q@neyo.engr.colostate.edu BIP 0/2/16 1.53 lx24-amd64
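To find which host a given job is running on, the listing can be filtered with awk. The sketch below assumes the layout of the sample output above: queue lines contain an @ in the first field, and job lines start with the job id. The function name hosts_for_job is hypothetical; on the cluster you would pipe the output of qstat -f into it.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on stdin and print the
# queue (and host) of every job line whose job id matches $1.
hosts_for_job() {
    awk -v id="$1" '
        $1 ~ /@/ { queue = $1 }    # remember the current queue line
        $1 == id { print queue }   # job line listed under that queue
    '
}

# Example using the sample listing above
hosts_for_job 330 <<'EOF'
low.q@eng-blade6.engr.colostat BIP 0/2/8 1.72 lx24-amd64
---------------------------------------------------------------------------------
low.q@fox.engr.colostate.edu BIP 0/2/16 0.84 lx24-amd64
330 2.59783 simple.sh idname r 10/19/2010 12:00:05 1
---------------------------------------------------------------------------------
low.q@neyo.engr.colostate.edu BIP 0/2/16 1.53 lx24-amd64
EOF
```

For the sample listing this prints low.q@fox.engr.colostate.edu, which matches the hostname fox written by the job itself.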

The stdout and stderr files in this case are simple.sh.ojob# and simple.sh.ejob#. The contents of the stdout file are shown below:

shippo% more simple.sh.o330

Tue Oct 19 11:57:15 MDT 2010
Tue Oct 19 11:57:35 MDT 2010
fox

Since there are no error messages, simple.sh.e330 is empty.

Example 2: Serial Matlab job

We first need to write a matlab batch job (i.e. a matlab .m file). A sample matlab batch file is given below:

shippo% more sample.m

a = [1 2 3; 4 5 6];
magic(a);
a
quit

You need to write a script to execute the above program. The script, called mymatlabjob.sh, is given below.

shippo% more mymatlabjob.sh

#!/bin/csh
# Defining various SGE parameters
#$ -cwd
#$ -N testmatlab
#$ -e myjob2.err
/usr/local/bin/matlab -nodisplay -nosplash < sample.m >& out.txt

In the script, matlab is invoked with the -nodisplay and -nosplash options so that it runs in command-line mode. The input to the matlab command is sample.m, and the output is stored in out.txt. Use qsub to submit the job to the grid, but first set the environment variables:

shippo% source /space/sge/default/common/settings.csh

shippo% qsub mymatlabjob.sh

Part of the listing from qstat is given below:

shippo% qstat -f

low.q@eng-blade5.engr.colostat BIP 0/2/8 1.71 lx24-amd64
---------------------------------------------------------------------------------
low.q@eng-blade6.engr.colostat BIP 0/2/8 1.61 lx24-amd64
---------------------------------------------------------------------------------
low.q@fox.engr.colostate.edu BIP 0/2/16 0.80 lx24-amd64
334 2.59783 testmatlab idname r 10/19/2010 15:08:20 1
---------------------------------------------------------------------------------
low.q@neyo.engr.colostate.edu BIP 0/2/16 1.70 lx24-amd64

The output file is out.txt. Its contents can be seen by entering this command:

shippo% more out.txt

< M A T L A B (R) >
Copyright 1984-2010 The MathWorks, Inc.
Version 7.11.0.584 (R2010b) 64-bit (glnxa64)
August 16, 2010

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> >> >>
a =

1 2 3
4 5 6

>>

Submitting Parallel Jobs

We have a number of parallel applications that can be used via the cluster: mpich, openmp, parallel matlab and parallel fluent. Follow these steps when submitting a parallel job:

  1. Log in to shippo.engr.colostate.edu using your engineering account
  2. As soon as you log in, run the command source /space/sge/default/common/settings.csh to set the environment variables needed to run SGE.
  3. Write a batch file/program/script containing the instructions to be executed. For example, an mpich program would be written in mpich code; for a parallel matlab program, you would write a .m file containing the matlab code.
  4. Write a script file that will be submitted to the cluster. This should contain the execution instructions needed to run your code.
  5. Submit the job using the qsub command. In the qsub command, include the number of processors and the parallel environment that you would like to use. The parallel environments are mpich1, openmp, matlab and fluent_pe.

So, if you want to submit an mpich1 job using 4 processors, you can use the following command:

qsub -cwd -pe mpich1 4 myscript

  6. Check the status of the job using the qstat command.
  7. Once the job is completed, check the output of the program.

Example 1:

The following is an example of how to write and submit an mpich1 job. You can view a sample mpich job here. Next, we need to write a batch file that compiles this program using mpicc and runs it using mpirun. The batch file, called mpi2batch, is given below:

shippo% more mpi2batch

/usr/local/mpich1/bin/mpicc -o mpi2 -lm mpi.c

/usr/local/mpich1/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mpi2

In the above script, $NSLOTS is set from the number of processors requested in the qsub command. Set the environment variables by running the following command:

shippo% source /space/sge/default/common/settings.csh

To submit the job, use qsub:

shippo% qsub -cwd -pe mpich1 6 mpi2batch

The above qsub command submits the job to the mpich1 parallel environment and requests that the job be run on 6 processors. To check the status of your job, type:

shippo% qstat -f

The output should look something like this:

low.q@bacara.engr.colostate.ed BIP 0/3/16 1.64 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
---------------------------------------------------------------------------------
low.q@cody.engr.colostate.edu BIP 0/4/16 12.10 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
---------------------------------------------------------------------------------
low.q@eng-blade0.engr.colostat BIP 0/5/8 3.32 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
---------------------------------------------------------------------------------
low.q@eng-blade1.engr.colostat BIP 0/4/8 2.59 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
---------------------------------------------------------------------------------
low.q@eng-blade2.engr.colostat BIP 0/4/8 2.57 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
---------------------------------------------------------------------------------
low.q@eng-blade3.engr.colostat BIP 0/1/8 0.83 lx24-amd64
---------------------------------------------------------------------------------
low.q@eng-blade4.engr.colostat BIP 0/2/8 1.58 lx24-amd64
---------------------------------------------------------------------------------
low.q@eng-blade5.engr.colostat BIP 0/2/8 1.52 lx24-amd64
---------------------------------------------------------------------------------
low.q@eng-blade6.engr.colostat BIP 0/2/8 1.60 lx24-amd64
---------------------------------------------------------------------------------
low.q@fox.engr.colostate.edu BIP 0/2/16 12.88 lx24-amd64
---------------------------------------------------------------------------------
low.q@neyo.engr.colostate.edu BIP 0/3/16 1.53 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1

From the above output, you can see that 6 processes are running. In this case all the processes are running on different machines. The output of the program is given below:

shippo% more mpi2batch.o343
myid 5 , lnbr 4 , rnbr 0
Success: I am 5 - left and right neighbors 4 and 0.
myid 3 , lnbr 2 , rnbr 4
Success: I am 3 - left and right neighbors 2 and 4.
myid 2 , lnbr 1 , rnbr 3
Success: I am 2 - left and right neighbors 1 and 3.
myid 1 , lnbr 0 , rnbr 2
Success: I am 1 - left and right neighbors 0 and 2.
myid 4 , lnbr 3 , rnbr 5
Success: I am 4 - left and right neighbors 3 and 5.
myid 0 , lnbr 5 , rnbr 1
Success: I am 0 - left and right neighbors 5 and 1.
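To confirm from a script that job 343 received all six requested slots, the last column of its job lines in the qstat -f listing above can be summed. This is a minimal sketch that assumes the job line layout shown in that listing, where the last field is the slot count. The function name slots_for_job is hypothetical, and the example input keeps only the queues that list job 343.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on stdin and add up the
# last column (slots) of every job line whose job id matches $1.
slots_for_job() {
    awk -v id="$1" '
        $1 == id { total += $NF }
        END      { print total + 0 }
    '
}

# Example: the queues from the listing above that run job 343
slots_for_job 343 <<'EOF'
low.q@bacara.engr.colostate.ed BIP 0/3/16 1.64 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
low.q@cody.engr.colostate.edu BIP 0/4/16 12.10 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
low.q@eng-blade0.engr.colostat BIP 0/5/8 3.32 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
low.q@eng-blade1.engr.colostat BIP 0/4/8 2.59 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
low.q@eng-blade2.engr.colostat BIP 0/4/8 2.57 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
low.q@neyo.engr.colostate.edu BIP 0/3/16 1.53 lx24-amd64
343 4.20850 mpi2batch idname r 10/20/2010 11:31:05 1
EOF
```

For this input the function prints 6, matching the number of processors requested with -pe mpich1 6.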

Example 2:

The following is an example of a parallel program using OpenMP. A sample OpenMP program can be found here. We need to write a script that contains instructions to compile and run the program. In our script, we will also determine how much CPU time is consumed by the program. The script, called threeDscript, is given below:

shippo% more threeDscript

#!/bin/csh

limit stacksize 65536
/usr/local/sunstudio12.1/bin/cc -xopenmp=parallel 3D.c -lm -o 3D

echo "8 THREADS"
setenv OMP_NUM_THREADS 8
time ./3D

Before submitting the script, make sure that you have set the correct environment variables by running the following command:

shippo% source /space/sge/default/common/settings.csh

Then submit the script:

shippo% qsub -cwd -pe openmp 8 threeDscript

As mentioned earlier, the status of the job can be checked by running the qstat command:

shippo% qstat -f

The output of the job will be stored in threeDscript.o[job#], where [job#] is replaced by the actual job number. If the job number is 348, the output will be as follows:

shippo% more threeDscript.o348

8 THREADS
time in seconds: 10.655471
30.602u 0.222s 0:10.69 288.3% 0+0k 0+0io 0pf+0w
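The last line of this output comes from the csh time command. Its first four fields are user CPU seconds (30.602u), system CPU seconds (0.222s), elapsed time (0:10.69) and CPU use as a percentage of elapsed time (288.3%). The percentage can be reproduced with a short calculation, sketched here with awk. The function name cpu_percent is hypothetical.

```shell
#!/bin/sh
# Reproduce the CPU percentage reported by the csh time command:
# (user + system) / elapsed * 100
cpu_percent() {
    awk -v user="$1" -v sys="$2" -v elapsed="$3" \
        'BEGIN { printf "%.1f%%\n", (user + sys) / elapsed * 100 }'
}

# Values from the output above: 30.602u 0.222s 0:10.69
cpu_percent 30.602 0.222 10.69
```

This prints 288.3%, which means the program kept roughly 2.9 processors busy on average over the run, out of the 8 threads requested.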

Submitting Other Parallel Jobs

The same steps apply to matlab and fluent parallel jobs. The only difference is the parallel environment given to the qsub command. For matlab, the parallel environment is matlab. For fluent, the parallel environment is fluent_pe.



This document last modified Friday January 22, 2016

