Batch jobs

From HP-SEE Wiki

Revision as of 11:40, 28 March 2012 by Roczei (Talk | contribs)

Maui/Torque

Condor-HTC

Sun Grid Engine

Sun Grid Engine (SGE) is typically used on computer farms or high-performance computing clusters. It accepts, schedules, dispatches, and manages the remote and distributed execution of large numbers of standalone, parallel, or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses. Several commands help users work with the scheduler; the most useful ones are collected here.

Resource status commands:

  • cluster queue summary: qstat -g c
  • resource availability per host: qhost -F
  • show the parallel environments: qconf -spl
  • show the available queues: qconf -sql

Job management commands:

  • query the status of all users' jobs: qstat -u \*
  • submit a job: qsub script.sh
  • query the status of a single job: qstat -j JOB_ID
  • find out why a job has not been scheduled yet: qalter -w v JOB_ID
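
The commands above are often used together: submit a job, then inspect it. The following is a minimal sketch of that workflow, not a site-provided tool. The helper name submit_and_check is hypothetical, and the sketch assumes the SGE client tools are on PATH and that the installed qsub supports the -terse option, which prints only the job ID.

```shell
#!/bin/bash
# Submit a job script, then show its status and the scheduler's verdict.
# submit_and_check is a hypothetical helper name, not part of SGE.
submit_and_check() {
    local script="$1" job_id

    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; are the SGE tools on PATH?" >&2
        return 1
    fi

    # -terse makes qsub print only the job ID instead of the full message.
    job_id=$(qsub -terse "$script") || return 1
    echo "Submitted job ${job_id}" >&2

    qstat -j "$job_id"      # detailed status of this job
    qalter -w v "$job_id"   # verify whether the job could be scheduled
}

# Usage (on a submit host): submit_and_check script.sh
```

Capturing the job ID in a variable avoids copying it by hand from the qsub message into the later qstat and qalter calls.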

Example MPI submit script:

 #!/bin/bash
 #$ -N CONNECTIVITY
 #$ -pe mpi 10
 mpirun -np $NSLOTS ./connectivity -v

The NSLOTS variable holds the number of slots granted for the parallel environment (10 in this example). The following table shows the most common qsub parameters.

qsub parameter    Example                     Meaning
-N name           -N test                     Job name
-cwd              -cwd                        Create the output and error files in the current working directory
-S shell          -S /bin/bash                Shell used to interpret the job script
-j {y|n}          -j y                        Merge the error stream into the output file
-r {y|n}          -r y                        Whether the job should be rerun if it is aborted (for example after a node crash)
-M mail           -M something@example.org    Mail address that receives job state notifications
-m {b|e|a|s|n}    -m bea                      Job states reported by mail: b (begin), e (end), a (abort), s (suspend), n (no mail)
-l                -l h_cpu=0:15:0             CPU time limit for the job (h_rt sets the wall-clock limit)
-l                -l h_vmem=1G                Memory limit for the job
-pe               -pe mpi 10                  Site-specific parameter that selects the parallel environment and the number of slots
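
These parameters can also be embedded in the job script as "#$" lines, as in the MPI example. The following sketch combines the example values from the table in one script. All values (job name, mail address, limits, the "mpi" parallel environment) are placeholders taken from the table and must be adapted to the local site; the fallback to a single slot is an addition so that the script can be tried outside the scheduler.

```shell
#!/bin/bash
# Sketch of a job script that embeds the qsub options from the table.
# All values below are placeholders; adjust them to your site.
#$ -N test
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -r y
#$ -M something@example.org
#$ -m bea
#$ -l h_cpu=0:15:0
#$ -l h_vmem=1G
#$ -pe mpi 10

# SGE sets NSLOTS inside a parallel job. Outside the scheduler it is unset,
# so fall back to 1 (an assumption made here for quick local checks).
slots="${NSLOTS:-1}"
echo "Running with ${slots} slot(s) in $(pwd)"
```

Options given on the qsub command line can be used in the same way, for example qsub -N test -cwd script.sh.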

SGE also supports array jobs. For example, you may have 1000 data sets and want to run the same program on each of them on the cluster. An array job is submitted with the -t option: qsub -t 1-1000 array.sh

array.sh:

 #!/bin/sh
 #$ -N PI_ARRAY_TEST
 ./pi `expr $SGE_TASK_ID \* 100000`

SGE_TASK_ID is an environment variable that SGE sets separately for each task of the array job; it holds the index of that task (here 1 to 1000).
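
For the data set scenario mentioned above, the task index can select one input per task. The following is a minimal sketch under stated assumptions: datasets.txt is a hypothetical list file with one data set path per line, dataset_for_task is a hypothetical helper, and my_program stands for the user's own program.

```shell
#!/bin/sh
#$ -N DATASET_ARRAY
#$ -cwd

# Print the path stored on line $1 of the list file $2.
dataset_for_task() {
    sed -n "${1}p" "$2"
}

# Hypothetical list file, one data set path per line.
LIST="${LIST:-datasets.txt}"

# SGE sets SGE_TASK_ID to the task index in array jobs and to the string
# "undefined" in non-array jobs, so only act on a real index.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ] && [ -r "$LIST" ]; then
    input=$(dataset_for_task "$SGE_TASK_ID" "$LIST")
    echo "Task $SGE_TASK_ID processes $input"
    # ./my_program "$input"   # hypothetical program, uncomment and adapt
fi
```

Submitted with qsub -t 1-1000, task 1 would read the first line of the list, task 2 the second, and so on.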

IBM Loadleveler
