Info

The backend job scheduler is Grid Engine, which is different from the PBS scheduler used by NCI, although the two work similarly.


Resource Requests

The table below summarises the major resource attributes used by most jobs. Other attributes, which are helpful for fine-tuning how a job is scheduled, are described in the subsections.

...

GPGPU

TODO

Rack Awareness

TODO

Submit a Batch Job

A batch job can be submitted with the qsub command, using the following pattern:

Code Block
qsub -N JOB_NAME -pe smp NUMBER_OF_CPU -l ATTR1=VAL1,ATTR2=VAL2 SCRIPT
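
To make the pattern concrete, the sketch below assembles a qsub command line from its parts and prints it rather than submitting anything. The function build_qsub_cmd and the script name ./run.sh are hypothetical names used for illustration only; they are not part of Grid Engine. The point it shows is that the -l option takes a single comma-separated list of ATTR=VAL pairs with no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): prints the qsub command line
# for a job instead of running it, so the pattern can be checked first.
# Usage: build_qsub_cmd JOB_NAME NUMBER_OF_CPU SCRIPT ATTR=VAL [ATTR=VAL ...]
build_qsub_cmd() {
    name=$1
    ncpu=$2
    script=$3
    shift 3
    # Join the remaining ATTR=VAL arguments with commas for the -l option.
    resources=$(IFS=,; printf '%s' "$*")
    printf 'qsub -N %s -pe smp %s -l %s %s\n' "$name" "$ncpu" "$resources" "$script"
}

# Prints: qsub -N my_job -pe smp 4 -l mem=16G,jobfs=10G ./run.sh
build_qsub_cmd my_job 4 ./run.sh mem=16G jobfs=10G
```

The examples that follow add -b y because sleep is an executable command rather than a job script; with -b y, qsub treats the final argument as a binary to run directly.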

Examples

Code Block
# a very big sleep job that needs 16 CPUs, 2 GPGPUs, 64G memory and 10G disk space
qsub -b y -N generic_gpgpu -pe smp 16 -l ngpus=2,mem=64G,jobfs=10G sleep 1m

# a smaller sleep job that requires the specific A2 GPGPU...
qsub -b y -N t1000_gpgpu -pe smp 8 -l ngpus=2,gpgpu_model=A2,mem=16G,jobfs=10G sleep 1m

# a big job that runs on multiple H100 nodes inside the same physical rack/cabinet F (rack awareness)
qsub -b y -N h100_gpgpu -pe mpi 256 -l ngpus=2,gpgpu_model=H100,rack=f,mem=128G,jobfs=100G sleep 1m

Submission Script

TODO
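
Until this section is written, the following is a minimal sketch of what a submission script could look like, assuming the standard Grid Engine convention that lines starting with #$ are read as qsub options. The job name example_job and the resource values are placeholders, and the mem and jobfs attribute names are taken from the examples above; none of this is a confirmed site template.

```shell
#!/bin/sh
# Lines starting with "#$" are read by qsub as if given on the command line.
# All values here are placeholders for illustration.
#$ -N example_job
#$ -pe smp 4
#$ -l mem=16G,jobfs=10G
#$ -cwd

# Grid Engine sets NSLOTS to the number of slots granted to the job.
# Default to 1 so the script also runs outside the scheduler.
slots=${NSLOTS:-1}
echo "running with $slots slot(s)"
```

If saved as, say, job.sh (a hypothetical file name), such a script would be submitted with qsub job.sh, and options given on the command line would take precedence over the embedded ones.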