See `man sbatch` for available options. Commands can be run with srun inside the job script - everything run this way is executed as one step. In general you can have many subsequent srun commands, each using a subset of the allocated resources (see the MPI example). Using steps gives much more flexibility and allows you to monitor job progress with the sacct command. Another useful option is `--no-requeue`; see `man sbatch` for the full list.
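For instance, once a job is running, its steps and their state can be listed with sacct (the job id 123456 below is just a placeholder):
sacct -j 123456 --format=JobID,JobName,Elapsed,State,MaxRSS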
For the full SLURM documentation please refer to https://slurm.schedmd.com/documentation.html
SLURM accounts
To submit a job, each user has to be assigned to a SLURM account, which limits the usage of resources. By default all users are assigned to the camk account (CAMK employees only) or the guest account. In addition, for groups which made a substantial financial contribution, there are separate accounts with a higher fairshare. The default account is used automatically; to select another account use the '-A' option.
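For example, you can list the accounts you are assigned to and submit a job charged to a specific account (mygroup and job.sh below are placeholders):
## list your account associations and their fairshare
sacctmgr show associations user=$USER format=Account,User,FairShare
## submit charging the job to a specific account
sbatch -A mygroup job.sh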
Note: all examples below assume bash as a user shell.
Serial jobs
A simple script with comments
#! /bin/bash -l
## Job name
#SBATCH -J testjob
## Allocate N nodes
#SBATCH -N 1
## ntasks per node (= number of processes = number of cores)
#SBATCH --ntasks-per-node=1
## memory per core
#SBATCH --mem-per-cpu=1GB
## maximum time (HH:MM:SS)
#SBATCH --time=01:00:00
## partition (queue) to use
#SBATCH -p short
## stdout
#SBATCH --output="stdout.txt"
## stderr
#SBATCH --error="stderr.txt"
## Use account (required only if different from the default camk)
#SBATCH -A camk
## send an email when done
#SBATCH --mail-type=END

## the job starts in $HOME; go to the submission directory
cd $SLURM_SUBMIT_DIR

## run the code redirecting stdout and stderr to a file
./my_code >& out.txt
Important: please note that the standard out and err streams from the code are redirected to a file despite the specification of standard out and err for the job. This is very important unless the stdout/stderr from your code amounts to less than a few MB. The job output is spooled locally on the execution node and copied to the user working directory only after the job completes. Since the spool size is small (a few GB), you can overfill the disk and crash all the jobs on the node. With the redirection approach you avoid this, and in addition you can monitor out.txt during runtime.
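Assuming the script above is saved as job.sh (the name is arbitrary), a typical submit-and-check sequence is:
sbatch job.sh               ## prints: Submitted batch job <jobid>
squeue -u $USER             ## list your pending and running jobs
scontrol show job <jobid>   ## detailed information about a given job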
Array of serial jobs
It is possible to start N copies of a job using the '-a' (array) option. In the example below we start 100 jobs with ids ranging from 0 to 99. The id is available in the job as the shell variable SLURM_ARRAY_TASK_ID and can be used to parametrize the job.
#! /bin/bash -l
#SBATCH -J testarr
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=500MB
#SBATCH --time=00:10:00
#SBATCH -p short
#SBATCH --open-mode=append
#SBATCH --output="stdout-%a.txt"
#SBATCH --error="stderr-%a.txt"
#SBATCH -A camk
#SBATCH -a 0-99
cd $SLURM_SUBMIT_DIR
echo "task $SLURM_ARRAY_TASK_ID on host $(/bin/hostname)" >> out-$SLURM_ARRAY_TASK_ID.txt
More sophisticated ranges are possible:
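For example (standard forms of the sbatch '-a' option):
#SBATCH -a 1,3,5,7       ## an explicit list of task ids
#SBATCH -a 0-15:4        ## a range with step 4: ids 0, 4, 8, 12
#SBATCH -a 0-99%10       ## 100 tasks, at most 10 running at the same time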
Parallel MPI jobs
Parallel jobs must use the para queue. SLURM was designed primarily to run parallel jobs, so there is no need for a separate launcher - it is built into the srun command, but mpiexec is supported too. In the example below we allocate 8 tasks, but in the first step only one task is used to compile the code. In the second step we start the MPI application on all allocated resources using the mechanism built into srun - the option --mpi=pmi2 is required!
#! /bin/bash -l
#SBATCH -J testmvapich2
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=500MB
#SBATCH --time=00:10:00
#SBATCH -p para
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"
#SBATCH -A camk

cd $SLURM_SUBMIT_DIR
module purge
module add mpi/mvapich2-x86_64

## just a serial task (step)
srun -n 1 mpicc -o mpi-test mpi-test.c
srun --mpi=pmi2 ./mpi-test
## the above works for mvapich2; if you use openmpi use mpiexec as the launcher
# mpiexec ./mpi-test
Parallel OpenMP jobs
This also applies to Mathematica jobs (however, do not use more than 4 threads).
#! /bin/bash -l
#SBATCH -J testopenmp
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 10
#SBATCH --mem-per-cpu=500MB
#SBATCH --time=00:10:00
#SBATCH -p para
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"
#SBATCH -A camk

cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time ./compute_pi
Please note the '-c' (--cpus-per-task) option: for OpenMP jobs you ask for one task and multiple CPUs (threads). It will not work if you just ask for 10 tasks per node as for an MPI job.
Interactive jobs
To start an interactive job (e.g. to compile, test, debug or profile the code) use:
srun -p interactive --pty bash
The interactive partition is dedicated and always on, but it is limited to serial jobs. You can however request `--pty bash` in any partition, e.g. use the para partition to test/debug parallel codes or the gpu partition for interactive work on GPUs.
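For example, a short interactive session with a few tasks for testing a parallel code could be requested as follows (the resource values are only an illustration):
srun -p para -N 1 --ntasks-per-node=4 --mem-per-cpu=500MB --time=01:00:00 --pty bash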
For graphical applications please add the `--x11` option (of course, you first have to ssh to chuck with the -X option: `ssh -X chuck`).
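A complete sequence for a graphical interactive session might look like this (here the interactive partition is used):
ssh -X chuck
srun -p interactive --x11 --pty bash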
GPU jobs
GPU cards are available as generic resources. To ask for a specific GPU architecture use:
srun -p gpu --gres=gpu:turing:1 --pty bash
This command starts an interactive session on the gpu partition, allocating 1 GPU with the turing architecture. You can verify GPU availability with the nvidia-smi command. For a list of available architectures and node configurations please refer to the Hardware page.
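GPUs can be requested in batch jobs in the same way. A minimal sketch of such a script, assuming one turing GPU and a hypothetical executable my_gpu_code, could look like this:
#! /bin/bash -l
#SBATCH -J testgpu
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1GB
#SBATCH --time=01:00:00
#SBATCH -p gpu
## request one GPU of the turing architecture
#SBATCH --gres=gpu:turing:1
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"
#SBATCH -A camk

cd $SLURM_SUBMIT_DIR
## run the code redirecting stdout and stderr, as recommended above
./my_gpu_code >& out.txt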