101: How to submit Slurm batch jobs

An introduction to submitting jobs to the Slurm cluster.

Below is a sample Slurm job for running a Python script:

Your Python script, example1.py:

print("Hello World")

and the Slurm submission script, example1.slurm:

#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example1
#SBATCH --output=example1.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

module purge

#To get worker node information
hostname
uname -a
more /proc/cpuinfo | grep "model name" | head -1
more /proc/cpuinfo | grep "processor" | wc -l
echo "pwd = "`pwd`
echo "TMPDIR = "$TMPDIR
echo "SLURM_SUBMIT_DIR = "$SLURM_SUBMIT_DIR
echo "SLURM_JOBID = "$SLURM_JOBID

#To run python script
python example1.py

Note that,

  1. For --qos, check which QoS you are assigned. You can list your assignments with sacctmgr show assoc format=cluster,user,qos (see the command sketch after this list).

    1. The available QoS values include cu_hpc, cu_htc, cu_math, cu_long, cu_student, and escience.

  2. For --partition, you can choose cpu or cpugpu for all QoS except cu_math, which uses the math partition.

  3. See the details of each QoS and partition here.

  4. You are not limited to bash; other shells also work. See a tcsh/csh example in the CMSSW example.
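
If you are unsure which QoS and partitions are available to you, the standard Slurm commands below list your associations and the cluster partitions (a sketch; the partition name cpu is taken from the example above):

#List the QoS associated with your account
sacctmgr show assoc user=$USER format=cluster,user,qos

#Summarize the available partitions and their state
sinfo -s

#Show the configuration and limits of a specific partition, e.g. cpu
scontrol show partition cpu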

To submit the job, use sbatch:

sbatch example1.slurm

You will see

Submitted batch job 81942
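
Options given on the sbatch command line take precedence over the #SBATCH directives inside the script, so you can adjust a job without editing example1.slurm. For example (the values here are only illustrative):

#Override the job name and time limit for this submission only
sbatch --job-name=example1-long --time=00:30:00 example1.slurm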

To check which state your job is in:

squeue -u your_user_name

In the ST column, R means Running and PD means Pending.
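
Besides squeue, a few other standard Slurm commands are useful for following up on a job; replace 81942 with your own job ID:

#Detailed information about a queued or running job
scontrol show job 81942

#Accounting summary (state, elapsed time, exit code) once the job has finished
sacct -j 81942 --format=JobID,JobName,State,Elapsed,ExitCode

#Cancel a job you no longer need
scancel 81942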

Your output file (example1.txt) should look like this:

==========================================
SLURM_JOB_ID = 81943
SLURM_NODELIST = cpu-bladeh-01
==========================================
cpu-bladeh-01.stg
Linux cpu-bladeh-01.stg 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
model name	: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
16
pwd = /work/home/your_user_name/slurm/example1
TMPDIR = /work/scratch/your_user_name/81943
SLURM_SUBMIT_DIR = /work/scratch/your_user_name/81943
SLURM_JOBID = 81943
Hello World

From the Slurm output you can see that the job runs in the same directory from which it was submitted (e.g. /work/home/your_user_name/slurm/example1). This is not recommended. You should instead run the job in $TMPDIR (or $SLURM_SUBMIT_DIR) and copy the output back when the job is done; $TMPDIR is deleted automatically after the job finishes. Below is a modified example1.slurm that runs in $TMPDIR and copies test.log (the output of the Python script) back to your submission directory.

#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example1
#SBATCH --output=example1.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

module purge

#To get worker node information
hostname
uname -a
more /proc/cpuinfo | grep "model name" | head -1
more /proc/cpuinfo | grep "processor" | wc -l

#To set your submission directory
echo "pwd = "`pwd`
export MYCODEDIR=`pwd`

#Check PATHs
echo "MYCODEDIR = "$MYCODEDIR
echo "TMPDIR = "$TMPDIR
echo "SLURM_SUBMIT_DIR = "$SLURM_SUBMIT_DIR
echo "SLURM_JOBID = "$SLURM_JOBID

#Move to TMPDIR and run python script
cp example1.py $TMPDIR
cd $TMPDIR
python example1.py >| test.log
ls -l
cp -rf test.log $MYCODEDIR/
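
A possible refinement, not shown in the example above, is to register a bash trap right after cd $TMPDIR so that test.log is copied back to $MYCODEDIR even if a later step fails (a sketch only):

#Copy test.log back to the submission directory on exit, whatever the outcome
trap 'cp -rf test.log "$MYCODEDIR"/' EXIT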
