Using HPC Resources#
On HPCs, you don’t run heavy jobs directly on the login node. Instead, you request resources on compute nodes from a job scheduler (e.g. SLURM).
Submitting jobs (sbatch)#
You submit a script (e.g. job.sh) to the scheduler that describes the resources you need and the commands to run.
```bash
#!/bin/bash
#SBATCH --job-name=test_job      # job name shown in the queue
#SBATCH --cpus-per-task=4        # 4 CPUs
#SBATCH --mem=8G                 # 8 GB of RAM
#SBATCH --time=08:00:00          # maximum walltime of 8 hours
#SBATCH --output=job.out         # file where the job's output is written

module load python/3.10
python myscript.py
```
Submit with:
```bash
sbatch job.sh
```
The scheduler puts the job in a queue.
When resources are available, the job runs on a compute node.
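Command-line options passed to sbatch take precedence over the #SBATCH directives in the script, and the --parsable flag makes sbatch print only the numeric job ID, which is handy in shell scripts. A small sketch (the shorter walltime is just an example):

```bash
# Override the walltime requested in the script for this one submission
sbatch --time=02:00:00 job.sh

# Capture the job ID for later monitoring or cancellation
jobid=$(sbatch --parsable job.sh)
echo "Submitted job ${jobid}"

# After the job finishes, its output is in the file given by --output
cat job.out
```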
Interactive sessions#
For testing/debugging/short work, you can request a temporary interactive shell with reserved resources:
```bash
srun --partition=short --cpus-per-task=4 --mem=8G --time=08:00:00 --pty bash -i
```
This gives you a shell with the requested resources, where you can run commands interactively.
| Option | Explanation |
|---|---|
| --partition=short | queue/partition to use |
| --cpus-per-task=4 | 4 CPUs |
| --mem=8G | 8 GB of RAM |
| --time=08:00:00 | the session stays alive for at most 8 hours |
| --pty bash | gives you a live terminal (bash) on a compute node, rather than a job running in the background |
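The same pattern works for other resources. For example, if your cluster has a GPU partition (the partition name gpu below is an assumption; check sinfo for the partitions actually available), you can add a GPU to the interactive request with --gres:

```bash
# Interactive shell with 1 GPU, 4 CPUs and 16 GB of RAM for 2 hours
# (assumes a partition named "gpu" exists on your cluster)
srun --partition=gpu --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash -i
```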
Monitoring jobs#
Here are the essential SLURM commands for monitoring jobs:
| Command | Explanation |
|---|---|
| sinfo | Shows available partitions, node states, and how busy the cluster is. |
| squeue -u $USER | Lists your pending and running jobs. |
| scontrol show job JOBID | Detailed info about one job: resources requested, current state, which node it runs on. |
| scancel JOBID | Cancels a job immediately. |
| sacct -j JOBID | Shows resource usage and accounting info after the job has completed. |
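A typical workflow is to watch a job with squeue while it runs and then check its actual usage with sacct once it has finished. The --format fields below are standard sacct fields, though the exact set available can vary between clusters:

```bash
# Refresh the list of your jobs every 10 seconds (Ctrl+C to stop)
watch -n 10 squeue -u $USER

# After completion: compare requested vs. actually used resources
sacct -j JOBID --format=JobID,JobName,Elapsed,State,MaxRSS,ReqMem
```

If MaxRSS is much lower than the memory you requested, you can reduce --mem in your next submission; smaller requests usually spend less time in the queue.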