Using HPC Resources#
On HPCs, you don’t run heavy jobs directly on the login node. Instead, you request resources on compute nodes from a job scheduler (e.g. SLURM).
Submitting jobs (sbatch)#
You submit a script (e.g. job.sh) to the scheduler that describes the resources you need and the commands to run.
```bash
#!/bin/bash
#SBATCH --job-name=test_job      # job name shown in the queue
#SBATCH --cpus-per-task=4        # 4 CPUs
#SBATCH --mem=8G                 # 8 GB of RAM
#SBATCH --time=08:00:00          # maximum walltime of 8 hours
#SBATCH --output=job.out         # file where the job's output is written

module load python/3.10
python myscript.py
```
Submit with:
```bash
sbatch job.sh
```
The scheduler puts the job in a queue.
When resources are available, the job runs on a compute node.
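Command-line options passed to sbatch take precedence over the #SBATCH directives in the script, and the --parsable flag makes sbatch print only the numeric job ID, which is handy in shell scripts. A small sketch (the shorter walltime is just an example):

```bash
# Override the walltime requested in the script for this one submission
sbatch --time=02:00:00 job.sh

# Capture the job ID for later monitoring or cancellation
jobid=$(sbatch --parsable job.sh)
echo "Submitted job ${jobid}"

# After the job finishes, its output is in the file given by --output
cat job.out
```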
Interactive sessions#
For testing/debugging/short work, you can request a temporary interactive shell with reserved resources:
```bash
srun --partition=short --cpus-per-task=4 --mem=8G --time=08:00:00 --pty bash -i
```
This gives you a shell with the requested resources, where you can run commands interactively.
| Option | Explanation |
|---|---|
| --partition=short | queue/partition to use |
| --cpus-per-task=4 | 4 CPUs |
| --mem=8G | 8 GB of RAM |
| --time=08:00:00 | the session stays alive for at most 8 hours |
| --pty bash | gives you a live terminal (bash) on a compute node, rather than a job running in the background |
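The same pattern works for other resources. For example, if your cluster has a GPU partition (the partition name gpu below is an assumption; check sinfo for the partitions actually available), you can add a GPU to the interactive request with --gres:

```bash
# Interactive shell with 1 GPU, 4 CPUs and 16 GB of RAM for 2 hours
# (assumes a partition named "gpu" exists on your cluster)
srun --partition=gpu --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash -i
```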
Monitoring jobs#
Here are the essential SLURM commands for monitoring jobs:
| Command | Explanation |
|---|---|
| sinfo | Shows available partitions, node states, and how busy the cluster is. |
| squeue -u $USER | Lists your pending and running jobs. |
| scontrol show job JOBID | Detailed info about one job: resources requested, current state, which node it runs on. |
| scancel JOBID | Cancels a job immediately. |
| sacct -j JOBID | Shows resource usage and accounting info after the job has completed. |
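A typical workflow is to watch a job with squeue while it runs and then check its actual usage with sacct once it has finished. The --format fields below are standard sacct fields, though the exact set available can vary between clusters:

```bash
# Refresh the list of your jobs every 10 seconds (Ctrl+C to stop)
watch -n 10 squeue -u $USER

# After completion: compare requested vs. actually used resources
sacct -j JOBID --format=JobID,JobName,Elapsed,State,MaxRSS,ReqMem
```

If MaxRSS is much lower than the memory you requested, you can reduce --mem in your next submission; smaller requests usually spend less time in the queue.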