Slurm 19.05

During the last service window, UPPMAX updated Slurm (the popular HPC job scheduler and resource manager) on Rackham, Snowy and Bianca. The previous version, 17.11, was old and no longer supported; we upgraded to 19.05, the latest fully supported version. The newer version will hopefully improve performance, as the scheduler has long struggled to process large batches of small, short jobs. It remains to be seen how well 19.05 handles these cases, but we are optimistic. Running a supported version of Slurm also lets us forward problems to the Slurm developer, SchedMD, and receive support.

For most users, Slurm 19.05 should work exactly as before. However, no new software, especially software as complex as Slurm, is without bugs. These are the bugs we have discovered so far and reported to SchedMD:

Known issues in Slurm 19.05

salloc with --no-shell causes a segmentation fault

$ salloc --no-shell -A snic2019-123-45 -n 1 [...]
Segmentation Fault (core dumped)

Workaround: Specify a job name with the -J or --job-name= flag.

$ salloc -J my_job --no-shell -A snic2019-123-45 -n 1 [...]
salloc: Granted job allocation 725771

**Update 2019-11-06: This issue was resolved in the update to Slurm 19.05.3 on November 6th.**

You can find the release notes for 19.05 at this page.