
Slurm: How to Change Job's Priority without Admin Privilege

Job priority control when using Slurm to submit jobs on an HPC cluster

Introduction

Slurm is a widely used workload manager and job scheduler for Linux and Unix systems. It allows users to submit and manage jobs in a cluster environment.

One important aspect of managing jobs on a cluster is job priority control. In this post, we will discuss how to use the “--nice” option in Slurm to adjust the priority of a job.

Motivation

Consider the following scenario:

  1. User A and user B share a computing cluster.
  2. User A has submitted a large number of computing tasks. Some of them occupy all available computing nodes and the rest are waiting in the queue, although no single task takes particularly long to run.
  3. User B now needs to submit a computing task that requires very few computing resources (such as CPU cores and memory). However, because B submitted the task later than A, B’s task can only start after all of A’s computing tasks have completed.

In this scenario, user B’s task needs very few resources, yet because it was submitted later it must wait a long time before it can start, which wastes a lot of user B’s time.

If user B’s task could be prioritized ahead of user A’s waiting tasks, then as soon as one of user A’s running tasks finishes, user B’s task could start immediately. Once user B’s computation completes, user A’s tasks can use the resources that user B has freed up. This way, user B saves a lot of waiting time, while user A’s waiting time increases only slightly.

Therefore, in this situation, it is important that the task user B submitted later gets a higher priority than the tasks user A submitted earlier.

Management of Job Priority

Job Priority in Slurm

In the simplest configuration (the “priority/basic” plugin), jobs submitted through Slurm are prioritized on a “First In, First Out (FIFO)” basis.

However, cluster administrators may instead configure the “multi-factor priority plugin”, in which case job priorities depend on factors such as job age (time spent waiting in the queue), job size, fair-share, partition, and QOS.
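To find out which of the two modes a cluster uses, one option is to look at the scheduler configuration. The sketch below assumes that `scontrol show config` prints settings as `Key = Value` lines; the helper name `priority_plugin` is hypothetical.

```shell
# Hypothetical helper: read `scontrol show config` output on stdin and
# print the value of the PriorityType setting.
priority_plugin() {
    # Split each line on "=" with optional surrounding spaces.
    awk -F' *= *' '$1 == "PriorityType" { print $2 }'
}

# On a cluster with Slurm installed:
#   scontrol show config | priority_plugin
```

A result of `priority/basic` would indicate FIFO ordering, while `priority/multifactor` would indicate the multi-factor plugin.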

Methods for Adjusting Job Priority

  1. Administrators can manage job priorities directly by specifying or changing the value of the “priority” option for a job.
  2. Ordinary users are generally not allowed to specify or change the “priority” option directly. However, among the factors that determine job priority there is a “nice” factor, which ordinary users can specify. It can be understood as a “niceness value”: the higher the value, the lower the job priority, which means you are being friendlier to other users. In the scenario above, user A could submit the bulk tasks with a positive “nice” value so that user B’s small task is scheduled ahead of A’s waiting tasks.
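To make the effect of “nice” concrete, here is a rough arithmetic sketch. Under the multi-factor plugin, the priority is a weighted sum of factors (each between 0.0 and 1.0) from which the nice value is subtracted. The weights below are made-up illustration values, only two factors are included, and `estimate_priority` is a hypothetical helper, not a Slurm command.

```shell
# Hypothetical weights, for illustration only; real values come from the
# cluster's slurm.conf, and the real formula has more terms.
estimate_priority() {
    # usage: estimate_priority <age_factor> <fairshare_factor> <nice>
    awk -v age="$1" -v fs="$2" -v nice="$3" 'BEGIN {
        weight_age = 1000     # assumed PriorityWeightAge
        weight_fs  = 10000    # assumed PriorityWeightFairshare
        printf "%d\n", weight_age * age + weight_fs * fs - nice
    }'
}

estimate_priority 0.5 0.25 0      # prints 3000
estimate_priority 0.5 0.25 100    # prints 2900
```

With these assumed weights, a nice value of 100 moves the job below any otherwise identical job submitted with the default nice value.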

Usage of the “nice” Option

Specification and Modification

  • Users can specify the value of “nice” when submitting a job:

    sbatch --nice=100 your_slurm_script
    
  • Users can also update the “nice” value of a submitted job that is waiting to start:

    scontrol update JobId=<job_id> Nice=<new_nice_value>
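When many jobs are already waiting, the update can be applied to all of them in a loop. The following is a minimal sketch, assuming the caller owns the jobs; `nice_pending_jobs` is a hypothetical helper name, and job arrays are not treated specially here.

```shell
# Hypothetical helper: apply the same nice value to all of the caller's
# pending jobs.
nice_pending_jobs() {
    # -h drops the header, -t PENDING keeps only waiting jobs,
    # -o %i prints just the job ID of each job.
    squeue -u "$(id -un)" -h -t PENDING -o %i | while read -r job_id; do
        scontrol update JobId="$job_id" Nice="$1"
    done
}

# Usage on a cluster with Slurm installed:
#   nice_pending_jobs 100
```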
    

Viewing Job Priority

  • Users can view the priority of a submitted job:
    scontrol show job=<job_id> | grep Priority
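The matching line usually carries several `Key=Value` pairs at once, including the nice value. As a small sketch, the hypothetical filter `priority_fields` below keeps only the two fields of interest; the sample line in the comment is illustrative, not real output.

```shell
# Hypothetical filter: read `scontrol show job` output on stdin and print
# only the Priority and Nice fields, one per line.
priority_fields() {
    # Put every Key=Value pair on its own line, then keep the two we want.
    tr -s ' ' '\n' | grep -E '^(Priority|Nice)='
}

# Illustrative input line:  Priority=1200 Nice=100 Account=demo QOS=normal
# Usage on a cluster:       scontrol show job <job_id> | priority_fields
```

On clusters that use the multi-factor plugin, `sprio -j <job_id>` can additionally show how the individual factors contribute to the priority.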
    

Notes

  1. The default “nice” value is 0. If “--nice” is given without a value, Slurm uses 100.
  2. Ordinary users can only specify a non-negative “nice” value, while administrators can also specify a negative one. That is, ordinary users can only be “nice”, while administrators can be “bad”.
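A wrapper script can check this rule before calling Slurm, so that a mistyped value is caught early. This is only a sketch: `is_user_nice` and `renice_job` are hypothetical helper names, and Slurm performs its own permission checks regardless.

```shell
# Hypothetical check: succeed only for the non-negative integers that an
# ordinary user may set as a nice value.
is_user_nice() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, negative, or non-numeric
        *)           return 0 ;;
    esac
}

# Hypothetical wrapper: renice_job <job_id> <nice_value>
renice_job() {
    if ! is_user_nice "$2"; then
        echo "nice value must be a non-negative integer: $2" >&2
        return 1
    fi
    scontrol update JobId="$1" Nice="$2"
}
```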