Setup Slurm cluster for HPC

Slurm, or Simple Linux Utility for Resource Management, is an open-source job scheduler and workload manager for high performance computing (HPC) platforms. It helps manage and distribute compute resources to users, and can start multiple jobs on a single node or a single job on multiple nodes. Slurm’s scheduling capabilities can help improve productivity, reduce costs, and accelerate job execution.

In this blog, we will set up an HPC cluster with Slurm and run some sample jobs to demonstrate its functionality.

Architecture

  • slurmctld — the Slurm controller daemon (runs on the head node)
  • slurmd — the Slurm worker/compute daemon (runs on each compute node)
  • slurmdbd — the Slurm database daemon for accounting storage (optional)

Installation

Setup Munge (Controller or head node)

$ sudo apt install munge libmunge2 libmunge-dev
$ munge -n | unmunge | grep STATUS        

Generate munge key (Location: /etc/munge/munge.key)

$ sudo /usr/sbin/mungekey        

Setup correct permissions

$ sudo chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
$ sudo chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge/
$ sudo chmod 0755 /run/munge/
$ sudo chmod 0700 /etc/munge/munge.key
$ sudo chown -R munge: /etc/munge/munge.key        

Restart services

$ systemctl enable munge
$ systemctl restart munge
$ systemctl status munge        

Setup Munge (Worker or Compute nodes)

$ sudo apt install munge libmunge2 libmunge-dev
$ munge -n | unmunge | grep STATUS        

Copy munge.key from the controller node to all the worker nodes and set the same permissions as on the controller.
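For example, assuming root SSH access and worker hostnames computen1 through computen9 (adjust for your environment), the key can be copied with a simple loop:

$ for n in computen{1..9}; do scp /etc/munge/munge.key root@$n:/etc/munge/munge.key; done

Then set the permissions on each worker: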

$ sudo chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
$ sudo chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge/
$ sudo chmod 0755 /run/munge/
$ sudo chmod 0700 /etc/munge/munge.key
$ sudo chown -R munge: /etc/munge/munge.key        

Restart services

$ systemctl enable munge
$ systemctl restart munge
$ systemctl status munge        
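
To confirm that MUNGE authentication works across nodes, encode a credential on the head node and decode it on a worker (assuming SSH access from headn1 to computen1):

$ munge -n | ssh computen1 unmunge | grep STATUS

A STATUS of Success means the nodes share a valid key.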

Setup Slurm

Distribution base installation

$ sudo apt update -y
$ sudo apt install slurmd slurmctld -y        

OR

Build packages from latest source (Recommended way for production)
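
First, download the release tarball from SchedMD (24.05.2 in this example; the exact URL below is an assumption, so check https://www.schedmd.com/downloads.php for the current release):

$ wget https://download.schedmd.com/slurm/slurm-24.05.2.tar.bz2

Then install the build dependencies and build the .deb packages: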

$ apt-get install build-essential fakeroot devscripts equivs
$ tar -xaf slurm-24.05.2.tar.bz2
$ cd slurm-24.05.2
$ mk-build-deps -i debian/control
$ debuild -b -uc -us        

Create the slurm user on all the nodes

$ export SLURMUSER=1001
$ groupadd -g $SLURMUSER slurm
$ useradd  -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm  -s /bin/bash slurm        
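
Slurm expects the slurm user to have the same UID and GID on every node. A quick sanity check (assuming SSH access to all nodes) is:

$ for n in headn1 computen{1..9}; do ssh $n id slurm; done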

Install the packages on the head node (or login node).

$ dpkg -i slurm-smd_24.05.2-1_amd64.deb
$ dpkg -i slurm-smd-slurmctld_24.05.2-1_amd64.deb
$ dpkg -i slurm-smd-client_24.05.2-1_amd64.deb        

Install packages on compute nodes

$ dpkg -i slurm-smd_24.05.2-1_amd64.deb
$ dpkg -i slurm-smd-slurmd_24.05.2-1_amd64.deb
$ dpkg -i slurm-smd-client_24.05.2-1_amd64.deb        

Note: you may need to run the following to fix broken dependencies

$ apt -y --fix-broken install

Configuration

The main configuration file is /etc/slurm/slurm.conf. Keep the defaults and tweak only the few options your setup needs. Copy the same slurm.conf to all compute nodes, or keep it on NFS or another shared location so it stays identical everywhere.

ClusterName=mycluster
SlurmctldHost=headn1
SlurmUser=slurm
ProctrackType=proctrack/cgroup
AccountingStorageType=accounting_storage/none
NodeName=computen[1-9] CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=250000        
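
If you are unsure of the CPU, socket, or memory values for the NodeName line, run slurmd -C on each compute node; it prints the detected hardware as a ready-to-paste NodeName line:

$ slurmd -C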

Start services on the head or controller node

$ systemctl start slurmctld        

Start services on worker or compute nodes

$ systemctl start slurmd        
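
To make the daemons start automatically after a reboot, enable them as well (slurmctld on the head node, slurmd on the compute nodes):

$ systemctl enable slurmctld        # head node
$ systemctl enable slurmd           # compute nodes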

Validation

If everything is working, you will see the following on the head node:

root@headn1:~# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
ALL*         up   infinite      9   idle computen[1-9]        

Test the cluster using the following command, which runs hostname on all the nodes via srun:

root@headn1:~# srun -N 9 hostname
computen2
computen9
computen4
computen5
computen3
computen6
computen1
computen7
computen8        

You can check the status of a submitted job using the squeue command. (Run sleep for 60 seconds and check its status in the queue.)

root@headn1:~# srun -N 4 sleep 60

root@headn1:~# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               126       ALL    sleep     root  R       0:05      4 computen[1-4]        
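
Besides srun, jobs are usually submitted as batch scripts with sbatch. A minimal sketch (job name, node count, and output file are illustrative):

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --output=test_%j.out

srun hostname

Save it as test.sh, submit it with sbatch test.sh, and monitor it with squeue as above.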

Let’s test an MPI job for a more realistic HPC workload.

Copy your MPI program to NFS (or other shared storage) so it is available on all worker/compute nodes. Reference: https://slurm.schedmd.com/mpi_guide.html

hello_world.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}        

Compile hello_world.c

mpicc hello_world.c -o hello-world

Run it (in this example, on 4 compute nodes using -N 4):

root@headn1:/data/sample# srun -N 4 --mpi=pmix hello-world
Hello world from processor computen3, rank 0 out of 1 processors
Hello world from processor computen2, rank 0 out of 1 processors
Hello world from processor computen4, rank 0 out of 1 processors
Hello world from processor computen1, rank 0 out of 1 processors        
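
By default this launches one task per node; to start more MPI ranks, add -n with the total number of tasks, for example 8 ranks across the 4 nodes:

root@headn1:/data/sample# srun -N 4 -n 8 --mpi=pmix hello-world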

Add GPU node

A GPU node is just a compute node with a GPU card and a few extra configuration flags.

Create the /etc/slurm/gres.conf file with the following line:

NodeName=gpun1 Name=gpu AutoDetect=off File=/dev/nvidia0        
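
If the node has more than one GPU, list each device file; for example, a hypothetical node with four GPUs could be declared as follows (with Gres=gpu:4 on the matching NodeName line in slurm.conf):

NodeName=gpun1 Name=gpu AutoDetect=off File=/dev/nvidia[0-3]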

Add the following to the slurm.conf file and restart the services, as shown below.

GresTypes=gpu
NodeName=gpun1 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=250000 Gres=gpu:1 Feature=gpu        
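
For example, after updating both files, restart the daemons so the new node is recognized:

$ systemctl restart slurmctld        # head node
$ systemctl restart slurmd           # on gpun1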

Check sinfo status

root@headn1:~# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
ALL*         up   infinite      10  idle computen[1-9],gpun1        

Create partitions for queue management

Add the following to the /etc/slurm/slurm.conf file:

PartitionName=ALL Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=COMP Nodes=computen[1-9] Shared=NO MaxTime=INFINITE State=UP
PartitionName=GPU Nodes=gpun1 Shared=NO MaxTime=INFINITE State=UP        
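
Partition changes can be applied with scontrol reconfigure (or by restarting slurmctld):

root@headn1:~# scontrol reconfigure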

Check status of partitions

root@headn1:/data/sample# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
ALL*         up   infinite     10   idle computen[1-9],gpun1
COMP         up   infinite      9   idle computen[1-9]
GPU          up   infinite      1   idle gpun1        

Now you can target a job at a specific partition or queue (using -p GPU):

root@headn1:~# srun -p GPU --gres=gpu:1 nvidia-smi -L
GPU 0: NVIDIA A10 (UUID: GPU-a759982f-198e-2303-7427-fbc160cf37bd)        

Enjoy!!!
