Installing Slurm

Procedure

  1. Use PuTTY to log in to the server as the root user.
  2. Install the Slurm RPM packages on the master, testnode1, and testnode2 nodes.

    cd /home/slurmrpm

    yum install -y slurm*

  3. Check whether the slurm group has been created on all nodes.

    grep "slurm" /etc/group

    • If the group exists, information similar to the following is displayed:

      slurm:x:202:
    • If nothing is displayed, create the slurm group and user on the master, testnode1, and testnode2 nodes.

      groupadd -g 202 slurm

      useradd -u 202 -g 202 slurm
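
The substring match performed by grep also hits any line that merely contains "slurm", for example another group that lists slurm as a member. As a minimal sketch, the helper below matches the group name exactly and compares the GID with the value 202 used in this procedure. The function names group_gid and check_group_gid are illustrative and are not part of Slurm.

```shell
# Print the GID recorded for a group in a group(5)-format file.
# Usage: group_gid <group-name> [file]   (file defaults to /etc/group)
group_gid() {
    local name="$1" file="${2:-/etc/group}"
    awk -F: -v n="$name" '$1 == n { print $3 }' "$file"
}

# Succeed only if the group exists with the expected GID.
# Usage: check_group_gid <group-name> <expected-gid> [file]
check_group_gid() {
    local name="$1" expected="$2" file="${3:-/etc/group}"
    local actual
    actual="$(group_gid "$name" "$file")"
    if [ "$actual" = "$expected" ]; then
        echo "OK: group $name has GID $expected"
    else
        echo "MISMATCH: group $name has GID '${actual:-none}', expected $expected" >&2
        return 1
    fi
}
```

Running check_group_gid slurm 202 on each node confirms that the group ID is the same everywhere. The command getent group slurm is an alternative that also covers groups defined outside /etc/group.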

  4. Create the /var/spool/slurm/ssl, /var/spool/slurm/d, and /var/log/slurm directories on the master, testnode1, and testnode2 nodes.

    mkdir -p /var/spool/slurm/ssl

    mkdir -p /var/spool/slurm/d

    mkdir -p /var/log/slurm

  5. On the master, testnode1, and testnode2 nodes, set the slurm user and group as the owner of /var/spool/slurm.

    chown -R slurm:slurm /var/spool/slurm
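
To confirm the result of the directory and ownership steps, the following sketch reports directories that are missing or have an unexpected owner. The function name check_dirs_owner is illustrative, and stat -c assumes GNU coreutils, which is the usual case on yum-based systems.

```shell
# Report every listed directory that is missing or not owned by <user>:<group>.
# Usage: check_dirs_owner <user>:<group> <dir>...
check_dirs_owner() {
    local owner="$1"
    shift
    local dir actual rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "MISSING: $dir" >&2
            rc=1
            continue
        fi
        # %U and %G print the owning user name and group name.
        actual="$(stat -c '%U:%G' "$dir")"
        if [ "$actual" != "$owner" ]; then
            echo "WRONG OWNER: $dir is $actual, expected $owner" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

A possible call on each node is check_dirs_owner slurm:slurm /var/spool/slurm /var/spool/slurm/ssl /var/spool/slurm/d. The function prints nothing and returns 0 when every directory passes.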

  6. Modify the /etc/slurm/slurm.conf file on the master node.
    1. Open /etc/slurm/slurm.conf.

      vi /etc/slurm/slurm.conf

    2. Press i to enter insert mode and add the following content:
      ControlMachine=master
      ControlAddr=192.168.40.11
      MpiDefault=none
      ProctrackType=proctrack/pgid
      ReturnToService=1
      SlurmctldPidFile=/var/run/slurmctld.pid
      SlurmdPidFile=/var/run/slurmd.pid
      SlurmdSpoolDir=/var/spool/slurm/d
      SlurmUser=slurm
      #SlurmdUser=root
      StateSaveLocation=/var/spool/slurm/ssl
      SwitchType=switch/none
      TaskPlugin=task/none
      FastSchedule=1
      SchedulerType=sched/backfill
      SelectType=select/linear
      AccountingStorageType=accounting_storage/none
      ClusterName=cluster
      JobAcctGatherType=jobacct_gather/none
      SlurmctldDebug=3
      SlurmctldLogFile=/var/log/slurm/slurmctld.log
      SlurmdDebug=3
      SlurmdLogFile=/var/log/slurm/slurmd.log
      
      NodeName=testnode1 CPUs=96 Sockets=4 CoresPerSocket=24 State=UNKNOWN
      NodeName=testnode2 CPUs=40 Sockets=4 CoresPerSocket=10 State=UNKNOWN
      
      PartitionName=ARM Nodes=testnode1 Default=YES MaxTime=INFINITE State=UP
      PartitionName=X86 Nodes=testnode2 Default=NO MaxTime=INFINITE State=UP
    3. Press Esc, type :wq!, and press Enter to save the changes and exit.
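
In the node definitions above, CPUs equals Sockets multiplied by CoresPerSocket (4 x 24 = 96 and 4 x 10 = 40). The sketch below repeats that arithmetic for every NodeName line in a configuration file. The function name check_node_cpus is illustrative. It assumes the keywords are spelled exactly as in the sample and that each node definition is on a single line. A mismatch is only a hint to recheck the values, because other CPU layouts can also be valid.

```shell
# Compare CPUs with Sockets * CoresPerSocket * ThreadsPerCore on each NodeName line.
# Usage: check_node_cpus [file]   (file defaults to /etc/slurm/slurm.conf)
check_node_cpus() {
    local conf="${1:-/etc/slurm/slurm.conf}"
    awk '
        BEGIN { bad = 0 }
        /^NodeName=/ {
            name = ""; cpus = 0; sockets = 0; cores = 0
            threads = 1    # ThreadsPerCore defaults to 1 when it is not given
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "NodeName")            name = kv[2]
                else if (kv[1] == "CPUs")           cpus = kv[2]
                else if (kv[1] == "Sockets")        sockets = kv[2]
                else if (kv[1] == "CoresPerSocket") cores = kv[2]
                else if (kv[1] == "ThreadsPerCore") threads = kv[2]
            }
            expected = sockets * cores * threads
            if (cpus + 0 == expected) {
                printf "%s: OK (%d CPUs)\n", name, cpus
            } else {
                printf "%s: CPUs=%d, but Sockets*CoresPerSocket*ThreadsPerCore=%d\n", name, cpus, expected
                bad = 1
            }
        }
        END { exit bad }
    ' "$conf"
}
```

To obtain the values for a node in the first place, running slurmd -C on that node prints the hardware layout it detects in slurm.conf syntax.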
  7. On the master node, run the following commands to copy the /etc/slurm/slurm.conf file to the testnode1 and testnode2 nodes:

    scp /etc/slurm/slurm.conf testnode1:/etc/slurm

    scp /etc/slurm/slurm.conf testnode2:/etc/slurm
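
Slurm expects the same slurm.conf on every node, so it can help to confirm that the copies match after the transfer. The following sketch compares SHA-256 digests. The function names digest_of_stdin and conf_matches are illustrative, and the sketch assumes that sha256sum is installed.

```shell
# Print only the SHA-256 digest of the data arriving on standard input.
digest_of_stdin() {
    sha256sum | awk '{ print $1 }'
}

# Succeed if the data on standard input has the same digest as the local file.
# Usage: <command that prints the remote file> | conf_matches <local-file>
conf_matches() {
    local local_file="$1"
    local want got
    want="$(digest_of_stdin < "$local_file")"
    got="$(digest_of_stdin)"
    [ "$want" = "$got" ]
}
```

For example, on the master node: ssh testnode1 cat /etc/slurm/slurm.conf | conf_matches /etc/slurm/slurm.conf && echo "testnode1 is in sync". Repeat the check for testnode2.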

  8. Start the slurmctld service on the master node and enable it to start at boot.

    systemctl start slurmctld

    systemctl enable slurmctld

  9. Start the slurmd service on the testnode1 and testnode2 nodes and enable it to start at boot.

    systemctl start slurmd

    systemctl enable slurmd
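
Once both services are running, the cluster state can be checked from the master node. As a minimal sketch, the filter below reads "node state" pairs, as printed by sinfo -h -N -o '%N %T', and lists every node whose state is not idle, mixed, or allocated. The function name unhealthy_nodes and the choice of accepted states are assumptions made for this example.

```shell
# Read "<node> <state>" lines on standard input and print the nodes that are
# not ready for work. A state with a trailing "*" (node not responding) does
# not match the accepted names exactly, so it is reported too.
# Returns 1 if at least one node is reported, 0 otherwise.
unhealthy_nodes() {
    awk '
        BEGIN { bad = 0 }
        NF >= 2 && $2 != "idle" && $2 != "mixed" && $2 != "allocated" {
            print $1, $2
            bad = 1
        }
        END { exit bad }
    '
}
```

A possible call is sinfo -h -N -o '%N %T' | unhealthy_nodes. For a node that is reported, scontrol show node followed by the node name displays its details. A short test job such as srun -p ARM -N1 hostname then confirms that the partition accepts work.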