Sep 18, 2018

Announcement

Changes

  • Upgraded operating system (CentOS 6.9 → CentOS 7.4)
  • New software build system (EasyBuild)
  • Upgraded module system (Lmod 4.1.4 → Lmod 7.7)
  • New manager for jobs and cluster resources (Moab/TORQUE → Slurm)

Transition

  • New environment is ready for testing
    • Development nodes: dev-intel18 (intel18 node) and lac-249 (intel16 node with k80 GPUs)
    • Currently available compute nodes:
      • intel14: 3
      • intel16: between 60 and 80
      • intel18: 2
  • All remaining nodes are scheduled to migrate on Oct 15
  • The intel18 cluster will become available later this month

Slurm

Slurm is an open-source cluster management and job scheduling system for Linux clusters.

Job Submission in Moab/TORQUE

$ cat myJob.sub
#!/usr/bin/bash --login
#PBS -N myJob
#PBS -l walltime=4:00:00
#PBS -l nodes=1:ppn=4
#PBS -l mem=20gb
#PBS -l feature=intel16
#PBS -o /mnt/research/quantgen/logs
#PBS -j oe
#PBS -m abe

cd $PBS_O_WORKDIR
Rscript myJob.R

$ qsub myJob.sub
61177022.mgr-04.i

Job Submission in Slurm

$ cat myJob.sub
#!/usr/bin/bash --login
#SBATCH --job-name=myJob
#SBATCH --time=4:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --constraint=intel18
#SBATCH --output=/mnt/research/quantgen/logs/%j
#SBATCH --mail-type=FAIL,BEGIN,END

cd $SLURM_SUBMIT_DIR
Rscript myJob.R

$ sbatch myJob.sub
Submitted batch job 134483

Check Status

Moab/TORQUE:

$ qstat -u $USER

mgr-04.i:
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
61197580.mgr-04.i       gruenebe    main     STDIN            112526     1      1       1gb  01:00:00 R  00:01:41

Slurm:

$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            136161 general-l     bash gruenebe  R       0:20      1 lac-339

Delete Jobs

Moab/TORQUE:

$ qdel 61197580.mgr-04.i

Slurm:

$ scancel 136161

Array Jobs

Moab/TORQUE:

  • In submission script: #PBS -t 1-5
  • Name of environment variable: $PBS_ARRAYID

Slurm:

  • In submission script: #SBATCH --array=1-5 and #SBATCH --output=/mnt/research/quantgen/logs/%A_%a
  • Name of environment variable: $SLURM_ARRAY_TASK_ID
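Each task of an array job sees its own value of the variable above and typically uses it to pick its input. A minimal sketch (the `chr*.dat` file names and the selection pattern are illustrative, not part of the cluster setup; `SLURM_ARRAY_TASK_ID` is hardcoded here only so the snippet runs outside Slurm):

```shell
# In a real job, Slurm sets SLURM_ARRAY_TASK_ID to 1..5 for --array=1-5;
# hardcoded here for illustration only.
SLURM_ARRAY_TASK_ID=3

# Pick the N-th line of an input list (POSIX sed, no bash arrays needed).
selected=$(printf 'chr1.dat\nchr2.dat\nchr3.dat\nchr4.dat\nchr5.dat\n' \
           | sed -n "${SLURM_ARRAY_TASK_ID}p")

echo "Task $SLURM_ARRAY_TASK_ID processes $selected"
```

Submitting such a script with `--array=1-5` runs five independent tasks, each selecting a different input line.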

Interactive Jobs

Moab/TORQUE:

$ qsub -I -l nodes=1:ppn=4 -l mem=20gb -l walltime=1:00:00

Slurm:

$ srun --nodes=1 --ntasks=1 --cpus-per-task=4 --mem=20gb \
       --time=1:00:00 --pty /bin/bash

Conclusions

Work in progress:

  • Existing submission scripts need to be updated to Slurm syntax.
  • Software needs to be migrated to the new build system.
  • There will be issues.
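Much of the directive translation shown earlier is mechanical. As a rough sketch (the `translate` function is hypothetical, the mapping covers only a few directives, and real scripts with resource lists, features, or joined output still need manual review):

```shell
# Illustrative only: rewrite a few common #PBS directives as Slurm equivalents.
translate() {
  sed -e 's/^#PBS -N \(.*\)/#SBATCH --job-name=\1/' \
      -e 's/^#PBS -l walltime=\(.*\)/#SBATCH --time=\1/' \
      -e 's/^#PBS -l mem=\(.*\)/#SBATCH --mem=\1/' \
      -e 's/\$PBS_O_WORKDIR/$SLURM_SUBMIT_DIR/'
}

printf '#PBS -N myJob\n#PBS -l walltime=4:00:00\n' | translate
```

Directives without a one-line equivalent (e.g. `-l nodes=1:ppn=4`, which splits into `--nodes`, `--ntasks`, and `--cpus-per-task`) are exactly why scripts cannot be converted blindly.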

QuantGen Shared Configuration

The shared configuration files have been adapted to the new system and automatically load the following software:

  • intel/2018b
  • git/2.18.0
  • plink/1.90b6.4
  • julia/1.0.0
  • R/3.5.1-X11-20180604 (compiled with Intel MKL)

More available software: gcta/1.91.5b, htop/2.2.0, ncdu/1.13, parallel/20180722, plink/2.00a1, tmux/2.7, tree/1.7.0

Questions?