

Slurm

The Simple Linux Utility for Resource Management (SLURM) is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Its architecture is described at https://computing.llnl.gov/linux/slurm/quickstart.html.

  • Installation
    • begins with installing Munge
      • fairly straightforward: build RPMs from the tarball
      • installed on the head node and all compute nodes
      • copied munge.key from the head node to all compute nodes
    • Slurm installed from source code, configured with
      • --prefix=/opt/slurm-14 --sysconfdir=/opt/slurm-14/etc
      • used the configurator web page to generate a simple setup
        • created the OpenSSL key and certificate (see the Slurm web pages)
        • logs go to files, not MySQL, for now
        • changed some settings in slurm.conf, particularly
          • FirstJobId, MaxJobId
          • MaxJobCount=100000
          • MaxTasksPerNode=65533
          • SrunProlog/SrunEpilog (create and remove work directories in /scratch/$SLURM_JOB_ID)
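Copying munge.key from the head node to every compute node, as described in the Munge steps above, could be scripted roughly as follows. This is a minimal Python sketch, not the procedure actually used here. It assumes Munge's default key path /etc/munge/munge.key, root scp/ssh access from the head node, and a SysV `service` command on the nodes. The node names node001 and node002 are hypothetical. It only prints the commands (a dry run) so they can be reviewed before anything is copied.

```python
#!/usr/bin/env python3
"""Sketch: build the commands that push munge.key from the head node to compute nodes.

Assumptions (not from the original notes): the key is at Munge's default path,
nodes are reachable as root over scp/ssh, and `service munge restart` works
on the nodes. Node names used below are hypothetical.
"""
import shlex
from typing import List

KEY_PATH = "/etc/munge/munge.key"  # Munge's default key location (assumed)


def munge_key_commands(nodes: List[str], key_path: str = KEY_PATH) -> List[List[str]]:
    """For each node, return an scp command, then an ssh command that sets
    ownership to munge:munge, mode 0400, and restarts the munge daemon."""
    commands = []
    for node in nodes:
        # Every host must hold an identical copy of the head node's key.
        commands.append(["scp", "-p", key_path, f"{node}:{key_path}"])
        # The key should be readable only by the munge user.
        quoted = shlex.quote(key_path)
        remote = (
            f"chown munge:munge {quoted} && "
            f"chmod 0400 {quoted} && "
            "service munge restart"
        )
        commands.append(["ssh", node, remote])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in munge_key_commands(["node001", "node002"]):  # hypothetical node names
        print(shlex.join(cmd))
```

Once the printed commands look right, each list could be passed to `subprocess.run(cmd, check=True)` to execute it.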
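The slurm.conf overrides listed above are plain Key=Value lines. The Python sketch below renders them and applies a few sanity checks: MaxTasksPerNode may not exceed 65533, and FirstJobId must be below MaxJobId. The notes do not record the FirstJobId/MaxJobId values or the prolog/epilog paths, so the ones in the example are hypothetical. The check that the job ID range covers MaxJobCount is an assumed heuristic, not a rule enforced by Slurm.

```python
#!/usr/bin/env python3
"""Sketch: render and sanity-check the slurm.conf overrides from these notes.

Hypothetical values: FirstJobId, MaxJobId and the prolog/epilog paths are not
recorded in the notes. The job-ID-range check is an assumed heuristic.
"""
from typing import Dict, Union

Value = Union[int, str]

# Upper bound documented for MaxTasksPerNode in slurm.conf.
MAX_TASKS_PER_NODE_LIMIT = 65533


def render_overrides(settings: Dict[str, Value]) -> str:
    """Validate the settings and return them as slurm.conf Key=Value lines."""
    first = int(settings["FirstJobId"])
    last = int(settings["MaxJobId"])
    if first >= last:
        raise ValueError("FirstJobId must be smaller than MaxJobId")
    if int(settings["MaxTasksPerNode"]) > MAX_TASKS_PER_NODE_LIMIT:
        raise ValueError("MaxTasksPerNode may not exceed 65533")
    # Assumed heuristic (not a Slurm rule): the ID range should hold at least
    # as many IDs as slurmctld may keep jobs in memory (MaxJobCount).
    if last - first < int(settings["MaxJobCount"]):
        raise ValueError("job ID range is smaller than MaxJobCount")
    # Dicts keep insertion order, so lines come out in the order given.
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    example = {
        "FirstJobId": 1000,  # hypothetical
        "MaxJobId": 999999,  # hypothetical
        "MaxJobCount": 100000,
        "MaxTasksPerNode": 65533,
        "SrunProlog": "/opt/slurm-14/etc/srun_prolog",  # hypothetical path
        "SrunEpilog": "/opt/slurm-14/etc/srun_epilog",  # hypothetical path
    }
    print(render_overrides(example))
```

The output can be compared by hand against the slurm.conf produced by the configurator before the file is edited.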
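The SrunProlog/SrunEpilog pair above creates and removes a per-job work directory under /scratch. Below is a minimal Python sketch of such a helper, not the scripts actually installed. It assumes the same file is installed under two hypothetical names, srun_prolog and srun_epilog, and chooses its mode from the name it was invoked as, because srun passes the job step's command line as the program's arguments. It also assumes /scratch is writable by users (for example mode 1777), since srun runs these programs with the job owner's permissions.

```python
#!/usr/bin/env python3
"""Sketch of an SrunProlog/SrunEpilog helper managing /scratch/$SLURM_JOB_ID.

Assumptions (not from the original notes): installed under the hypothetical
names srun_prolog and srun_epilog; mode is picked from the invoked name,
since argv[1:] holds the job step's own command line.
"""
import os
import shutil
import sys

SCRATCH_ROOT = "/scratch"


def job_dir(job_id: str, root: str = SCRATCH_ROOT) -> str:
    """Return the per-job work directory path, e.g. /scratch/12345."""
    # Refuse anything but a numeric ID so rmtree can never escape the root.
    if not job_id.isdigit():
        raise ValueError(f"unexpected job id: {job_id!r}")
    return os.path.join(root, job_id)


def prolog(job_id: str, root: str = SCRATCH_ROOT) -> str:
    """Create the work directory (mode at most 0700, after umask) if missing."""
    path = job_dir(job_id, root)
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path


def epilog(job_id: str, root: str = SCRATCH_ROOT) -> str:
    """Remove the work directory and its contents, ignoring a missing one."""
    path = job_dir(job_id, root)
    shutil.rmtree(path, ignore_errors=True)
    return path


def main() -> int:
    # srun exports SLURM_JOB_ID; SLURM_JOBID is the older spelling.
    job_id = os.environ.get("SLURM_JOB_ID") or os.environ.get("SLURM_JOBID")
    if not job_id:
        # Not inside a Slurm job: nothing to create or remove.
        print("SLURM_JOB_ID is not set; skipping", file=sys.stderr)
        return 0
    try:
        if "epilog" in os.path.basename(sys.argv[0]):
            epilog(job_id)
        else:
            prolog(job_id)
    except (OSError, ValueError) as err:
        print(f"scratch directory setup failed: {err}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    status = main()
    if status != 0:
        # Exit non-zero only on failure.
        sys.exit(status)
```

Because srun runs these programs around every job step, a batch script that calls srun several times would get a fresh, empty directory for each step. If the directory has to persist for the whole job, the slurmd-run Prolog/Epilog may be worth considering instead.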



cluster/134.1408027245.txt.gz · Last modified: 2014/08/14 14:40 by hmeij