cluster:208

Differences

This shows you the differences between two versions of the page.

cluster:208 [2021/10/15 12:44]
hmeij07 [Overview]
cluster:208 [2021/10/15 13:02]
hmeij07 [Overview]
Line 8: Line 8:
 There is a techie page at this location **[[cluster:207|Slurm Techie Page]]** for those of you who are interested in the setup. There is a techie page at this location **[[cluster:207|Slurm Techie Page]]** for those of you who are interested in the setup.
  
-__This page is intended for users__ to get started with the Slurm scheduler. ''greentail52'' will be the slurm scheduler test "controller" and with several cpu+gpu compute nodes configured. Any jobs submitted should be simple, quick running jobs, like a "sleep" or "hello world" jobs. These compute nodes are still managed by Openlava.+__This page is intended for users__ to get started with the Slurm scheduler. ''greentail52'' will be the slurm scheduler test "controller" with several cpu+gpu compute nodes configured. Any jobs submitted should be simple, quick running jobs, like a "sleep" or "hello world" jobs. The configured compute nodes are still managed by Openlava.
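
A first test job of the "hello world" kind might look like the sketch below. It assumes the partition is named ''test'' (as shown under Partitions in the n78 listing on this page); the file name ''hello.sh'', the job name, the sleep length and the time limit are placeholders, not site settings.

```shell
# Write a minimal batch script; hello.sh is a placeholder file name.
cat > hello.sh <<'EOF'
#!/bin/bash
# placeholder job name
#SBATCH --job-name=hello
# assumed partition name, taken from the n78 listing
#SBATCH --partition=test
# one task is enough for a smoke test
#SBATCH --ntasks=1
# assumed short wall-clock limit (5 minutes)
#SBATCH --time=00:05:00
# %j expands to the job id
#SBATCH --output=hello-%j.out

echo "hello world from $(hostname)"
sleep 30
EOF

# Submit from the controller once the script exists (left commented here):
# sbatch hello.sh
# squeue -u "$USER"
```

The comments sit on their own lines because sbatch only reads ''#SBATCH'' directives that appear before the first executable command.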
  
 ** Default Environment ** ** Default Environment **
Line 76: Line 76:
 $ scontrol show node n78 $ scontrol show node n78
 NodeName=n78 Arch=x86_64 CoresPerSocket=8 NodeName=n78 Arch=x86_64 CoresPerSocket=8
-   CPUAlloc=2 CPUTot=32 CPULoad=1.05 +   CPUAlloc=0 CPUTot=32 CPULoad=0.03
-   AvailableFeatures=hasLocalscratch           <<<--- available features+   AvailableFeatures=hasLocalscratch
    ActiveFeatures=hasLocalscratch    ActiveFeatures=hasLocalscratch
-   Gres=gpu:geforce_gtx_1080_ti:4(S:0-1)       <<<--- generic resources+   Gres=gpu:geforce_gtx_1080_ti:4(S:0-1)
    NodeAddr=n78 NodeHostName=n78 Version=21.08.1    NodeAddr=n78 NodeHostName=n78 Version=21.08.1
    OS=Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017    OS=Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017
-   RealMemory=128 AllocMem=128 FreeMem=16840 Sockets=2 Boards=1 +   RealMemory=128660 AllocMem=0 FreeMem=72987 Sockets=2 Boards=1
-   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A +   MemSpecLimit=1024
-   Partitions=test +   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
-   BootTime=2021-03-28T20:35:53 SlurmdStartTime=2021-10-11T10:41:35 +   Partitions=test,amber128 
-   LastBusyTime=2021-10-11T10:57:04 +   BootTime=2021-03-28T20:35:53 SlurmdStartTime=2021-10-14T13:56:00 
-   CfgTRES=cpu=32,mem=128M,billing=32 +   LastBusyTime=2021-10-14T13:56:01 
-   AllocTRES=cpu=2,mem=128M+   CfgTRES=cpu=32,mem=128660M,billing=32 
 +   AllocTRES=
    CapWatts=n/a    CapWatts=n/a
    CurrentWatts=0 AveWatts=0    CurrentWatts=0 AveWatts=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
 +
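
To pull a single field such as the features or generic resources out of that listing, a small filter along these lines may help. It is only a sketch: ''node_field'' is a hypothetical helper name, the sample text is copied from the n78 listing above, and the approach only works for fields whose values contain no spaces (so not ''OS='').

```shell
# Print the value of one KEY=value field from `scontrol show node` output on stdin.
# node_field is a hypothetical helper, not part of Slurm.
node_field() {
    # $1 is the field name, for example Gres or AvailableFeatures.
    # Split on spaces so every KEY=value pair sits on its own line,
    # then print the value of the first matching key.
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Sample text copied from the n78 listing; on the cluster, pipe in
# `scontrol show node n78` instead.
sample='NodeName=n78 Arch=x86_64 CoresPerSocket=8
   AvailableFeatures=hasLocalscratch
   Gres=gpu:geforce_gtx_1080_ti:4(S:0-1)
   Partitions=test,amber128'

printf '%s\n' "$sample" | node_field Gres
printf '%s\n' "$sample" | node_field AvailableFeatures
```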
  
 # sorta like bhist -l # sorta like bhist -l
Line 147: Line 149:
 Same on the cpu only compute nodes. Features could be created for memory footprints (for example "hasMem64", "hasMem128", "hasMem192", "hasMem256", "hasMem32"). Then all the cpu only nodes can go into one queue and we can stick all cpu+gpu nodes in another queue. Or all of them in a single queue. We'll see, just testing. Same on the cpu only compute nodes. Features could be created for memory footprints (for example "hasMem64", "hasMem128", "hasMem192", "hasMem256", "hasMem32"). Then all the cpu only nodes can go into one queue and we can stick all cpu+gpu nodes in another queue. Or all of them in a single queue. We'll see, just testing.
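
With features in place, a job could select nodes by feature instead of by queue. The sketch below uses the standard sbatch options ''--constraint'' and ''--gres'' with the feature and GPU type that n78 currently reports; the memory features ("hasMem128" and so on) are only proposed above, so they are not used here, and ''feature-test.sh'' is a placeholder file name.

```shell
# Write a batch script that asks for a feature and one GPU.
cat > feature-test.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=test
# only nodes advertising this feature qualify
#SBATCH --constraint=hasLocalscratch
# one GPU of the type listed under Gres on n78
#SBATCH --gres=gpu:geforce_gtx_1080_ti:1
#SBATCH --ntasks=1

hostname
EOF

# Submit from the controller (left commented here):
# sbatch feature-test.sh
```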
  
-On the resource requests: You may request 1 or more nodes, 1 or more sockets per node, 1 or more cores (physical) per socket or 1 or more threads (logical + physical) per core. Such a request can be fine grained or not; just request a node with ''--exclusive'' (test queue only) or share nodes (other queues, wit ''--oversubscribe'')+On the cpu resource requests: You may request 1 or more nodes, 1 or more sockets per node, 1 or more cores (physical) per socket or 1 or more threads (logical + physical) per core. Such a request can be fine grained or not; just request a node with ''--exclusive'' (test queue only) or share nodes (other queues, with ''--oversubscribe'')
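
As a worked example of how those levels multiply out, the numbers reported for n78 (Sockets=2, CoresPerSocket=8, ThreadsPerCore=2) can be checked with a little shell arithmetic. The sbatch lines in the comments use standard option names, but the counts and the script name ''job.sh'' are illustrative only.

```shell
# Values from the n78 listing: Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
sockets=2
cores_per_socket=8
threads_per_core=2

physical_cores=$((sockets * cores_per_socket))
logical_cpus=$((physical_cores * threads_per_core))

echo "physical cores: $physical_cores"   # 16
echo "logical cpus:   $logical_cpus"     # 32, matching CPUTot=32

# A fine-grained request spelled out level by level (counts are illustrative):
# sbatch --nodes=1 --sockets-per-node=1 --cores-per-socket=4 --threads-per-core=1 job.sh
# The coarse alternative on the test queue:
# sbatch --nodes=1 --exclusive job.sh
```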
  
 //Note: this oversubscribing is not working yet. I can only get 4 simultaneous jobs running. Maybe there is a conflict with Openlava jobs. Should isolate a node and do further testing. After isolation (n37), 4 jobs with -n 4 exhausts number of physical cores. Is that why 5th job goes pending?//   //Note: this oversubscribing is not working yet. I can only get 4 simultaneous jobs running. Maybe there is a conflict with Openlava jobs. Should isolate a node and do further testing. After isolation (n37), 4 jobs with -n 4 exhausts number of physical cores. Is that why 5th job goes pending?//  
cluster/208.txt · Last modified: 2022/11/02 17:28 by hmeij07