**[[cluster:
==== OpenHPC ====
Additional tools for the OpenHPC environment. First add these two lines to SMS and all compute nodes. Patch CHROOT as well.
<code>
yum -y groupinstall ohpc-ganglia
yum -y --installroot=/
# import passwd, shadow and group files for new user account ganglia
mv /
cp /
# use provision IP
perl -pi -e "
cp /
echo "
</code>
* http://
* Not installing ClusterShell,
* add compute hostnames to /
* ''
| + | < | ||
| + | |||
| + | [root@ohpc0-test ~]# pdsh uptime | ||
| + | n31: 10:44:25 up 19: | ||
| + | n29: 10:44:25 up 19: | ||
| + | |||
| + | </ | ||
| + | |||
* Skip ''
* Skip ''
* Skip ''
* Skip ''
* Redefine ''
* use eth0, not public address eth1
* and CHROOT/
* import file back into database (see the sketch below)
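A sketch of pushing the edited files back into the Warewulf datastore; the file names here are assumptions (the OpenHPC recipe imports files such as ''/etc/hosts'' and ''slurm.conf'' this way):

<code>
# re-import the edited file into the Warewulf database (path is an assumption)
wwsh file import /etc/hosts

# push updated copies of all imported files out to the provisioned nodes
wwsh file sync
</code>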
| + | |||
Ran into a Slurm config problem here on the compute nodes. When issuing ''
| + | < | ||
| + | |||
| + | # ON COMPUTE NODES, that is in CHROOT | ||
| + | |||
| + | # Removed file / | ||
| + | mv / | ||
| + | |||
| + | # Made the following link | ||
| + | [root@n31 ~]# ls -l / | ||
| + | lrwxrwxrwx 1 root root 38 Mar 30 14:05 / | ||
| + | |||
| + | # now it starts properly | ||
| + | Mar 31 12:41:05 n31.localdomain systemd[1]: Starting Slurm node daemon... | ||
| + | Mar 31 12:41:05 n31.localdomain systemd[1]: PID file / | ||
| + | Mar 31 12:41:05 n31.localdomain systemd[1]: Started Slurm node daemon. | ||
| + | |||
| + | </ | ||
| + | |||
| + | * Recreate vnfs, Reimage the whole kaboodle | ||
| + | |||
Link to my previous eval of Slurm and job throughput testing: [[cluster:
Here are my current slurm.conf settings in OpenHPC.
| + | < | ||
| + | ClusterName=linux | ||
| + | ControlMachine=ohpc0-slurm | ||
| + | ControlAddr=192.168.1.249 | ||
| + | SlurmUser=slurm | ||
| + | SlurmctldPort=6815-6817 | ||
| + | SlurmdPort=6818 | ||
| + | AuthType=auth/ | ||
| + | StateSaveLocation=/ | ||
| + | SlurmdSpoolDir=/ | ||
| + | SwitchType=switch/ | ||
| + | MpiDefault=none | ||
| + | SlurmctldPidFile=/ | ||
| + | SlurmdPidFile=/ | ||
| + | ProctrackType=proctrack/ | ||
| + | FirstJobId=101 | ||
| + | MaxJobCount=999999 | ||
| + | SlurmctldTimeout=300 | ||
| + | SlurmdTimeout=300 | ||
| + | InactiveLimit=0 | ||
| + | MinJobAge=300 | ||
| + | KillWait=30 | ||
| + | Waittime=0 | ||
| + | SchedulerType=sched/ | ||
| + | SchedulerPort=7321 | ||
| + | SelectType=select/ | ||
| + | FastSchedule=1 | ||
| + | SlurmctldDebug=3 | ||
| + | SlurmdDebug=3 | ||
| + | JobCompType=jobcomp/ | ||
| + | PropagateResourceLimitsExcept=MEMLOCK | ||
| + | SlurmdLogFile=/ | ||
| + | SlurmctldLogFile=/ | ||
| + | Epilog=/ | ||
| + | ReturnToService=1 | ||
| + | NodeName=ohpc0-slurm NodeAddr=192.168.1.249 | ||
| + | NodeName=n29 NodeAddr=192.168.102.38 | ||
| + | NodeName=n31 NodeAddr=192.168.102.40 | ||
| + | PartitionName=test Nodes=n29, | ||
| + | |||
| + | </ | ||
| + | |||
Define CPUs, Cores, ThreadsPerCore,
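A sketch of how those node attributes can be discovered and written into the NodeName lines; ''slurmd -C'' prints the local hardware in slurm.conf syntax (the numbers below are made up for illustration):

<code>
# on a compute node: print this node's hardware in slurm.conf syntax
[root@n29 ~]# slurmd -C
NodeName=n29 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=15885

# then carry the relevant fields into slurm.conf, for example
NodeName=n29 NodeAddr=192.168.102.38 CPUs=8 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
</code>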
| + | |||
| + | [[cluster: | ||
| \\ | \\ | ||
| **[[cluster: | **[[cluster: | ||