**[[cluster:
==== OpenHPC ====
Additional tools for the OpenHPC environment. First add these two lines to SMS and all compute nodes. Patch CHROOT as well.
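The two lines themselves are cut off in this revision, but the pattern is the same for any such change: apply it on the SMS node, then repeat it against the compute node image so the next vnfs rebuild picks it up. A minimal sketch, with purely hypothetical placeholder lines and an assumed CHROOT path:

<code>
# placeholders only -- substitute the two lines referenced above
echo "line one" >> /etc/example.conf
echo "line two" >> /etc/example.conf

# patch the compute node image (CHROOT) as well; this path is an assumption, adjust to your image
export CHROOT=/opt/ohpc/admin/images/centos7
echo "line one" >> $CHROOT/etc/example.conf
echo "line two" >> $CHROOT/etc/example.conf
</code>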
<code>
yum -y groupinstall ohpc-nagios
yum -y --installroot=/
chroot /
perl -pi -e "
echo "nrpe 5666/tcp # NRPE" >> /
echo "nrpe : 192.168.1.249 : ALLOW" >> /
echo "nrpe : ALL : DENY" >> /
chroot /
-d /
mv /
mv /
perl -pi -e "
perl -pi -e "
perl -pi -e "s/ \/
perl -pi -e "
chkconfig nagios on
systemctl start nagios
chmod u+s `which ping`
echo "
echo "
newaliases
systemctl restart postfix
# recreate vnfs and reimage nodes, see page1
wwvnfs -y --chroot /
/
</code>
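Once the nodes are reimaged, a quick way to confirm the NRPE plumbing from the SMS side is to poke a compute node with the stock plugin; the plugin path and node name below are assumptions for illustration:

<code>
# query the nrpe daemon on a compute node from the nagios server
/usr/lib64/nagios/plugins/check_nrpe -H n31
# a healthy node answers with its NRPE version string
</code>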
  * Open port 80 in iptables but restrict severely (plain text passwords), see the sketch after this list
  * http://
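A sketch of such a restriction, assuming the plain iptables service (not firewalld) and a trusted management subnet that you would replace with your own:

<code>
# allow the web UI only from a trusted subnet, drop everything else on port 80
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j DROP
service iptables save    # assumes the iptables-services package is in use
</code>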
  * On to Ganglia
<code>
yum -y groupinstall ohpc-ganglia
yum -y --installroot=/
# import passwd, shadow and group files for new user account ganglia
mv /
cp /
# use provision IP
perl -pi -e "
cp /
echo "
# recreate vnfs and reimage nodes, see page1
wwvnfs -y --chroot /
/
</code>
  * http://
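Before trusting the web page it is worth confirming that gmond and gmetad are actually up on the SMS; gmond answers with an XML dump of cluster state on its default port 8649 (assuming ''nc'' is installed):

<code>
systemctl status gmond gmetad
# gmond publishes its metrics as XML on tcp/8649 by default
nc localhost 8649 | head
</code>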
  * Not installing ClusterShell,
  * add compute hostnames to / (see the sketch after this list)
  * ''
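One way to give pdsh a default target list, so a plain ''pdsh uptime'' works as in the example below, is the WCOLL environment variable; the file path here is a hypothetical choice:

<code>
# list the compute hostnames one per line (path is an assumption)
cat > /root/pdsh-hosts << EOF
n29
n31
EOF
export WCOLL=/root/pdsh-hosts
pdsh uptime
</code>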
<code>
[root@ohpc0-test ~]# pdsh uptime
n31: 10:44:25 up 19:
n29: 10:44:25 up 19:
</code>
  * Skip ''
  * Skip ''
  * Skip ''
  * Skip ''
  * Redefine ''
    * use eth0, not public address eth1
    * and CHROOT/
    * import file back into database (see the sketch after this list)
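For the import-back-into-the-database step, Warewulf's file provisioning commands can push the edited copy back into the datastore; the slurm.conf file name below is illustrative:

<code>
# re-import an edited file into the warewulf datastore (file name is an assumption)
wwsh file import /etc/slurm/slurm.conf --name=slurm.conf
# or, if the file object already exists, resync its contents
wwsh file resync slurm.conf
</code>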
Ran into a slurm config problem here on compute nodes. When issuing ''
<code>
# ON COMPUTE NODES, that is in CHROOT

# Removed file /
mv /

# Made the following link
[root@n31 ~]# ls -l /
lrwxrwxrwx 1 root root 38 Mar 30 14:05 /

# now it starts properly
Mar 31 12:41:05 n31.localdomain systemd[1]: Starting Slurm node daemon...
Mar 31 12:41:05 n31.localdomain systemd[1]: PID file /
Mar 31 12:41:05 n31.localdomain systemd[1]: Started Slurm node daemon.
</code>
  * Recreate vnfs, reimage the whole kaboodle (a sketch follows below)
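The rebuild-and-reimage cycle referred to throughout looks roughly like this; the CHROOT path is the OpenHPC recipe default and is an assumption here, and rebooting via pdsh presumes the nodes are set to PXE boot:

<code>
export CHROOT=/opt/ohpc/admin/images/centos7.2   # assumption, use your image path
wwvnfs -y --chroot $CHROOT
# reboot the compute nodes so they pull the fresh image
pdsh -w n29,n31 reboot
</code>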
Link to my previous eval of Slurm and job throughput testing: [[cluster:

Here are my current settings in slurm.conf under OpenHPC.
<code>
ClusterName=linux
ControlMachine=ohpc0-slurm
ControlAddr=192.168.1.249
SlurmUser=slurm
SlurmctldPort=6815-6817
SlurmdPort=6818
AuthType=auth/
StateSaveLocation=/
SlurmdSpoolDir=/
SwitchType=switch/
MpiDefault=none
SlurmctldPidFile=/
SlurmdPidFile=/
ProctrackType=proctrack/
FirstJobId=101
MaxJobCount=999999
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/
SchedulerPort=7321
SelectType=select/
FastSchedule=1
SlurmctldDebug=3
SlurmdDebug=3
JobCompType=jobcomp/
PropagateResourceLimitsExcept=MEMLOCK
SlurmdLogFile=/
SlurmctldLogFile=/
Epilog=/
ReturnToService=1
NodeName=ohpc0-slurm NodeAddr=192.168.1.249
NodeName=n29 NodeAddr=192.168.102.38
NodeName=n31 NodeAddr=192.168.102.40
PartitionName=test Nodes=n29,
</code>
Define CPUs, Cores, ThreadsPerCore,
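On each compute node ''slurmd -C'' prints the hardware line that slurm detects, which can be pasted into the NodeName entries above; the socket and core counts in this example are placeholders:

<code>
# on a compute node: report detected sockets, cores, threads and memory
slurmd -C
# example NodeName line built from that output (values are hypothetical)
NodeName=n31 NodeAddr=192.168.102.40 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=16000 State=UNKNOWN
</code>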
[[cluster:
\\
**[[cluster: