OK, so we have a data center power outage for some electrical maintenance work on Sunday 8/26, 2am-9am.
How do we shut down the cluster? Here are the steps I took.
- badmin qinact -C "preparing for power outage 8/26 2-9 AM" all
- bjobs -r -u all | awk '{print $1}' | grep -v JOBID | fmt
- brequeue command. brequeue -u all [list of JOBIDs]
- bqueues
- cluster-fork uptime
- cluster-fork halt
- /root/ipmi_nodes and supply argument 'on', 'off' or 'status'.
#!/bin/bash
# all compute nodes on IPMI subnet
for i in `seq 218 253`
do
# CAREFUL, OPTIONS ARE: status, on or off
echo 192.168.2.${i}
ipmitool -H 192.168.2.${i} -U XXXXXX -P YYYYYY chassis power off
done
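The script above hardcodes `off`, while the step that calls it says to supply 'on', 'off' or 'status'. Here is a minimal sketch of a variant that reads the action from its first argument and rejects anything else. It assumes the same 192.168.2.218-253 IPMI range and the same placeholder credentials. The function names and the `DRY_RUN` switch are hypothetical additions for illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical argument-driven variant of ipmi_nodes (a sketch, not the original).

# Print the IPMI address of every compute node, one per line.
ipmi_hosts() {
    local i
    for i in $(seq 218 253); do
        echo "192.168.2.${i}"
    done
}

# Accept only the three chassis power actions; return non-zero otherwise.
valid_action() {
    case "$1" in
        status|on|off) return 0 ;;
        *) return 1 ;;
    esac
}

# Run (or, with DRY_RUN=1, only print) the ipmitool command for each node.
ipmi_nodes() {
    local action="$1" host
    if ! valid_action "$action"; then
        echo "usage: ipmi_nodes status|on|off" >&2
        return 2
    fi
    for host in $(ipmi_hosts); do
        echo "$host"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ipmitool -H $host -U XXXXXX -P YYYYYY chassis power $action"
        else
            ipmitool -H "$host" -U XXXXXX -P YYYYYY chassis power "$action"
        fi
    done
}

# Only act when an argument is supplied on the command line.
if [ "$#" -gt 0 ]; then
    ipmi_nodes "$1"
fi
```

Running it as `DRY_RUN=1 /root/ipmi_nodes off` first prints the commands without touching any node, which is a cheap check before the real power-off.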
- halt command on head node. And manually power off.
- halt command on ionode. And manually power off.

Technically, the cluster still has power but the filers providing the file systems do not. Hence the power down. But in this case UPS, switches and MD1000s can stay powered up. In a complete power outage, turn all these devices off last.
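The job-draining steps at the top of the shutdown list (collect the running job IDs, then requeue them) could be wrapped up roughly as follows. This is only a sketch: `job_ids` and `requeue_running` are hypothetical helper names, and it assumes `bjobs` prints a header line starting with JOBID followed by one job per line, as the `grep -v JOBID` in the pipeline suggests.

```shell
#!/bin/bash
# Read `bjobs` output on stdin and print the job IDs on a single line.
job_ids() {
    # Skip the header row and blank lines, keep column 1, join with spaces.
    awk 'NF && $1 != "JOBID" { print $1 }' | paste -sd' ' -
}

# Requeue every running job, or say so when there is nothing to requeue.
requeue_running() {
    local ids
    ids=$(bjobs -r -u all | job_ids)
    if [ -z "$ids" ]; then
        echo "no running jobs to requeue"
        return 0
    fi
    # Word splitting on $ids is intentional: one argument per job ID.
    brequeue -u all $ids
}
```

The queues should already be inactivated with `badmin qinact` before calling `requeue_running`, otherwise the requeued jobs could be dispatched again right away.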
- lsid & lsload).
- ipmi_nodes program and proper argument.
- cluster-fork cat /etc/motd >> /tmp/foo 2>&1 … this file should have the standard message of the day announcement, not the “Kick started on such and such a date-time” message.
- cluster-fork rm -rf /localscratch/[0-9]* and rm -rf /sanscratch/[0-9]*
- badmin qact all).
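Since `rm -rf` on a glob is unforgiving, the scratch cleanup step can be previewed first. The sketch below lists what a pattern like `/localscratch/[0-9]*` would match and only deletes when explicitly told to. The helper names and the `CONFIRM` switch are hypothetical, and the sketch is meant to run on a single node rather than through cluster-fork.

```shell
#!/bin/bash
# List entries in a scratch directory whose names start with a digit,
# i.e. what the glob <dir>/[0-9]* would match.
list_job_dirs() {
    local dir="$1" entry
    for entry in "$dir"/[0-9]*; do
        # With no match the glob stays literal; skip it.
        [ -e "$entry" ] || continue
        echo "$entry"
    done
}

# Remove the matching entries only when CONFIRM=1; otherwise just print them.
clean_scratch() {
    local dir="$1" entry
    list_job_dirs "$dir" | while IFS= read -r entry; do
        if [ "${CONFIRM:-0}" = "1" ]; then
            rm -rf -- "$entry"
        else
            echo "would remove $entry"
        fi
    done
}
```

For example, `clean_scratch /localscratch` prints the candidates, and `CONFIRM=1 clean_scratch /localscratch` removes them.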