====== Upgrading to LSF ======

Why? Here is my summation of some items I wish to take advantage of: **[[cluster:...]]**

We're running Platform/OCS (Rocks) with the Lava scheduler today; the plan is to upgrade to Platform's LSF/HPC scheduler.

===== First Stumble =====

What version to upgrade to? Well, I thought that would be easy: the latest stable version, which is LSF v7.0.1.

Our OCS version is 4.1.1, and the only "roll" available for it is LSF/HPC v6.2.

In order to install a v7 "roll" we would first have to upgrade OCS itself.

Another option is to perform a manual install of v7 from source.

===== Next Step =====

  * process a License Change request via [[http://...]]
  * obtain a new license file
  * download the LSF/HPC v6.2 roll at [[http://...]]
  * plan the upgrade steps

===== Lots of Next Steps =====

#0 Shut off the NAT box.

Reset the root password, shut off the box.

#1a Close the head node to all ssh traffic (firewall to trusted user VLAN access only).

#1b Inactivate all queues, back up scratch dirs, stop all jobs.

  * Make friends!

  * Take a snapshot of the jobs running (like the clumon jobs page or with ''bjobs'').

  * Stop all jobs with ''bkill'' (see the sketch after this list).

  * Disable any cronjobs.

  * reset sknauert's ...
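A minimal sketch of the #1b job drain, assuming the Lava ''b*'' commands behave like their LSF equivalents and that every user's jobs are fair game:

<code>
# snapshot of everything running or pending, all users, wide output
bjobs -u all -w > /root/pre-upgrade-jobs.txt

# inactivate every queue so nothing new gets dispatched
badmin qinact all

# kill all remaining jobs of all users (job id 0 = "all jobs")
bkill -u all 0
</code>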
#1c Back up all files needed to rebuild the io-node.

The io-node is currently a compute node (but not a member of any queue, and admin_closed). It has fiber channel (2 cards) to the Netapp storage device.

#1d Back up all files needed to rebuild the compute nodes.

This includes two varieties of nodes: a light weight node and a heavy weight node sample. Some of this should be customized with extend-compute.xml (minor changes for now ...). Rebuilding is documented at [[https://...]].

#1e Stop the lava system across the cluster.

''/...'' (see the sketch below)

  * also on the ionode!!
  * also on the head node!!
  * also on the head node run ''/...''
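A hedged sketch of #1e; the init-script path and the use of Rocks' ''cluster-fork'' are assumptions, so check what the lava roll actually installed:

<code>
# stop the lava daemons on every compute node (and the io-node) from the head node
cluster-fork '/etc/init.d/lava stop'

# then stop lava on the head node itself
/etc/init.d/lava stop
</code>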
#1f Back up all files in /opt/lava.

=> copy the /opt/lava tree ...

LSF/HPC will install in /opt/lsfhpc, but make sure you have a remote backup copy of /opt/lava ... rsync to /... (see the sketch below).

-> Disable the Tivoli agents and start a manual incremental backup.
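A minimal rsync sketch; the destination host and path are placeholders, not the real backup target:

<code>
# keep a dated remote copy of the whole lava tree before removing the roll
rsync -av /opt/lava/ backuphost:/backups/lava-$(date +%Y%m%d)/
</code>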
#1g Unmount all io-node exported file systems, leave nodes running.

We'll force a reboot followed by a re-image later, in staggered fashion, after we are done with the LSF install.

#1h Good time to clean all orphaned jobs' working dirs in /...

-> fix: set this LUN to space reservation enabled (1 TB)

#1i Unmount all multipathed LUN filesystems on the io-node (/...).

** => <hi #ff0000> AFTER THAT, DISCONNECT THE FIBER CABLES </hi> **

Node re-imaging involves formatting and partitioning.

#2. Remove the lava roll.

''...''\\
''...''\\
''...''

#3. Add the LSFHPC roll.

''...''
#4. Prep ENV and license info.

Edit /... and change this section to point at the appropriate lsf location:

<code>
# source the job scheduler environment
if [ -f /opt/lsfhpc/conf/profile.lsf ]; then
    . /opt/lsfhpc/conf/profile.lsf
fi
</code>

Source that new environment: ''...''\\
Next copy the license info to /...
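To confirm the new environment takes effect (the profile path below just mirrors the /opt/lsfhpc install location assumed above):

<code>
. /opt/lsfhpc/conf/profile.lsf   # or log in again so the profile.d hook picks it up
which lsid bsub                  # both should now resolve under /opt/lsfhpc
</code>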
#5a. Start the license daemon ... port 1700 is currently free.

''...''

#5b. Add this startup command to /...

#5c. Check the license daemons: ''...''
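A hedged sketch of #5a-#5c with the stock FlexLM tools that ship with LSF; the license file location is an assumption based on the install prefix above:

<code>
# 5a: start the FlexLM master daemon against the new license file (serving on port 1700)
lmgrd -c /opt/lsfhpc/conf/license.dat -l /var/log/lmgrd.log

# 5c: verify the master and vendor daemons are up and the features are being served
lmstat -a -c /opt/lsfhpc/conf/license.dat
</code>

For #5b, the same ''lmgrd'' line would simply be appended to the startup file referenced above.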
#6. Assign compute nodes to additional resources.

''...''\\
(''...'')

This will add the Infiniband MPI implementation.
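Whatever the exact roll command, in plain LSF terms the end result is extra resource names on the host lines of ''lsf.cluster.<clustername>''; a hedged sketch, with the resource name purely illustrative:

<code>
Begin   Host
HOSTNAME      model   type   server   r1m   mem   swp   RESOURCES
compute-1-1   !       !      1        -     -     -     (mvapich)
compute-1-2   !       !      1        -     -     -     (mvapich)
End     Host
</code>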
<hi #ffff00> #7a. Re-image the io-node. </hi>
=> Before you do this, redefine the io node as a compute appliance in the cluster database and turn ''/...'' ...

''/...''

Once done, mount all NFS file systems on the head node.

=> Redefine the io node as a "nas appliance" ...

''/...''\\
''...''
<hi #ffff00> #7b. Re-image the compute nodes. </hi>

''/...''
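The Rocks-side mechanics for #7a/#7b, sketched with the standard Rocks 4.x tooling; the node names come from this cluster, and the staggering loop is just one way to space the rebuilds out:

<code>
# force one node to PXE-boot and rebuild itself from the frontend
shoot-node compute-1-1

# stagger the rest so the frontend isn't serving every kickstart at once
for n in $(seq 2 16); do shoot-node compute-1-$n; sleep 300; done
</code>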
<hi #ffff00> Add the memory modules at this time? </hi>
#8. Starting and testing the LSF HPC cluster.

#7a & #7b should add the nodes to the LSF cluster.

On head node (the standard LSF startup/verification commands are sketched below):\\
''...''\\
''...''\\
''...''\\
''...''\\
''...''\\
''...''
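For reference, bringing up and sanity-checking an LSF cluster from the master host usually looks like this; a sketch, not necessarily the exact six commands from the original list:

<code>
lsadmin limstartup     # start the LIM on the master
lsadmin resstartup     # start the RES on the master
badmin hstartup        # start sbatchd on the master
lsid                   # cluster name and master host should be reported
lshosts                # every node should appear with its resources
bhosts                 # hosts should move from unreach/unavail to ok
</code>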
After this is done, and all nodes are back up, walk through the lava configuration files and add any information that is missing to the equivalent LSF files.

On head node:\\
''...''\\
''...''\\
''...''\\
''...''\\
''...''
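After editing the LSF configuration files, the usual check-and-reload cycle applies (standard LSF commands, listed as a reminder rather than as the original five):

<code>
badmin ckconfig -v     # syntax-check the lsbatch configuration files
lsadmin reconfig       # push lsf.cluster / lsf.shared changes to the LIMs
badmin reconfig        # have mbatchd re-read queues, users and host groups
bqueues                # confirm the queues came back as expected
</code>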
#9. Configure master failover.

Skip this step.

#10. "Go To 1"

Walk through the items in #1 and enable/restore whatever was disabled there.

Kick off Tivoli for an automated backup.

Test some job submissions ...
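A throwaway submission is enough to prove the scheduler round-trips; the queue is left at the default here since the new queue names come later:

<code>
bsub -n 1 -o /tmp/lsftest.%J.out sleep 30   # submit a trivial job
bjobs                                       # watch it go PEND -> RUN -> DONE
bhist -l                                    # check the accounting afterwards
</code>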
Document the new MPI job submission procedure ...

Add our eLIM after a while ...

#11. Relocate some home directories.

  * "..."
  * relocate lvargarslara, ...

#12. NAT box.

Reconfigure compute-1-1 for Scott, maybe.
----

So how long does this take:

  * one morning to install LSF/HPC + rebuild the ionode
  * one afternoon to rebuild all other nodes (and deal with unexpected hardware problems)
  * one morning to open every node and remove/add memory sticks

--- //...//
===== Adding Memory =====

The depts of **CHEM** and **PHYS** will each contribute $2,400 towards the purchase of additional memory.

The $7,680 is enough to purchase 64 DIMMs, adding 128 GB of memory to the cluster.
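(That works out to 64 x 2 GB DIMMs = 128 GB, or $120 per DIMM at the $7,680 total.)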
| "..." |
The 4 heavy weight nodes, with local dedicated fast disks, will not be changed.

So the first suggestion is to remove the 1 GB DIMMs from the 16 gigE enabled nodes (queue ''...'').

That then leaves 16 empty nodes and 64 2 GB DIMMs to play with. What to do?\\
Here are some options.
^ DIMMs used ^ nodes ^ DIMMs per node ^ total cores ^  ^
^ Scenario A ^^^^ uniform, matches infiniband nodes ^
| 64 | 16 | 4x2 | 128 | " sixteen 8 GB medium weight nodes " |
^ Scenario B ^^^^ add equal medium and heavy nodes ^
| 16 | 08 | 2x2 | 64 | " eight 4 GB light weight nodes " |
| 16 | 04 | 4x2 | 32 | " four 8 GB medium weight nodes " |
| 32 | 04 | 8x2 | 32 | " four 16 GB heavy weight nodes " |
^ Scenario C ^^^^ emphasis on medium nodes ^
| 08 | 04 | 2x2 | 32 | " four 4 GB light weight nodes " |
| 40 | 10 | 4x2 | 80 | " ten 8 GB medium weight nodes " |
| 16 | 02 | 8x2 | 16 | " two 16 GB heavy weight nodes " |
^ Scenario D ^^^^ ... ^
  * Personally, I was initially leaning towards **A**.
  * But now, viewing this table, I like the distribution of cores across light, medium and heavy weight nodes in **B**.
  * **C** really depends on whether we need 8 GB nodes. Not sure why we would do this vs **A**.

Actually, the perfect argument for **B** was offered by Francis:
| If machines have 8 GB of RAM, 1 job locks up the node. So two jobs lock up 2 nodes, rendering a total of 14 cores unused and unavailable. Suppose instead we have 16 GB machines ... |
===== Renaming Queues =====

In **Scenario A** above nothing really changes but the concept of a "light weight" node ...

In **Scenario B & C**, things change. Now we have light, medium and heavy weight nodes. Suppose we encode that layout in the queue name:

| queue_name | = | number of nodes | + | which switch | + | GB mem per node | + | total cores | + | additional info | ; |
Then our queues could be named like so:

| **16i08g128c** | 16 nodes, infiniband enabled, each 8 GB mem (medium), comprising 128 cores total | |
| **08e04g064c** | 08 nodes, gigE enabled, each 4 GB mem (light), comprising 64 cores total | |
| **04e08g032c** | 04 nodes, gigE enabled, each 8 GB mem (medium), comprising 32 cores total | |
| **04e16g032c** | 04 nodes, gigE enabled, each 16 GB mem (heavy), comprising 32 cores total | |
| **04e16g032cfd** | 04 nodes, gigE enabled, each 16 GB mem (heavy), comprising 32 cores total | fast local disk access |

Or is this too cumbersome? Maybe.\\
Perhaps just an abbreviation:

| **imw** | 16 nodes, infiniband enabled, each 8 GB mem (medium), comprising 128 cores total | |
| **elw** | 08 nodes, gigE enabled, each 4 GB mem (light), comprising 64 cores total | |
| **emw** | 04 nodes, gigE enabled, each 8 GB mem (medium), comprising 32 cores total | |
| **ehw** | 04 nodes, gigE enabled, each 16 GB mem (heavy), comprising 32 cores total | |
| **ehwfd** | 04 nodes, gigE enabled, each 16 GB mem (heavy), comprising 32 cores total | fast local disk access |

| NEW QUEUES, all priority = 50 ||
| **imw** | compute-1-1 ... compute-1-16 |
| **elw** | compute-1-17 ... compute-1-24 |
| **emw** | compute-1-25 ... compute-1-27 compute-2-28 |
| **ehw** | compute-2-29 ... compute-2-32 |
| **ehwfd** | nfs-2-1 ... nfs-2-4 |
| **matlab** | imw + emw |

delete queues: idle, [i]debug, molscat, gaussian, nat-test
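If we go with the abbreviated names, the LSF side is just a new stanza per queue in ''lsb.queues''; a hedged sketch for one of them, with the host list and description purely illustrative:

<code>
Begin Queue
QUEUE_NAME   = imw
PRIORITY     = 50
HOSTS        = compute-1-1 compute-1-2   # ... through compute-1-16, or a host group
DESCRIPTION  = infiniband, medium weight (8 GB) nodes, 128 cores total
End Queue
</code>

Followed by ''badmin reconfig'' to load it and ''bqueues'' to confirm; the queues slated for deletion just lose their stanzas in the same file.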