==== Upgrading ====
  
Figure out an upgrade process before going into production.

  * **Do you actually want to upgrade OpenHPC?** (see the package inventory sketch after this list)
    * v2.6 deploys ww4.x (we may not want this; containers)
    * chroot images and rebuild images are running Rocky 8
    * OneAPI similar conflicts? (/opt/intel and /opt/ohpc/pub)
    * slurm complications?
  * **An upgrade of OpenHPC, OneAPI should be done on a new head node**
    * test compiler compatibility
    * slurm clients
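Before deciding, it can help to inventory what an upgrade would actually touch. A minimal sketch, assuming the repo names used in the update command below (OpenHPC, oneAPI):

<code>
# list installed OpenHPC packages and their versions
yum list installed "*-ohpc"

# see what the OpenHPC and oneAPI repos would pull in, without applying anything
yum check-update --disablerepo="*" --enablerepo=OpenHPC
yum check-update --disablerepo="*" --enablerepo=oneAPI

# slurm version currently running
sinfo --version
</code>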
  
<code>
yum upgrade "*-ohpc"
yum upgrade "ohpc-base"

or

yum update --disablerepo=* --enablerepo=[oneAPI,OpenHPC]

</code>
  
**Upgrade history**

  * OS only, 30 Jun 2022 (90+ days uptime) - ohpc and oneapi (/opt) not upgraded
  * OS only, 18 Aug 2023 (440+ days uptime) - ohpc and oneapi (/opt) not upgraded
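The OS-only updates above left the OpenHPC and oneAPI stacks (/opt) untouched. A minimal sketch of restricting yum to OS packages only, assuming the repo names from the update command above:

<code>
# OS updates only: leave the OpenHPC and oneAPI repos out of it
yum update --disablerepo=OpenHPC,oneAPI

# or exclude the OpenHPC packages by name pattern
yum update -x "*-ohpc"
</code>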
==== example modules ====
  
Sample job to run Amber20 on n[100-101]
  
The Amber cmake download step fails with a READLINE error ... the package ''readline-devel'' needs to be installed to get past that, which pulls in ncurses-c++-libs-6.1-9.20180224.el8.x86_64, ncurses-devel-6.1-9.20180224.el8.x86_64 and readline-devel-7.0-10.el8.x86_64.
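
A minimal sketch of that fix (yum on the Rocky 8 build host):

<code>
# install readline-devel; pulls in the ncurses-devel dependencies listed above
yum install readline-devel
</code>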
  
**Example script run.rocky for a cpu or gpu run** (queues amber128 [n78] and test [n100-n101] for gpus; mw128 and tinymem for cpus)

<code>
#!/bin/bash
# [found at XStream]
# CPU control
#SBATCH -n 8     # tasks=S*C*T
###SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
#
# GPU control
###SBATCH --gres=gpu:geforce_gtx_1080_ti: # n78
###SBATCH --gres=gpu:quadro_rtx_5000: # n[100-101]
#
# Node control
#SBATCH --partition=tinymem
#SBATCH --nodelist=n57
  
  
# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH
  
### AMBER20
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR #
module load amber/20
# check
which nvcc gcc mpicc pmemd.cuda
  
# stage the data
cp -r ~/sharptail/* .
  
export CUDA_VISIBLE_DEVICES=0

# for amber20 on n[100-101] gpus, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np  1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber20 on n59/n77 cpus, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
-np  8 \
pmemd.MPI \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/

</code>

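To submit and monitor either script, the standard Slurm commands apply; a minimal sketch (the job id is a placeholder, output files follow the #SBATCH settings in the script):

<code>
# submit the batch script to the scheduler
sbatch run.rocky

# check queue status for your jobs
squeue -u $USER

# inspect details of a running or pending job
scontrol show job <jobid>
</code>
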
**Example script run.centos for a cpu or gpu run** (queues mwgpu, exx96)

<code>
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
##SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 1     # tasks=S*C*T
#SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
###SBATCH --gres=gpu:tesla_k20m: # n[33-37]
#SBATCH --gres=gpu:geforce_rtx_2080_s: # n[79-90]
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python


source /usr/local/amber20/amber.sh
# stage the data
cp -r ~/sharptail/* .

###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`
###export CUDA_VISIBLE_DEVICES=0


# for amber20 on n[33-37] gpus, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
-np  1 \
pmemd.cuda \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
  
# for amber20 on n59/n100 cpus, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np  8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# save results
scp mdout.$SLURM_JOB_ID ~/tmp/
</code>

**The script amber.sh was converted to a module like so**

<code>

# or do this and add content of foo_1.0 to this module
#$LMOD_DIR/sh_to_modulefile  --to TCL --from=bash \
#--output /tmp/foo_1.0 \
#/share/apps/CENTOS8/ohpc/software/amber/20/amber.sh

# need Lmod 8.6+, ohpc has 8.5.1
#switch -- [module-info shelltype] {
#    sh {
#        source-sh bash $scriptpath/amber.sh
#    }
#    csh {
#        source-sh tcsh $scriptpath/amber.csh
#    }
#}

# sh_to_modulefile generated these lines (plus the Tcl header); add them to the modulefile for amber/20

setenv AMBERHOME {/share/apps/CENTOS8/ohpc/software/amber/20}
setenv LD_LIBRARY_PATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib}
prepend-path PATH {/share/apps/CENTOS8/ohpc/software/amber/20/bin}
setenv PERL5LIB {/share/apps/CENTOS8/ohpc/software/amber/20/lib/perl}
setenv PYTHONPATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib/python3.9/site-packages}

</code>
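
A quick way to sanity-check the resulting modulefile; a minimal sketch using the environment variables set above:

<code>
# display what the modulefile sets, then load it and confirm
module show amber/20
module load amber/20
echo $AMBERHOME
which pmemd.cuda pmemd.MPI
</code>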

==== Amber22 ====

Amber22 is somehow incompatible with the CentOS/Rocky openmpi (yum install). Hence the latest version of openmpi was compiled and installed into $AMBERHOME. There is no need to set PATHs, just be sure to source amber.sh in your script (compile instructions below, mostly a note to self).

https://ambermd.org/InstCentOS.php\\
"download a recent version of OpenMPI at open-mpi.org, untar the distribution in amber22_src/AmberTools/src, and execute in that directory the configure_openmpi script. (Do this after you have done a serial install, and have sourced the amber.sh script in the installation folder to create an AMBERHOME)"

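Those quoted steps boil down to roughly the following; a minimal sketch (the OpenMPI tarball name is a placeholder, pick a current release from open-mpi.org, and check the configure_openmpi script for any required arguments):

<code>
# after the serial install, source amber.sh so AMBERHOME is defined
source /share/apps/CENTOS7/amber/amber22/amber.sh

# unpack an OpenMPI release (placeholder version) inside AmberTools/src
cd amber22_src/AmberTools/src
tar xf openmpi-<version>.tar.bz2

# Amber's helper script builds and installs OpenMPI into $AMBERHOME
./configure_openmpi
</code>

After the build, ''mpirun'' and ''mpicc'' resolve inside $AMBERHOME:
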
<code>

[hmeij@n79 src]$ echo $AMBERHOME
/share/apps/CENTOS7/amber/amber22

[hmeij@n79 src]$ which mpirun mpicc
/share/apps/CENTOS7/amber/amber22/bin/mpirun
/share/apps/CENTOS7/amber/amber22/bin/mpicc

</code>

First establish a successful run with the **run.rocky** script for Amber20 (listed above). Then change the module in your script (queues amber128 [n78] and test [n100-n101] for gpus; mw128 and tinymem for cpus).

<code>

module load amber/22

# if the module does not show up in the output of your console

module avail

# treat your module cache as out of date

module --ignore_cache avail

</code>

First establish a successful run with the **run.centos** script for Amber20 (listed above, for cpus or gpus on queues mwgpu and exx96).

Then edit the script and apply these edits. We had to use a specific compatible ''gcc/g++'' version to make this work. The hardware is getting too old.

<code>

# comment out the 2 export lines pointing to openmpi
##export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
##export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH

# additional gcc 6.5.0
export PATH=/share/apps/CENTOS7/gcc/6.5.0/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/gcc/6.5.0/lib64:$LD_LIBRARY_PATH

# edit or add the correct source line; the which and ldd lines are just for debugging
###source /usr/local/amber16/amber.sh # works on mwgpu
###source /usr/local/amber20/amber.sh # works on exx96
source /share/apps/CENTOS7/amber/amber22/amber.sh # works on mwgpu and exx96
which nvcc mpirun python
ldd `which pmemd.cuda_SPFP`

</code>

\\
**[[cluster:0|Back]]**