==== Upgrading ====
  
Figure out an upgrade process before going to production.

  * **Do you actually want to upgrade OpenHPC?**
    * v2.6 deploys ww4.x (maybe not wanted; containers)
    * chroot images and rebuild images are running Rocky 8
    * OneAPI: similar conflicts? (/opt/intel and /opt/ohpc/pub)
    * slurm complications?
  * **Upgraded OpenHPC and OneAPI should go on a new head node**
    * test compiler compatibility
    * slurm clients
  
<code>
yum upgrade "*-ohpc"
yum upgrade "ohpc-base"

or

yum update --disablerepo=* --enablerepo=[oneAPI,OpenHPC]
  
</code>
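
Before deciding, it helps to inventory what is currently installed. A minimal sketch, assuming the standard ohpc package naming and the repo IDs used above:

<code>

# list installed OpenHPC packages and versions
rpm -qa "*-ohpc" | sort

# show which repos (OS, OpenHPC, oneAPI) are enabled
yum repolist enabled

# current OS release on the head node
grep PRETTY_NAME /etc/os-release

</code>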
  
**Upgrade history**
  
  * OS only, 30 Jun 2022 (90+ days up) - no ohpc, oneapi (/opt)
  * OS only, 18 Aug 2023 (440+ days up) - no ohpc, oneapi (/opt)
==== example modules ====
  
Sample job to run Amber20 on n[100-101]
  
Amber cmake download fails with a READLINE error ... package readline-devel needs to be installed to get past that, which pulls in ncurses-c++-libs-6.1-9.20180224.el8.x86_64 ncurses-devel-6.1-9.20180224.el8.x86_64 readline-devel-7.0-10.el8.x86_64
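
The fix on the build host is a one-liner; yum pulls in the ncurses dependencies listed above automatically:

<code>

# run as root; resolves ncurses-c++-libs and ncurses-devel as dependencies
yum install readline-devel

</code>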
  
**Example script run.rocky for cpu or gpu run** (for queues amber128 [n78] and test [n100-n101] for gpus, and mw128 and tinymem for cpus)
  
<code>
#!/bin/bash
# [found at XStream]
# CPU control
#SBATCH -n 8     # tasks=S*C*T
###SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
#
# GPU control
###SBATCH --gres=gpu:geforce_gtx_1080_ti: # n78
###SBATCH --gres=gpu:quadro_rtx_5000: # n[100-101]
#
# Node control
#SBATCH --partition=tinymem
#SBATCH --nodelist=n57
  
  
# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH
  
### AMBER20
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR #
module load amber/20
# check
which nvcc gcc mpicc pmemd.cuda
  
cp -r ~/sharptail/* .
  
export CUDA_VISIBLE_DEVICES=0

# for amber20 on n[100-101] gpus, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np  1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber20 on n59/n77 cpus, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
-np  8 \
pmemd.MPI \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/

</code>
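
A typical submit-and-monitor session with this script (a sketch; the job id will differ):

<code>

sbatch run.rocky            # prints: Submitted batch job <jobid>
squeue -u $USER             # pending/running state
scontrol show job <jobid>   # full details: nodes, gres, tasks

</code>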

**Example script run.centos for cpu or gpu run** (queues mwgpu, exx96)

<code>
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
##SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 1     # tasks=S*C*T
#SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
###SBATCH --gres=gpu:tesla_k20m: # n[33-37]
#SBATCH --gres=gpu:geforce_rtx_2080_s: # n[79-90]
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python


source /usr/local/amber20/amber.sh
# stage the data
cp -r ~/sharptail/* .

###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`
###export CUDA_VISIBLE_DEVICES=0


# for amber20 on n[33-37] gpus, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
-np  1 \
pmemd.cuda \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
  
# for amber20 on n59/n100 cpus, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np  8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
scp mdout.$SLURM_JOB_ID ~/tmp/
</code>
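
The commented CUDA_VISIBLE_DEVICES lines above either pin gpu 0 or pick one at random with shuf. An alternative sketch, assuming nvidia-smi is on the node's PATH, is to pick the gpu with the least memory in use:

<code>

# query "index, memory.used" per gpu, sort by usage, take the idlest index
export CUDA_VISIBLE_DEVICES=$(nvidia-smi \
  --query-gpu=index,memory.used --format=csv,noheader,nounits | \
  sort -t, -k2 -n | head -1 | cut -d, -f1)
echo "using gpu $CUDA_VISIBLE_DEVICES"

</code>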


**The script amber.sh was converted to a module like so**

<code>

# or do this and add content of foo_1.0 to this module
#$LMOD_DIR/sh_to_modulefile  --to TCL --from=bash \
#--output /tmp/foo_1.0 \
#/share/apps/CENTOS8/ohpc/software/amber/20/amber.sh

# need Lmod 8.6+, ohpc has 8.5.1
#switch -- [module-info shelltype] {
#    sh {
#        source-sh bash $scriptpath/amber.sh
#    }
#    csh {
#        source-sh tcsh $scriptpath/amber.csh
#    }
#}

# which generated these lines with the Tcl header, then add these to the modulefile for amber/20

setenv AMBERHOME {/share/apps/CENTOS8/ohpc/software/amber/20}
setenv LD_LIBRARY_PATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib}
prepend-path PATH {/share/apps/CENTOS8/ohpc/software/amber/20/bin}
setenv PERL5LIB {/share/apps/CENTOS8/ohpc/software/amber/20/lib/perl}
setenv PYTHONPATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib/python3.9/site-packages}
  
  
</code>
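
To check that the modulefile behaves like sourcing amber.sh (standard Lmod commands; paths per the modulefile above):

<code>

module load amber/20
module show amber/20   # displays the setenv/prepend-path lines
echo $AMBERHOME        # /share/apps/CENTOS8/ohpc/software/amber/20
which pmemd.cuda       # should resolve under $AMBERHOME/bin

</code>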

==== Amber22 ====

Amber22 is somehow incompatible with the CentOS/Rocky openmpi (yum install). Hence the latest version of openmpi was compiled and installed into $AMBERHOME. No need to set PATHs, just be sure to source amber.sh in your script. (compile instructions below for me...)

https://ambermd.org/InstCentOS.php\\
"download a recent version of OpenMPI at open-mpi.org, untar the distribution in amber22_src/AmberTools/src, and execute in that directory the configure_openmpi script. (Do this after you have done a serial install, and have sourced the amber.sh script in the installation folder to create an AMBERHOME)"

<code>

[hmeij@n79 src]$ echo $AMBERHOME
/share/apps/CENTOS7/amber/amber22

[hmeij@n79 src]$ which mpirun mpicc
/share/apps/CENTOS7/amber/amber22/bin/mpirun
/share/apps/CENTOS7/amber/amber22/bin/mpicc

</code>

First establish a successful run with the **run.rocky** script for Amber20 (listed above). Then change the module in your script. (for queues amber128 [n78] and test [n100-n101] for gpus, and mw128 and tinymem for cpus)

<code>

module load amber/22

# if the module does not show up in the output of your console

module avail

# treat your module cache as out of date

module --ignore_cache avail

</code>

First establish a successful run with the **run.centos** script for Amber20 (listed above, for cpus or gpus on queues mwgpu and exx96).

Then edit the script and apply these edits. We had to use a specific compatible ''gcc/g++'' version to make this work. Hardware is getting too old.

<code>

# comment out the 2 export lines pointing to openmpi
##export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
##export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH

# additional gcc 6.5.0
export PATH=/share/apps/CENTOS7/gcc/6.5.0/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/gcc/6.5.0/lib64:$LD_LIBRARY_PATH

# edit or add correct source line, which and ldd lines just for debugging
###source /usr/local/amber16/amber.sh # works on mwgpu
###source /usr/local/amber20/amber.sh # works on exx96
source /share/apps/CENTOS7/amber/amber22/amber.sh # works on mwgpu and exx96
which nvcc mpirun python
ldd `which pmemd.cuda_SPFP`

</code>
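
After those edits, a quick sanity check that the binaries link against the OpenMPI inside $AMBERHOME (sketch):

<code>

which mpirun pmemd.MPI                    # both should live under $AMBERHOME/bin
ldd $(which pmemd.MPI) | grep -i libmpi   # should resolve into $AMBERHOME/lib

</code>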

\\
**[[cluster:0|Back]]**