==== Upgrading ====
  
Figure out an upgrade process before going into production.

  * **Do you actually want to upgrade OpenHPC?** (see the package inventory sketch after this list)
    * v2.6 deploys ww4.x (we may not want this; containers)
    * chroot images and rebuild images are running Rocky 8
    * OneAPI similar conflicts? (/opt/intel and /opt/ohpc/pub)
    * slurm complications?
  * **An upgrade of OpenHPC, OneAPI should be done on a new head node**
    * test compiler compatibility
    * slurm clients
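Before deciding, it can help to inventory what an upgrade would actually touch. A minimal sketch, assuming the repo names used in the update command below (OpenHPC, oneAPI):

<code>
# list installed OpenHPC packages and their versions
yum list installed "*-ohpc"

# see what the OpenHPC and oneAPI repos would pull in, without applying anything
yum check-update --disablerepo="*" --enablerepo=OpenHPC
yum check-update --disablerepo="*" --enablerepo=oneAPI

# slurm version currently running
sinfo --version
</code>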
  
<code>
yum upgrade "*-ohpc"
yum upgrade "ohpc-base"

or

yum update --disablerepo=* --enablerepo=[oneAPI,OpenHPC]

</code>
  
**Upgrade history**

  * OS only, 30 Jun 2022 (90+ days uptime) - ohpc and oneapi (/opt) not upgraded
  * OS only, 18 Aug 2023 (440+ days uptime) - ohpc and oneapi (/opt) not upgraded
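The OS-only updates above left the OpenHPC and oneAPI stacks (/opt) untouched. A minimal sketch of restricting yum to OS packages only, assuming the repo names from the update command above:

<code>
# OS updates only: leave the OpenHPC and oneAPI repos out of it
yum update --disablerepo=OpenHPC,oneAPI

# or exclude the OpenHPC packages by name pattern
yum update -x "*-ohpc"
</code>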
==== example modules ====
  
Sample job to run Amber20 on n[100-101]
  
The Amber cmake download step fails with a READLINE error ... the package ''readline-devel'' needs to be installed to get past that, which pulls in ncurses-c++-libs-6.1-9.20180224.el8.x86_64, ncurses-devel-6.1-9.20180224.el8.x86_64 and readline-devel-7.0-10.el8.x86_64.
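
A minimal sketch of that fix (yum on the Rocky 8 build host):

<code>
# install readline-devel; pulls in the ncurses-devel dependencies listed above
yum install readline-devel
</code>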
  
**Example script run.rocky for a cpu or gpu run** (queues amber128 [n78] and test [n100-n101] for gpus; mw128 and tinymem for cpus)

<code>
#!/bin/bash
# [found at XStream]
# CPU control
#SBATCH -n 8     # tasks=S*C*T
###SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
#
# GPU control
###SBATCH --gres=gpu:geforce_gtx_1080_ti: # n78
###SBATCH --gres=gpu:quadro_rtx_5000: # n[100-101]
#
# Node control
#SBATCH --partition=tinymem
#SBATCH --nodelist=n57
  
  
# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH
  
### AMBER20
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR #
module load amber/20
# check
which nvcc gcc mpicc pmemd.cuda
  
# stage the data
cp -r ~/sharptail/* .
  
export CUDA_VISIBLE_DEVICES=0

# for amber20 on n[100-101] gpus, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np  1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber20 on n59/n77 cpus, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
-np  8 \
pmemd.MPI \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/

</code>

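To submit and monitor either script, the standard Slurm commands apply; a minimal sketch (the job id is a placeholder, output files follow the #SBATCH settings in the script):

<code>
# submit the batch script to the scheduler
sbatch run.rocky

# check queue status for your jobs
squeue -u $USER

# inspect details of a running or pending job
scontrol show job <jobid>
</code>
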
**Example script run.centos for a cpu or gpu run** (queues mwgpu, exx96)

<code>
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
##SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 1     # tasks=S*C*T
#SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
###SBATCH --gres=gpu:tesla_k20m: # n[33-37]
#SBATCH --gres=gpu:geforce_rtx_2080_s: # n[79-90]
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python


source /usr/local/amber20/amber.sh
# stage the data
cp -r ~/sharptail/* .

###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`
###export CUDA_VISIBLE_DEVICES=0


# for amber20 on n[33-37] gpus, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
-np  1 \
pmemd.cuda \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
  
# for amber20 on n59/n100 cpus, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np  8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# save results
scp mdout.$SLURM_JOB_ID ~/tmp/
</code>

**The script amber.sh was converted to a module like so**

<code>

# or do this and add content of foo_1.0 to this module
#$LMOD_DIR/sh_to_modulefile  --to TCL --from=bash \
#--output /tmp/foo_1.0 \
#/share/apps/CENTOS8/ohpc/software/amber/20/amber.sh

# need Lmod 8.6+, ohpc has 8.5.1
#switch -- [module-info shelltype] {
#    sh {
#        source-sh bash $scriptpath/amber.sh
#    }
#    csh {
#        source-sh tcsh $scriptpath/amber.csh
#    }
#}

# sh_to_modulefile generated these lines (plus the Tcl header); add them to the modulefile for amber/20

setenv AMBERHOME {/share/apps/CENTOS8/ohpc/software/amber/20}
setenv LD_LIBRARY_PATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib}
prepend-path PATH {/share/apps/CENTOS8/ohpc/software/amber/20/bin}
setenv PERL5LIB {/share/apps/CENTOS8/ohpc/software/amber/20/lib/perl}
setenv PYTHONPATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib/python3.9/site-packages}

</code>
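
A quick way to sanity-check the resulting modulefile; a minimal sketch using the environment variables set above:

<code>
# display what the modulefile sets, then load it and confirm
module show amber/20
module load amber/20
echo $AMBERHOME
which pmemd.cuda pmemd.MPI
</code>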

==== Amber22 ====

Amber22 is somehow incompatible with the CentOS/Rocky openmpi (yum install). Hence the latest version of openmpi was compiled and installed into $AMBERHOME. There is no need to set PATHs, just be sure to source amber.sh in your script (compile instructions below, mostly a note to self).

https://ambermd.org/InstCentOS.php\\
"download a recent version of OpenMPI at open-mpi.org, untar the distribution in amber22_src/AmberTools/src, and execute in that directory the configure_openmpi script. (Do this after you have done a serial install, and have sourced the amber.sh script in the installation folder to create an AMBERHOME)"

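Those quoted steps boil down to roughly the following; a minimal sketch (the OpenMPI tarball name is a placeholder, pick a current release from open-mpi.org, and check the configure_openmpi script for any required arguments):

<code>
# after the serial install, source amber.sh so AMBERHOME is defined
source /share/apps/CENTOS7/amber/amber22/amber.sh

# unpack an OpenMPI release (placeholder version) inside AmberTools/src
cd amber22_src/AmberTools/src
tar xf openmpi-<version>.tar.bz2

# Amber's helper script builds and installs OpenMPI into $AMBERHOME
./configure_openmpi
</code>

After the build, ''mpirun'' and ''mpicc'' resolve inside $AMBERHOME:
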
<code>

[hmeij@n79 src]$ echo $AMBERHOME
/share/apps/CENTOS7/amber/amber22

[hmeij@n79 src]$ which mpirun mpicc
/share/apps/CENTOS7/amber/amber22/bin/mpirun
/share/apps/CENTOS7/amber/amber22/bin/mpicc

</code>

First establish a successful run with the **run.rocky** script for Amber20 (listed above). Then change the module in your script (queues amber128 [n78] and test [n100-n101] for gpus; mw128 and tinymem for cpus).

<code>

module load amber/22

# if the module does not show up in the output of your console

module avail

# treat your module cache as out of date

module --ignore_cache avail

</code>

First establish a successful run with the **run.centos** script for Amber20 (listed above, for cpus or gpus on queues mwgpu and exx96).

Then edit the script and apply these edits. We had to use a specific compatible ''gcc/g++'' version to make this work. The hardware is getting too old.

<code>

# comment out the 2 export lines pointing to openmpi
##export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
##export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH

# additional gcc 6.5.0
export PATH=/share/apps/CENTOS7/gcc/6.5.0/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/gcc/6.5.0/lib64:$LD_LIBRARY_PATH

# edit or add the correct source line; the which and ldd lines are just for debugging
###source /usr/local/amber16/amber.sh # works on mwgpu
###source /usr/local/amber20/amber.sh # works on exx96
source /share/apps/CENTOS7/amber/amber22/amber.sh # works on mwgpu and exx96
which nvcc mpirun python
ldd `which pmemd.cuda_SPFP`

</code>

\\
**[[cluster:0|Back]]**