  
In 2006, 4 Wesleyan faculty members approached ITS with a proposal to centrally manage a high performance computing center (HPCC), seeding the effort with an NSF grant (about $190K). ITS offered 0.5 FTE for a dedicated "hpcadmin". An Advisory Group was formed by these faculty plus the hpcadmin (5 members, not necessarily our current "power users"). Another NSF grant award was added in 2010 (about $105K). An alumni donation followed in 2016 (about $10K). In 2018 the first instance of "faculty startup monies" was contributed to the HPCC (about $92K, see "Priority Policy" below). In 2019, a TrueNAS/ZFS appliance was purchased (about $40K), followed in 2020 by a GPU expansion project (about $96K). The latter two were self-funded expenditures, see "Funding Policy" below. To view the NSF grants visit [[cluster:169|Acknowledgement]]

The Advisory Group meets with the user base yearly during reading week of the Spring semester (early May), before everybody scatters for the summer. At this meeting the hpcadmin reviews the past year and previews the coming year, and users contribute feedback on progress and problems.
  
==== Structure ====
Several months later a pattern emerged. The Provost would annually contribute $25K if the HPC user base raised $15K annually. That would amount to $160K in 4 years, enough for a hardware refresh or new hardware acquisition. Finances also contributed $10K for maintenance such as failed disks, network switches, etc., but these funds do not "roll over": use it or lose it. All funds start July 1st.
  
In order for the HPC user base to raise $15K annually, CPU and GPU hourly usage charges were deployed. A dictionary is maintained listing PIs and their members (student majors, lab students, grads, PhD candidates, collaborators, etc.). Each PI then contributes quarterly to the user fund based on a scheme yielding $15K annually.
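
Purely as an illustration of the bookkeeping described above, such a dictionary with usage rolled up per PI might look like the sketch below; all PI names, member names and hour counts are hypothetical, not the actual accounting data or script.

<code python>
# Hypothetical PI membership dictionary; names are made up, not real data.
pi_members = {
    "pi_smith": ["undergrad_a", "grad_b", "phd_c"],
    "pi_jones": ["grad_d", "collaborator_e"],
}

# Hypothetical per-user CPU hours for one quarter.
user_cpu_hours = {
    "undergrad_a": 1200, "grad_b": 8500, "phd_c": 400,
    "grad_d": 22000, "collaborator_e": 300,
}

# Roll usage up per PI; each PI then contributes based on the scheme below.
pi_usage = {pi: sum(user_cpu_hours.get(u, 0) for u in members)
            for pi, members in pi_members.items()}
print(pi_usage)  # {'pi_smith': 10100, 'pi_jones': 22300}
</code>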
  
Here is 2019's queue usage, [[cluster:188|2019 Queue Usage]], and the 2019 contribution scheme.
  
Contribution Scheme for 01 July 2019 onwards\\
Hours (K) - Rate ($/CPU Hour)\\
  * 0-5 = Free
  * >5-25 = 0.03
  * >25-125 = 0.006
  * >125-625 = 0.0012
  * >625-3125 = 0.00024
  * >3125 = 0.000048
CPU usage of 3,125,000 hours/year would cost $2,400.00\\
A GPU hour of usage is charged at 3x the CPU hourly rate.\\
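
The rates above behave as marginal rates, i.e. each rate applies only to the hours that fall inside its own tier, which is what makes 3,125,000 hours come to $2,400. The sketch below, a minimal illustration rather than the HPCC's actual billing script, computes a contribution under that reading; how GPU hours are folded into the tiers, and the helper names, are assumptions here.

<code python>
# Tier upper bounds in CPU hours and their marginal rates ($/hour).
TIERS = [
    (5_000,         0.0),       # 0-5K hours: free
    (25_000,        0.03),      # >5K-25K
    (125_000,       0.006),     # >25K-125K
    (625_000,       0.0012),    # >125K-625K
    (3_125_000,     0.00024),   # >625K-3,125K
    (float("inf"),  0.000048),  # >3,125K
]

GPU_MULTIPLIER = 3  # a GPU hour is charged at 3x the CPU hourly rate


def tier_cost(hours):
    """Cost of the given hours across the marginal tiers."""
    cost, lower = 0.0, 0.0
    for upper, rate in TIERS:
        if hours <= lower:
            break
        cost += (min(hours, upper) - lower) * rate
        lower = upper
    return cost


def contribution(cpu_hours, gpu_hours=0.0):
    # Assumption: GPU hours run through the same tiers and that portion is tripled.
    return tier_cost(cpu_hours) + GPU_MULTIPLIER * tier_cost(gpu_hours)


# The worked example from the text: 3,125,000 CPU hours/year -> $2,400.00
print(f"${contribution(3_125_000):,.2f}")
</code>

For example, the 20,000 billable hours in the 0.03 tier, and the 100,000, 500,000 and 2,500,000 hours in the next three tiers, each come to $600, which is where the $2,400 figure for 3,125,000 hours comes from.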
  
We currently have about 1,450 physical CPU cores, 60 GPUs, 520 GB of GPU memory and 8,560 GB of CPU memory, provided by about 120 compute nodes plus login nodes. Scratch spaces are provided local to compute nodes (2-5 TB) or over the network via NFS (55 TB). Home directories are under quota (10 TB), but these quotas will disappear in the future with the TrueNAS/ZFS appliance (190 TB, 475 TB effective assuming a compression rate of 2.5x). A guide can be found here: [[cluster:82|Brief Description]], and the software is located here: [[cluster:73|CD-HIT]]
  
  