This shows you the differences between two versions of the page.
cluster:189 [2020/01/10 19:57] hmeij07 [Priority Access] |
cluster:189 [2020/01/11 22:39] hmeij07 [Structure and History of HPCC] |
||
---|---|---|---|
Line 2: | Line 2: | ||
**[[cluster: | **[[cluster: | ||
- | ===== Priority Access | + | ===== Structure and History of HPCC ===== |
- | This page will describe | + | As promised at the CLAC HPC Mindshare event at Swarthmore College in Jan 2020, here are the Funding and Priority Policies with some context around them. Questions/ |
+ | |||
+ | ==== History ==== | ||
+ | |||
+ | In 2006, 4 Wesleyan faculty members approached ITS with a proposal to centrally manage a high performance computing center (HPCC) seeding the effort with an NSF grant (about $190K, two racks full of Dell PE1950, a total of 256 physical cpu cores on Infiniband). ITS offered 0.5 FTE for a dedicated " | ||
+ | |||
+ | The Advisory Group meets with the user base yearly during the reading week of the Spring semester (early May) before everybody scatters | ||
+ | |||
+ | ==== Structure ==== | ||
+ | |||
+ | The Wesleyan | ||
+ | |||
+ | The QAC has an [[https:// | ||
+ | |||
+ | ==== Funding Policy ==== | ||
+ | |||
+ | After an 8-year run of the HPCC, and with grant opportunities at NSF drying up, it was decided to explore self-funding so the HPCC effort could continue without depending on external funds. A report was compiled of the HPCC progress including topics such as Publications, | ||
+ | |||
+ | Several months later a pattern emerged. | ||
+ | |||
+ | In order for the HPCC user base to raise $15K annually, CPU and GPU hourly usage monitoring was deployed (using Openlava '' | ||
+ | |||
+ | Here is queue usage for 2019 [[cluster: | ||
+ | |||
+ | Contribution Scheme for 01 July 2019 onwards\\ | ||
+ | Hours (K) - Rate ($/CPU Hour)\\ | ||
+ | * 0-5 = Free | ||
+ | * >5-25 = 0.03 | ||
+ | * >25-125 = 0.006 | ||
+ | * >125-625 = 0.0012 | ||
+ | * > | ||
+ | * >3125 = 0.000048 | ||
+ | CPU usage of 3,125,000 hours/year would cost $2,400.00 \\
+ | A GPU hour of usage is charged at 3x the CPU hourly rate.\\
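The arithmetic behind the contribution scheme can be sketched as follows. This is a minimal illustration, not the HPCC's actual billing code. It assumes the rates are marginal (each rate applies only to the hours that fall inside its tier), which is the reading that reproduces the $2,400.00 figure quoted for 3,125,000 hours. The rate for the >625-3125 tier is truncated on this page, so the value `0.00024` below is an assumption inferred from the pattern of the other tiers. Counting each GPU hour as 3 CPU-hour equivalents is also only one possible reading of the 3x rule, and the function name `annual_contribution` is hypothetical.

```python
# Tier upper bounds (in hours) and the rate ($/CPU hour) charged inside each tier.
TIERS = [
    (5_000, 0.0),              # 0-5K hours: free
    (25_000, 0.03),            # >5K-25K
    (125_000, 0.006),          # >25K-125K
    (625_000, 0.0012),         # >125K-625K
    (3_125_000, 0.00024),      # >625K-3125K: ASSUMED rate, truncated in the source
    (float("inf"), 0.000048),  # >3125K
]

GPU_WEIGHT = 3  # ASSUMPTION: one GPU hour is billed as 3 CPU-hour equivalents


def annual_contribution(cpu_hours, gpu_hours=0.0):
    """Return the yearly contribution in dollars under the marginal-tier reading."""
    billable = cpu_hours + GPU_WEIGHT * gpu_hours
    total = 0.0
    lower = 0
    for upper, rate in TIERS:
        if billable <= lower:
            break
        # Charge only the hours that fall inside this tier.
        total += (min(billable, upper) - lower) * rate
        lower = upper
    return round(total, 2)


if __name__ == "__main__":
    for hours in (5_000, 25_000, 125_000, 3_125_000):
        print(f"{hours:>9,} CPU hours -> ${annual_contribution(hours):,.2f}")
```

Under these assumptions each paid tier contributes $600 when fully used, so four full tiers add up to the $2,400.00 stated above.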
+ | |||
+ | We currently have about 1,450 physical CPU cores (all Xeon), 60 GPUs (K20, GTX2018Ti, RTX2080S), 520 GB of GPU memory and 8,560 GB of CPU memory, provided by about 120 compute nodes and login nodes. Scratch spaces are provided local to compute nodes (2-5 TB) or over the network via NFS (55 TB), consult [[cluster: | ||
+ | |||
+ | |||
+ | ==== Priority Policy ==== | ||
+ | |||
+ | This policy was put in place about 3 years ago to deal with the issues surrounding infusions of new monies from, for example, new faculty " | ||
There are a few Principles in this Priority Access Policy | There are a few Principles in this Priority Access Policy | ||
- | - Contributions, | + | - Contributions, |
- Priority access is granted for 3 years starting at the date of deployment (user access). | - Priority access is granted for 3 years starting at the date of deployment (user access). | ||
- Only applies to newly purchased resources which should be under warranty in the priority period. | - Only applies to newly purchased resources which should be under warranty in the priority period. | ||
+ | |||
+ | **The main objective is to build an HPCC community resource for all users with no (permanent) special treatment of any subgroup.** | ||
- | The main objective is to build an HPCC for all users with no (permanent) special treatment | + | The first principle implies that all users have access to the new resource(s) immediately when deployed. Root privilege is for hpcadmin only; sudo privilege may be used if/when necessary to achieve some purpose. The hpcadmin will maintain the new resource(s), while configuration(s) of the new resource(s) will be done by consent of all parties involved. Final approval by the Advisory Group initiates deployment activities. |
- | The first principle implies that all users have access to the new resources immidiately when deployed. Root privilege is for hpcadmin only, sudo privilge may be used if/when necessary to achieve some purpose. The hpcadmin will maintain the new resource(s) while configuration(s) of new resource(s) will be done by consent of all parties involved. Final approval by the Advisory Group initiates deployment activities. | + | The second principle grants priority access to certain resource(s) for a limited time to a limited group. The same PI/users relationship will be used as is used in the CPU/GPU Usage Contribution scheme. Priority access |
- | + | ||
- | The second principle grants priority access to certain resource(s) for a limited time to a limited group. The same PI/users relationship will be used as is used in the CPU Usage Contribution scheme. Priority access means if during the priority period the priority members jobs go into pending mode for more than 24 hours the hpcadmin will clear compute nodes of running jobs and force those pending jobs to run. | + | |
All users should be aware this may happen so please checkpoint your jobs with a checkpoint interval of 24 hours. Please consult | All users should be aware this may happen so please checkpoint your jobs with a checkpoint interval of 24 hours. Please consult | ||
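The two thresholds in the priority rule (a 3-year priority period starting at deployment, and a 24-hour pending limit) can be expressed as a small check. This is only a sketch of the policy as worded here, not a tool the hpcadmin is known to use. The function names are hypothetical, and treating the period as ending exactly on the third anniversary of deployment is an assumption.

```python
from datetime import date, datetime, timedelta

PRIORITY_YEARS = 3                   # priority period, from the policy text
PENDING_LIMIT = timedelta(hours=24)  # pending time that triggers a forced run


def priority_period_end(deployed: date) -> date:
    """Third anniversary of the deployment date (ASSUMED to be the cut-off)."""
    try:
        return deployed.replace(year=deployed.year + PRIORITY_YEARS)
    except ValueError:
        # Deployed on Feb 29 and the target year is not a leap year.
        return deployed.replace(year=deployed.year + PRIORITY_YEARS, day=28)


def qualifies_for_forced_run(deployed: date, pending_since: datetime, now: datetime) -> bool:
    """True if a priority member's job has been pending for more than 24 hours
    while the resource is still inside its priority period."""
    in_period = deployed <= now.date() < priority_period_end(deployed)
    return in_period and (now - pending_since) > PENDING_LIMIT
```

For example, with a resource deployed on 2020-01-10, a priority job pending since 2021-05-01 08:00 would qualify at 2021-05-02 09:00 (25 hours pending) but not at 2021-05-02 07:00 (23 hours pending).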
+ | ==== General ==== | ||
+ | |||
+ | There are 557 lines in ''/ | ||
+ | |||
+ | Rstore is a platform for storing static research data. The hope is to move static data off the HPCC and mount it read-only back onto the HPCC login nodes. | ||
+ | The Data Center has recently been renovated, so the HPCC has no more cooling power problems (previously, in the event of a cooling tower failure, the HPCC would push temperatures above 85F within 3 hours). We have sufficient rack space (5) and power for expansion. For details on that "live renovation" | ||