This shows you the differences between two versions of the page.
cluster:184 [2019/09/12 11:51] hmeij07 |
cluster:184 [2019/09/27 12:38] hmeij07 [AWS deploys T4] |
||
---|---|---|---|
Line 1: | Line 1: | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: | ||
+ | |||
+ | ==== AWS deploys T4 ==== | ||
+ | |||
+ | * https:// | ||
+ | |||
+ | Look at this: the smallest Elastic Compute Cloud (EC2) instance is **g4dn.xlarge**, yielding access to 4 vCPUs and 1x T4 GPU. The largest is **g4dn.16xlarge**, yielding access to 64 vCPUs and 1x T4 GPU. The smallest is priced at $0.526/hr, so running that card 24/7 for a year costs $4,607.76 ... meaning ... matching option #7 below with its 26 GPUs would cost you a whopping $119,802. Annually! That's the low-water mark. | ||
+ | |||
+ | The high-water mark? The largest instance is priced at $4.352/hr and would cost you nearly one million dollars per year if you matched option #7. | ||
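The arithmetic behind both water marks can be checked with a short sketch. It assumes the hourly prices quoted above, a non-leap year of 8,760 hours, and one instance per GPU (each instance type mentioned carries a single T4). The function and constant names are illustrative only.

```python
# Minimal sketch: annual cost of running g4dn instances 24/7.
# Assumptions: prices as quoted in the text, 365-day year,
# one instance per T4 GPU, no discounts or data-transfer charges.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours

G4DN_XLARGE_PER_HOUR = 0.526    # smallest instance, 1x T4
G4DN_16XLARGE_PER_HOUR = 4.352  # largest instance mentioned, 1x T4
GPUS_IN_OPTION_7 = 26           # GPU count of option #7 in the table


def annual_cost(hourly_rate, instances=1):
    """Return the yearly cost in dollars, rounded to cents."""
    return round(hourly_rate * HOURS_PER_YEAR * instances, 2)


if __name__ == "__main__":
    print("one small instance :", annual_cost(G4DN_XLARGE_PER_HOUR))
    print("26 small instances :", annual_cost(G4DN_XLARGE_PER_HOUR, GPUS_IN_OPTION_7))
    print("26 large instances :", annual_cost(G4DN_16XLARGE_PER_HOUR, GPUS_IN_OPTION_7))
```

Under these assumptions the three figures come to roughly $4,608, $119,802, and $991,212 per year, which matches the numbers in the text.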
Line 7: | Line 15: | ||
Ok, we try this year. Here are some informational pages. | Ok, we try this year. Here are some informational pages. | ||
+ | * [[cluster: | ||
* [[cluster: | * [[cluster: | ||
* [[cluster: | * [[cluster: | ||
Line 17: | Line 26: | ||
* CPU model/type determines dpfp/cycle; silver 16, gold 32. | * CPU model/type determines dpfp/cycle; silver 16, gold 32. | ||
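As a rough illustration of how dpfp/cycle feeds into a CPU Tflops estimate, the sketch below multiplies physical cores, base clock, and the per-cycle figures given above (silver 16, gold 32). This is the usual theoretical-peak formula, not necessarily how the table's Tflops row was derived. Treating model 4208 as silver and 5115 as gold is an assumption based on Intel's naming, and real throughput will be lower than this peak.

```python
# Minimal sketch: theoretical peak double-precision TFLOPS for the CPU side.
# Assumptions: peak = cores * GHz * dpfp/cycle, using base clock speed;
# dpfp/cycle values are the ones quoted in the text (silver 16, gold 32).

DPFP_PER_CYCLE = {"silver": 16, "gold": 32}


def peak_cpu_tflops(physical_cores, ghz, cpu_class):
    """Return theoretical peak DP TFLOPS for a pool of identical cores."""
    gflops = physical_cores * ghz * DPFP_PER_CYCLE[cpu_class]
    return gflops / 1000.0


if __name__ == "__main__":
    # Option #1 in the table: 96 physical cores at 2.1 GHz (model 4208, assumed silver)
    print(round(peak_cpu_tflops(96, 2.1, "silver"), 2))
    # Option #3 in the table: 180 physical cores at 2.4 GHz (model 5115, assumed gold)
    print(round(peak_cpu_tflops(180, 2.4, "gold"), 2))
```

The point of the sketch is that a gold CPU contributes twice the per-cycle throughput of a silver one, so core counts alone do not rank the options.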
- | Criteria for selection (points of discussion raised at last meeting): | + | Criteria for selection (points of discussion raised at last meeting |
- | - Continue with current work load, just more of it (RXT2080ti/RXT4000) | + | - Continue with current work load, just more of it (RTX2080ti/RTX4000) |
- | - Do above, | + | - Do above, |
- | - Do above, but invest for future expansion into complex Deep Learning (RXT6000) | + | - Do above, but invest for future expansion into complex Deep Learning (RTX6000) |
- | //**Pick your option and put it in the shopping cart**// | + | //**Pick your option and put it in the shopping cart**// |
+ | The table is best read from the bottom up to assess differences. | ||
^ Options | ^ Options | ||
Line 30: | Line 40: | ||
| Cpus | 12 | 8 | 18 | 14 | 10 | 34 | 26 | 16 | 16 | 12 | total| | | Cpus | 12 | 8 | 18 | 14 | 10 | 34 | 26 | 16 | 16 | 12 | total| | ||
| Cores | 96 | 64 | 180 | 140 | 100 | 272 | 208 | 192 | 128 | 72 | physical| | | Cores | 96 | 64 | 180 | 140 | 100 | 272 | 208 | 192 | 128 | 72 | physical| | ||
- | | Tflops | + | | Tflops |
| Gpus | 48 | 16 | 36 | 28 | 20 | 34 | 26 | 16 | 28 | 60 | total| | | Gpus | 48 | 16 | 36 | 28 | 20 | 34 | 26 | 16 | 28 | 60 | total| | ||
| Cores | 209 | 74 | 157 | 72 | 92 | 75 | 67 | 74 | 72 | 138 | cuda K| | | Cores | 209 | 74 | 157 | 72 | 92 | 75 | 67 | 74 | 72 | 138 | cuda K| | ||
Line 38: | Line 48: | ||
| $/ | | $/ | ||
^ Per Node ^^^^^^^^^^^^ | ^ Per Node ^^^^^^^^^^^^ | ||
- | | Chassis | + | | Chassis |
| CPU | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | total| | | CPU | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | total| | ||
| | 4208 | 4208 | 5115 | 5115 | 5115 | 4208 | 4208 | 4214 | 4208 | 4208 | model| | | | 4208 | 4208 | 5115 | 5115 | 5115 | 4208 | 4208 | 4214 | 4208 | 4208 | model| | ||
- | | | silver | + | | | silver |
| | 2x8 | 2x8 | 2x10 | 2x10 | 2x10 | 2x8 | 2x8 | 2x12 | 2x8 | 2x8 | physical| | | | 2x8 | 2x8 | 2x10 | 2x10 | 2x10 | 2x8 | 2x8 | 2x12 | 2x8 | 2x8 | physical| | ||
| | 2.1 | 2.1 | 2.4 | 2.4 | 2.4 | 2.1 | 2.1 | 2.2 | 2.1 | 2.1 | GHz| | | 2.1 | 2.1 | 2.4 | 2.4 | 2.4 | 2.1 | 2.1 | 2.2 | 2.1 | 2.1 | GHz| |
Line 55: | Line 65: | ||
| | 250 | 295 | 250 | 70 | 295 | 160 | 70 | 295 | 70 | 160 | Watts| | | | 250 | 295 | 250 | 70 | 295 | 160 | 70 | 295 | 70 | 160 | Watts| | ||
| Power | 2200 | 1600 | 1600 | 1600 | 1600 | 1600 | 1600 | 2200 | 1600 | 2000 | Watts| | | Power | 2200 | 1600 | 1600 | 1600 | 1600 | 1600 | 1600 | 2200 | 1600 | 2000 | Watts| | ||
- | | | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1? | + | | | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 2+2 | redundant| |
| CentOS7 | | CentOS7 | ||
- | | Nics | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2? | + | | Nics | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | gigabit| |
| Warranty | | Warranty | ||
- | | | -3 | -6 | -1 | -1 | -5.5 | 0 | +1.6 | 0 | +1.5 | -1 | diff| | + | | | -3 | -6 | -1 | -1 | -5.5 | 0 | +1.6 | 0 | +1.5 | -1 | |
* #1/#2 All GPU warranty requests will be filled by GPU maker. | * #1/#2 All GPU warranty requests will be filled by GPU maker. | ||
+ | * #7 up to 4 GPUs per node. Filling the rack, leaving 1U open between nodes, gives count=15 | ||
* #8 fills intended rack with AC in rack. GPU Tower/4U rack mount. | * #8 fills intended rack with AC in rack. GPU Tower/4U rack mount. | ||
- | * #8 includes NVLink connector (bridge kit). Allows up to 4 GPUs per node with no cooling issues. | + | * #8 includes NVLink connector (bridge kit). Up to 4 GPUs per node. |
* Tariffs may affect all quotes when executed. | * Tariffs may affect all quotes when executed. | ||
* S&H included (or estimated) | * S&H included (or estimated) | ||
+ | * More than 4-6 nodes would be lots of work if Warewulf/ | ||
+ | On the question of active versus passive cooling: | ||