+ | |||
+ | ==== AWS deploys T4 ==== | ||
+ | |||
+ | * https:// | ||
+ | |||
+ | Look at this, the smallest Elastic Cloud Compute Instances are **g4dn.xlarge** yielding access to 4 vCPUs and 1x T4 GPU. The largest is **g4dn.12xlarge** yielding access to 48 vCPUs and 4 x T4 GPUs. Now the smallest is priced at $ 0.526/hr, and running that card 24/7 for a year is a cost of $ 4,607.76 ... meaning ... option #7 below with 26 GPUs would cost you a whopping $ 119,802. Annually! That's the low water tide mark. | ||
+ | |||
+ | The largest instance is priced at $ 4.352 and would cost you near one million dollars to run per year if you matched option #7. | ||
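
A minimal sketch of that arithmetic (on-demand rates as quoted above, running around the clock, no reserved or spot discounts assumed):

<code python>
# Back-of-the-envelope EC2 GPU cost: on-demand rate x hours x instance count.
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_cost(usd_per_hr, instance_count=1):
    """Annual cost of running instances 24/7 at on-demand rates."""
    return usd_per_hr * HOURS_PER_YEAR * instance_count

# g4dn.xlarge (1x T4): one instance, then 26 to match option #7's 26 GPUs
print(annual_cost(0.526))      # 4607.76
print(annual_cost(0.526, 26))  # 119801.76 -> the ~$119,802 figure above

# g4dn.12xlarge (4x T4): one instance per year
print(annual_cost(4.352))      # 38123.52
</code>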
Ok, we will try this year. Here are some informational pages.

  * [[cluster:
  * [[cluster:
  * [[cluster:
  * [[cluster:
  * All GPU cards are able to do single and double precision (fp64/fp32)
  * Tensor cores are 4 single precision cores able to return double precision results
  * GPU card performance on double precision depends on the quantity of tensor cores
  * CPU model/type determines dpfp/cycle; Silver 16, Gold 32 (see the sketch below)
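
To make that Silver/Gold difference concrete, a minimal sketch of the usual theoretical-peak formula; the core counts and clock speeds below are illustrative placeholders, not quoted specs:

<code python>
# Theoretical peak double precision per node:
#   sockets x cores x clock (GHz) x DP flops/cycle
# Xeon Silver: ~16 flops/cycle (one AVX-512 FMA unit)
# Xeon Gold:   ~32 flops/cycle (two AVX-512 FMA units)

def peak_dp_gflops(sockets, cores, ghz, flops_per_cycle):
    return sockets * cores * ghz * flops_per_cycle

print(peak_dp_gflops(2, 12, 2.2, 16))  # Silver-class node:  844.8 GFLOPS
print(peak_dp_gflops(2, 12, 2.2, 32))  # Gold-class node:   1689.6 GFLOPS
</code>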

Criteria for selection (points of discussion raised at last meeting 08/):

  - Continue with the current work load, just more of it (RTX2080ti/
  - Do the above, and enable a beginner-level intro into Deep Learning (T4)
  - Do the above, but invest for future expansion into complex Deep Learning (RTX6000)
+ | |||
+ | //**Pick your option and put it in the shopping cart**// | ||
+ | Table best read from the bottom up to assess differences. | ||
+ | |||
+ | ^ | ||
+ | ^ | ||
+ | ^ ^ rtx2080ti | ||
+ | | Nodes | | ||
+ | | Cpus | | ||
+ | | Cores | | ||
+ | | Tflops | ||
+ | | Gpus | | ||
+ | | Cores | | ||
+ | | Cores | | ||
+ | | Tflops | ||
+ | | Tflops | ||
+ | | $/ | ||
+ | ^ Per Node ^^^^^^^^^^^^ | ||
+ | | Chassis | ||
+ | | CPU | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | total| | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | | ||
+ | | | ||
+ | | | ||
+ | | Drives | ||
+ | | | ||
+ | | GPU | 8 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 | 10 | total| | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | Power | 2200 | 1600 | 1600 | 1600 | 1600 | 1600 | 1600 | 2200 | 1600 | 2000 | Watts| | ||
+ | | | ||
+ | | CentOS7 | ||
+ | | Nics | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | gigabit| | ||
+ | | Warranty | ||
+ | | | | ||
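
If the truncated "$/" row is price per delivered Tflops, a minimal sketch of that metric; the quote prices and GPU counts below are pure placeholders, and only the per-card fp32 Tflops figures (~13.4 for the RTX 2080 Ti, ~8.1 for the T4) are published numbers:

<code python>
# Hypothetical $/Tflops comparison -- quote_usd and gpu counts are placeholders.
options = {
    # option: (quote_usd, total_gpus, fp32_tflops_per_gpu)
    "rtx2080ti": (100_000, 40, 13.4),
    "t4":        (100_000, 20,  8.1),
}

for name, (usd, gpus, tflops_per_gpu) in options.items():
    total_tflops = gpus * tflops_per_gpu
    print(f"{name}: {total_tflops:.0f} Tflops, ${usd / total_tflops:,.0f}/Tflops")
</code>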
+ | |||
+ | * #1/#2 All GPU warranty requests will be filled by GPU maker. | ||
+ | * #7 up to 4 GPUs per node. Filling rack leaving 1U open between nodes, count=15 | ||
+ | * #8 fills intended rack with AC in rack. GPU Tower/4U rack mount. | ||
+ | * #8 includes NVLink connector (bridge kit). Up to 4 GPUs per node. | ||
+ | * Tariffs may affect all quotes when executed. | ||
+ | * S&H included (or estimated) | ||
+ | * More than 4-6 nodes would be lots of work if Warewulf/ | ||
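
For rack planning, the Power row times the node count gives a worst-case draw. A minimal sketch using option #7's count=15 and its 1600 W supplies (treating the PSU rating as the ceiling; actual draw under load is lower):

<code python>
# Worst-case rack power draw: node count x PSU rating.
nodes = 15          # option #7 rack fill, from the notes above
psu_watts = 1600    # option #7 column of the Power row
print(f"{nodes * psu_watts / 1000:.1f} kW")  # 24.0 kW
</code>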
+ | |||
+ | On the question of active versus passive cooling: | ||
**Exxactcorp**: