cluster:184 [2019/09/12 14:00] hmeij07
cluster:184 [2019/12/13 13:29] hmeij07
**[[cluster:
==== Turing/T4 ====

  * https://

==== AWS deploys T4 ====

  * https://

Look at this: the smallest of these Elastic Cloud Compute instances is **g4dn.xlarge**, yielding 4 vCPUs, 16 GiB memory and 1x T4 GPU. The largest is **g4dn.16xlarge**, yielding 64 vCPUs, 256 GiB memory and 1x T4 GPU. The smallest is priced at $0.526/hr, and running that card 24/7 for a year costs $4,607.76 ... meaning option #7 below, with 26 GPUs, would cost you a whopping $119,802. Annually! That's the low-water mark.

The high-water mark? The largest instance is priced at $4.352/hr and would cost you nearly one million dollars per year if you matched option #7.
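The cost arithmetic above can be sketched in a few lines of Python. The hourly rates and the 26-GPU count come from the text; `annual_cost` is just a helper name for this note:

```python
# Sketch: annual 24/7 cost of AWS g4dn instances vs. option #7 (26 GPUs).
# Rates are the on-demand prices quoted above; each instance has 1x T4,
# so matching 26 GPUs means running 26 instances.
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_cost(hourly_rate, instances=1):
    """Cost of running `instances` instances around the clock for a year."""
    return hourly_rate * HOURS_PER_YEAR * instances

low = annual_cost(0.526, instances=26)   # g4dn.xlarge
high = annual_cost(4.352, instances=26)  # g4dn.16xlarge

print(f"low-water mark:  ${low:,.2f}")   # $119,801.76
print(f"high-water mark: ${high:,.2f}")  # $991,211.52
```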
Rival cloud vendor Google also offers Nvidia T4 GPUs in its cloud; Google announced global availability back in April. Google Cloud's T4 availability covers three regions each in the U.S. and Asia, and one each in South America and Europe. That page mentions a price of "as low as $0.29 per hour per GPU", which translates to roughly $66K per year when matching option #7 below. Still. Insane.

  * https://
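The same back-of-the-envelope check for the Google rate quoted above (the $0.29/hr figure is their stated minimum, so real bills could run higher):

```python
# Sketch: Google Cloud's "as low as $0.29 per hour per GPU" T4 rate,
# scaled to option #7's 26 GPUs running 24/7 for a year.
HOURS_PER_YEAR = 24 * 365

gcp_rate = 0.29   # $/hr per T4 GPU (quoted minimum)
gpus = 26         # option #7 below

annual = gcp_rate * HOURS_PER_YEAR * gpus
print(f"${annual:,.0f} per year")  # $66,050 per year
```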
==== 2019 GPU Expansion ====

Focus on the RTX 2080 model...

  * Vendor A:
    * Option 1: 48 gpus, 12 nodes, 24U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080 gpus (8gb)
    * Option 2: 40 gpus, 10 nodes, 20U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080ti gpus (11gb)
    * A1+A2 installed, configured and tested: NGC Docker containers Deep Learning Software Stack: NVIDIA DIGITS, TensorFlow, Caffe, NVIDIA CUDA, PyTorch, RapidsAI, Portainer ... NGC Catalog can be found at https://

  * Vendor B:
    * Option 1: 36 gpus, 9 nodes, 18U, each: two 4214 12-core cpus (silver), 96 gb ram, 2x960gb SATA, four rtx2080tifsta gpus (11gb)

  * Vendor C:
    * Option 1: 40 gpus, 10 nodes, 40U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080ti gpus (11gb)
    * Option 2: 48 gpus, 12 nodes, 48U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080s gpus (8gb)

  * Vendor D:
    * Option 1: 48 gpus, 12 nodes, 12U, each: two 4214 12-core cpus (silver), 64 gb ram, 2x480gb SATA, four rtx2080s gpus (8gb)
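For a quick side-by-side, the options above can be compared on GPU density per rack unit and total VRAM. A minimal Python sketch; the short labels are just shorthand for the vendor/option bullets, and the figures are copied from them:

```python
# Sketch: GPUs per rack unit and aggregate VRAM for each quoted option.
options = [
    # (label, gpus, nodes, rack_U, vram_gb_per_gpu)
    ("A1 rtx2080",   48, 12, 24,  8),
    ("A2 rtx2080ti", 40, 10, 20, 11),
    ("B1 rtx2080ti", 36,  9, 18, 11),
    ("C1 rtx2080ti", 40, 10, 40, 11),
    ("C2 rtx2080s",  48, 12, 48,  8),
    ("D1 rtx2080s",  48, 12, 12,  8),
]

for label, gpus, nodes, units, vram in options:
    density = gpus / units          # GPUs packed per rack unit
    total_vram = gpus * vram        # aggregate GPU memory in GB
    print(f"{label:13s} {density:4.1f} gpus/U  {total_vram:4d} GB total VRAM")
```

Vendor D's 1U nodes stand out at 4 GPUs per rack unit, four times the density of vendor C's 4U chassis.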
Ok, we try this year. Here are some informational pages.
| Gpus | 48 | 16 | 36 | 28 | 20 | 34 | 26 | 16 | 28 | 60 | total |
| Cores | 209 | 74 | 157 | 72 | 92 | 75 | 67 | 74 | 72 | 138 | cuda K |
| Cores | 26 | 9 | 20 |
| Tflops |
| Tflops |
  * #1/#2 All GPU warranty requests will be filled by GPU maker.
  * #7 up to 4 GPUs per node. Filling rack, leaving 1U open between nodes, count=15
  * #8 fills intended rack with AC in rack. GPU Tower/4U rack mount.
  * #8 includes NVLink connector (bridge kit). Up to 4 GPUs per node.