cluster:184 [2019/09/12 14:00] hmeij07
cluster:184 [2019/12/13 13:29] hmeij07
**[[cluster:
==== Turing/T4 ====

  * https://

==== AWS deploys T4 ====

  * https://

Look at this: the smallest of these Elastic Cloud Compute instances is **g4dn.xlarge**, yielding 4 vCPUs, 16 GiB memory and 1x T4 GPU. The largest is **g4dn.16xlarge**, yielding 64 vCPUs, 256 GiB memory and 1x T4 GPU. The smallest is priced at $0.526/hr, and running that card 24/7 for a year costs $4,607.76 ... meaning option #7 below, with 26 GPUs, would cost you a whopping $119,802. Annually! That's the low-water mark.

The high-water mark? The largest instance is priced at $4.352/hr and would cost you nearly one million dollars per year if you matched option #7.
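The cost arithmetic above can be sketched in a few lines of Python. The hourly rates and the 26-GPU count come from the text; `annual_cost` is just a helper name for this note:

```python
# Sketch: annual 24/7 cost of AWS g4dn instances vs. option #7 (26 GPUs).
# Rates are the on-demand prices quoted above; each instance has 1x T4,
# so matching 26 GPUs means running 26 instances.
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_cost(hourly_rate, instances=1):
    """Cost of running `instances` instances around the clock for a year."""
    return hourly_rate * HOURS_PER_YEAR * instances

low = annual_cost(0.526, instances=26)   # g4dn.xlarge
high = annual_cost(4.352, instances=26)  # g4dn.16xlarge

print(f"low-water mark:  ${low:,.2f}")   # $119,801.76
print(f"high-water mark: ${high:,.2f}")  # $991,211.52
```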
Rival cloud vendor Google also offers Nvidia T4 GPUs in its cloud; Google announced global availability back in April. Google Cloud's T4 availability covers three regions each in the U.S. and Asia, and one each in South America and Europe. That page mentions a price of "as low as $0.29 per hour per GPU", which translates to roughly $66K per year when matching option #7 below. Still. Insane.

  * https://
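The same back-of-the-envelope check for the Google rate quoted above (the $0.29/hr figure is their stated minimum, so real bills could run higher):

```python
# Sketch: Google Cloud's "as low as $0.29 per hour per GPU" T4 rate,
# scaled to option #7's 26 GPUs running 24/7 for a year.
HOURS_PER_YEAR = 24 * 365

gcp_rate = 0.29   # $/hr per T4 GPU (quoted minimum)
gpus = 26         # option #7 below

annual = gcp_rate * HOURS_PER_YEAR * gpus
print(f"${annual:,.0f} per year")  # $66,050 per year
```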
==== 2019 GPU Expansion ====

Focus on the RTX 2080 model...

  * Vendor A:
    * Option 1: 48 gpus, 12 nodes, 24U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080 gpus (8gb)
    * Option 2: 40 gpus, 10 nodes, 20U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080ti gpus (11gb)
    * A1+A2 installed, configured and tested: NGC Docker containers Deep Learning Software Stack: NVIDIA DIGITS, TensorFlow, Caffe, NVIDIA CUDA, PyTorch, RapidsAI, Portainer ... NGC Catalog can be found at https://

  * Vendor B:
    * Option 1: 36 gpus, 9 nodes, 18U, each: two 4214 12-core cpus (silver), 96 gb ram, 2x960gb SATA, four rtx2080tifsta gpus (11gb)

  * Vendor C:
    * Option 1: 40 gpus, 10 nodes, 40U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080ti gpus (11gb)
    * Option 2: 48 gpus, 12 nodes, 48U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080s gpus (8gb)

  * Vendor D:
    * Option 1: 48 gpus, 12 nodes, 12U, each: two 4214 12-core cpus (silver), 64 gb ram, 2x480gb SATA, four rtx2080s gpus (8gb)
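For a quick side-by-side, the options above can be compared on GPU density per rack unit and total VRAM. A minimal Python sketch; the short labels are just shorthand for the vendor/option bullets, and the figures are copied from them:

```python
# Sketch: GPUs per rack unit and aggregate VRAM for each quoted option.
options = [
    # (label, gpus, nodes, rack_U, vram_gb_per_gpu)
    ("A1 rtx2080",   48, 12, 24,  8),
    ("A2 rtx2080ti", 40, 10, 20, 11),
    ("B1 rtx2080ti", 36,  9, 18, 11),
    ("C1 rtx2080ti", 40, 10, 40, 11),
    ("C2 rtx2080s",  48, 12, 48,  8),
    ("D1 rtx2080s",  48, 12, 12,  8),
]

for label, gpus, nodes, units, vram in options:
    density = gpus / units          # GPUs packed per rack unit
    total_vram = gpus * vram        # aggregate GPU memory in GB
    print(f"{label:13s} {density:4.1f} gpus/U  {total_vram:4d} GB total VRAM")
```

Vendor D's 1U nodes stand out at 4 GPUs per rack unit, four times the density of vendor C's 4U chassis.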
Ok, we try this year. Here are some informational pages.
| Gpus | 48 | 16 | 36 | 28 | 20 | 34 | 26 | 16 | 28 | 60 | total |
| Cores | 209 | 74 | 157 | 72 | 92 | 75 | 67 | 74 | 72 | 138 | cuda K |
| Cores | 26 | 9 | 20 |
| Tflops |
| Tflops |
  * #1/#2 All GPU warranty requests will be filled by GPU maker.
  * #7 up to 4 GPUs per node. Filling rack, leaving 1U open between nodes, count=15
  * #8 fills intended rack with AC in rack. GPU Tower/4U rack mount.
  * #8 includes NVLink connector (bridge kit). Up to 4 GPUs per node.