\\
**[[cluster:0|Back]]**

==== Turing/Volta/Pascal ====

  * https://graphicscardhub.com/turing-vs-volta-v-pascal/
  
==== AWS deploys T4 ====
  * https://www.hpcwire.com/2019/09/20/aws-makes-t4-gpu-instances-broadly-available-for-inferencing-graphics/
  
Look at this: the smallest of these Elastic Compute Cloud (EC2) instances is **g4dn.xlarge**, yielding access to 4 vCPUs, 16 GiB memory and 1x T4 GPU. The largest is **g4dn.16xlarge**, yielding access to 64 vCPUs, 256 GiB memory and 1x T4 GPU. The smallest is priced at $0.526/hr, and running that card 24/7 for a year costs $4,607.76 ... meaning ... option #7 below with 26 GPUs would cost you a whopping $119,802. Annually! That's the low tide water mark.
  
The high tide water mark? The largest instance is priced at $4.352/hr and would cost you nearly one million dollars per year if you matched option #7.
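
The arithmetic behind those tide marks is just hourly rate x hours per year x instance count; a minimal Python sketch using only the on-demand rates quoted above (no storage, data transfer, or reserved/spot discounts):

<code python>
# Annual cost sketch for the g4dn on-demand rates quoted above.
# Does not model storage, data transfer, or reserved/spot pricing.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def annual_cost(hourly_rate, instances=1):
    """Cost of running `instances` instances 24/7 for one year."""
    return hourly_rate * HOURS_PER_YEAR * instances

print(f"g4dn.xlarge,    1 instance : ${annual_cost(0.526):,.2f}")      # ~$4,607.76
print(f"g4dn.xlarge,   26 instances: ${annual_cost(0.526, 26):,.2f}")  # ~$119,801.76 (low tide)
print(f"g4dn.16xlarge, 26 instances: ${annual_cost(4.352, 26):,.2f}")  # ~$991,211.52 (high tide)
</code>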
  
==== 2019 GPU Expansion ====

More focus...

  * Vendor A:
    * Option 1: 48 gpus, 12 nodes, 24U, each: two 4214 12-core cpus (silver), 96 gb ram, 1tb SSD, four NVIDIA RTX 2080 SUPER 8GB GPU,  centos7 yes, cuda yes, 3 yr, 2x gbe nics, 17.2w 31.5d 3.46h" (fits)

With the Deep Learning Ready docker containers ... [[cluster:187|NGC Docker Containers]]

The SUPER model quote above is what we selected\\
 --- //[[hmeij@wesleyan.edu|Henk]] 2020/01/03 08:22//
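
For reference, pulling and launching one of those NGC images on a node looks roughly like the sketch below. It assumes docker 19.03+ with the NVIDIA container toolkit installed; the TensorFlow image tag is only an example, check the NGC catalog for current tags.

<code python>
# Minimal sketch: pull and launch an NGC container via the docker CLI.
# Assumes docker 19.03+ and the NVIDIA container toolkit on the node.
# The image tag below is an example only -- pick a current one from the
# NGC catalog (link further down this page).
import subprocess

IMAGE = "nvcr.io/nvidia/tensorflow:19.12-tf1-py3"  # example tag (assumption)

subprocess.run(["docker", "pull", IMAGE], check=True)

# Interactive shell inside the container with all GPUs visible.
subprocess.run([
    "docker", "run", "--rm", "-it",
    "--gpus", "all",        # requires the NVIDIA container toolkit
    "-v", "/home:/home",    # expose home directories (site choice, assumption)
    IMAGE, "bash",
], check=True)
</code>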


Focus on the RTX2080 models... (a quick GPUs-per-rack-unit sketch follows the vendor list below)

  * Vendor A:
    * Option 1: 48 gpus, 12 nodes, 24U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080 gpus (8gb),  centos7 yes, cuda yes, 3 yr, nics?, wxdxh"?
    * Option 2: 40 gpus, 10 nodes, 20U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080ti gpus (11gb),  centos7 yes, cuda yes, 3 yr, nics?, wxdxh"?
    * A1+A2 installed, configured and tested with the NGC Docker containers Deep Learning Software Stack: NVIDIA DIGITS, TensorFlow, Caffe, NVIDIA CUDA, PyTorch, RapidsAI, Portainer ... the NGC Catalog can be found at https://ngc.nvidia.com/catalog/all?orderBy=modifiedDESC&query=&quickFilter=all&filters=

  * Vendor B:
    * Option 1: 36 gpus, 9 nodes, 18U, each: two 4214 12-core cpus (silver), 96 gb ram, 2x960gb SATA, four rtx2080tifsta gpus (11gb),  centos7 no, cuda no, 3 yr, 2xgbe nics, wxdxh"?

  * Vendor C:
    * Option 1: 40 gpus, 10 nodes, 40U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080ti gpus (11gb),  centos7 yes, cuda yes, 3 yr, 2xgbe nics, 18.2x26.5x7"
    * Option 2: 48 gpus, 12 nodes, 48U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080s gpus (8gb),  centos7 yes, cuda yes, 3 yr, 2xgbe nics, 18.2x26.5x7"

  * Vendor D:
    * Option 1: 48 gpus, 12 nodes, 12U, each: two 4214 12-core cpus (silver), 64 gb ram, 2x480gb SATA, four rtx2080s gpus (8gb),  centos7 yes, cuda yes, 3 yr, 2xgbe nics, 17.2x35.2x1.7"
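
A quick way to line these quotes up is GPUs per node and GPUs per rack unit, computed straight from the numbers listed above:

<code python>
# GPU density per quote: (gpus, nodes, rack units) as listed above.
quotes = {
    "A1 rtx2080":   (48, 12, 24),
    "A2 rtx2080ti": (40, 10, 20),
    "B1 rtx2080ti": (36,  9, 18),
    "C1 rtx2080ti": (40, 10, 40),
    "C2 rtx2080s":  (48, 12, 48),
    "D1 rtx2080s":  (48, 12, 12),
}

for name, (gpus, nodes, units) in quotes.items():
    print(f"{name:14s} {gpus // nodes} gpu/node  {gpus / units:.1f} gpu/U")
</code>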
  
Ok, we will try this year. Here are some informational pages.
|  Gpus  |  48  |  16  |  36  |  28  |  20  |  34  |  26  |  16  |  28  |  60  | total|
|  Cores  |  209  |  74  |  157  |  72  |  92  |  75  |  67  |  74  |  72  |  138  | cuda K|
|  Cores  |  26  |  9  |  20  |  8.9  |  11.5  |  10  |  8  |  9  |  9  |  17  | tensor K|
|  Tflops  |  21  |  13  |  16  |  7  |  10  |  7.5  |  6.5  |  13  |  7  |  13  | gpu dpfp|
|  Tflops  |  682  |  261  |  511  |  227  |  326  |  241  |  211  |  261  |  227  |  426  | gpu spfp|