cluster:184 [2020/01/03 08:22] (current) hmeij07
\\
**[[cluster:0|Back]]**

==== Turing/Volta/Pascal ====

  * https://graphicscardhub.com/turing-vs-volta-v-pascal/
  
==== AWS deploys T4 ====

  * https://www.hpcwire.com/2019/09/20/aws-makes-t4-gpu-instances-broadly-available-for-inferencing-graphics/
  
Look at this: the smallest Elastic Compute Cloud (EC2) instance is **g4dn.xlarge**, yielding access to 4 vCPUs, 16 GiB memory and 1x T4 GPU. The largest is **g4dn.16xlarge**, yielding access to 64 vCPUs, 256 GiB memory and 1x T4 GPU. The smallest is priced at $0.526/hr; running that card 24/7 for a year costs $4,607.76 ... meaning ... option #7 below with 26 GPUs would cost you a whopping $119,802. Annually! That's the low-water mark.
  
The high-water mark? The largest instance is priced at $4.352/hr and would cost you nearly one million dollars per year if you matched option #7.
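A quick back-of-the-envelope check of the arithmetic above (rates as quoted; assumes flat 24/7 on-demand pricing and ignores storage, data transfer, and any reserved-instance discounts):

```python
# Annual cost of running AWS g4dn instances around the clock.
HOURS_PER_YEAR = 24 * 365  # 8,760

xlarge_rate = 0.526    # $/hr, g4dn.xlarge (1x T4), as quoted above
annual_small = xlarge_rate * HOURS_PER_YEAR
print(f"g4dn.xlarge 24/7: ${annual_small:,.2f}/yr")          # $4,607.76/yr

# Option #7 below has 26 GPUs, so 26 instances at one T4 each.
print(f"26x g4dn.xlarge: ${26 * annual_small:,.0f}/yr")      # $119,802/yr

big_rate = 4.352       # $/hr, g4dn.16xlarge (also 1x T4)
annual_big = 26 * big_rate * HOURS_PER_YEAR
print(f"26x g4dn.16xlarge: ${annual_big:,.0f}/yr")           # $991,212/yr
```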
  
Rival cloud vendor Google also offers Nvidia T4 GPUs in its cloud; Google announced global availability back in April. Google Cloud's T4 GPU availability includes three regions each in the U.S. and Asia and one each in South America and Europe. That page mentions a price of "as low as $0.29 per hour per GPU", which translates to roughly $66K per year when matching option #7 below. Still. Insane.

  * https://www.hpcwire.com/2019/04/30/google-cloud-goes-global-with-nvidia-t4-gpus/
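The same check for the Google Cloud figure (using the quoted "as low as" rate, so this is a best-case floor, not a guaranteed price):

```python
# Annual cost at Google Cloud's quoted best-case T4 rate.
rate = 0.29                      # $/hr per GPU, "as low as" price
gpus = 26                        # matching option #7 below
annual = rate * gpus * 24 * 365
print(f"26x T4 at $0.29/hr: ${annual:,.0f}/yr")   # $66,050/yr, i.e. ~$66K
```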
  
==== 2019 GPU Expansion ====

More focus...

  * Vendor A:
    * Option 1: 48 gpus, 12 nodes, 24U, each: two 4214 12-core cpus (silver), 96 gb ram, 1tb SSD, four NVIDIA RTX 2080 SUPER 8GB GPUs, centos7 yes, cuda yes, 3 yr, 2x gbe nics, 17.2"w x 31.5"d x 3.46"h (fits)

With the Deep Learning Ready Docker containers ... [[cluster:187|NGC Docker Containers]]

The SUPER model quote above is what we selected\\
 --- //[[hmeij@wesleyan.edu|Henk]] 2020/01/03 08:22//


Focus on the RTX 2080 models...

  * Vendor A:
    * Option 1: 48 gpus, 12 nodes, 24U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080 gpus (8gb), centos7 yes, cuda yes, 3 yr, nics?, wxdxh"?
    * Option 2: 40 gpus, 10 nodes, 20U, each: two 4116 12-core cpus (silver), 96 gb ram, 1tb SSD, four rtx2080ti gpus (11gb), centos7 yes, cuda yes, 3 yr, nics?, wxdxh"?
    * A1+A2 installed, configured and tested with the NGC Docker containers Deep Learning Software Stack: NVIDIA DIGITS, TensorFlow, Caffe, NVIDIA CUDA, PyTorch, RapidsAI, Portainer ... the NGC Catalog can be found at https://ngc.nvidia.com/catalog/all?orderBy=modifiedDESC&query=&quickFilter=all&filters=

  * Vendor B:
    * Option 1: 36 gpus, 9 nodes, 18U, each: two 4214 12-core cpus (silver), 96 gb ram, 2x960gb SATA, four rtx2080tifsta gpus (11gb), centos7 no, cuda no, 3 yr, 2x gbe nics, wxdxh"?

  * Vendor C:
    * Option 1: 40 gpus, 10 nodes, 40U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080ti gpus (11gb), centos7 yes, cuda yes, 3 yr, 2x gbe nics, 18.2x26.5x7"
    * Option 2: 48 gpus, 12 nodes, 48U, each: two 4214 12-core cpus (silver), 96 gb ram, 240 gb SSD, four rtx2080s gpus (8gb), centos7 yes, cuda yes, 3 yr, 2x gbe nics, 18.2x26.5x7"

  * Vendor D:
    * Option 1: 48 gpus, 12 nodes, 12U, each: two 4214 12-core cpus (silver), 64 gb ram, 2x480gb SATA, four rtx2080s gpus (8gb), centos7 yes, cuda yes, 3 yr, 2x gbe nics, 17.2x35.2x1.7"
  
Ok, we try this year. Here are some informational pages.
|  Gpus  |  48  |  16  |  36  |  28  |  20  |  34  |  26  |  16  |  28  |  60  | total gpus|
|  Cores  |  209  |  74  |  157  |  72  |  92  |  75  |  67  |  74  |  72  |  138  | cuda cores (K)|
|  Cores  |  26  |  9  |  20  |  8.9  |  11.5  |  10  |  8  |  9  |  9  |  17  | tensor cores (K)|
|  Tflops  |  21  |  13  |  16  |  7  |  10  |  7.5  |  6.5  |  13  |  7  |  13  | gpu dpfp|
|  Tflops  |  682  |  261  |  511  |  227  |  326  |  241  |  211  |  261  |  227  |  426  | gpu spfp|
cluster/184.1569587813.txt.gz · Last modified: 2019/09/27 08:36 by hmeij07