+ | |||
+ | ==== AWS deploys T4 ==== | ||
+ | |||
+ | * https:// | ||
+ | |||
+ | Look at this, the smallest Elastic Cloud Compute Instances are **g4dn.xlarge** yielding access to 4 vCPUs and 1x T4 GPU. The largest is **g4dn.12xlarge** yielding access to 48 vCPUs and 4 x T4 GPUs. Now the smallest is priced at $ 0.526/hr, and running that card 24/7 for a year is a cost of $ 4,607.76 ... meaning ... option #7 below with 26 GPUs would cost you a whopping $ 119,802. Annually! That's the low water tide mark. | ||
+ | |||
+ | The largest instance is priced at $ 4.352 and would cost you near one million dollars to run per year if you matched option #7. | ||
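
A minimal sketch of that arithmetic (on-demand rates as quoted above, running around the clock, no reserved or spot discounts assumed):

<code python>
# Back-of-the-envelope EC2 GPU cost: on-demand rate x hours x instance count.
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_cost(usd_per_hr, instance_count=1):
    """Annual cost of running instances 24/7 at on-demand rates."""
    return usd_per_hr * HOURS_PER_YEAR * instance_count

# g4dn.xlarge (1x T4): one instance, then 26 to match option #7's 26 GPUs
print(annual_cost(0.526))      # 4607.76
print(annual_cost(0.526, 26))  # 119801.76 -> the ~$119,802 figure above

# g4dn.12xlarge (4x T4): one instance per year
print(annual_cost(4.352))      # 38123.52
</code>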
Ok, we will try this year. Here are some informational pages.

  * [[cluster:
  * [[cluster:
  * [[cluster:
  * [[cluster:
  * All GPU cards are able to do single and double precision (fp64/fp32)
  * Tensor cores are 4 single precision cores able to return double precision results
  * GPU card performance on double precision depends on the quantity of tensor cores
  * CPU model/type determines dpfp/cycle; Silver 16, Gold 32 (see the sketch below)
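
To make that Silver/Gold difference concrete, a minimal sketch of the usual theoretical-peak formula; the core counts and clock speeds below are illustrative placeholders, not quoted specs:

<code python>
# Theoretical peak double precision per node:
#   sockets x cores x clock (GHz) x DP flops/cycle
# Xeon Silver: ~16 flops/cycle (one AVX-512 FMA unit)
# Xeon Gold:   ~32 flops/cycle (two AVX-512 FMA units)

def peak_dp_gflops(sockets, cores, ghz, flops_per_cycle):
    return sockets * cores * ghz * flops_per_cycle

print(peak_dp_gflops(2, 12, 2.2, 16))  # Silver-class node:  844.8 GFLOPS
print(peak_dp_gflops(2, 12, 2.2, 32))  # Gold-class node:   1689.6 GFLOPS
</code>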

Criteria for selection (points of discussion raised at last meeting 08/):

  - Continue with the current work load, just more of it (RTX2080ti/
  - Do the above, and enable a beginner-level intro into Deep Learning (T4)
  - Do the above, but invest for future expansion into complex Deep Learning (RTX6000)
+ | |||
+ | //**Pick your option and put it in the shopping cart**// | ||
+ | Table best read from the bottom up to assess differences. | ||
+ | |||
+ | ^ | ||
+ | ^ | ||
+ | ^ ^ rtx2080ti | ||
+ | | Nodes | | ||
+ | | Cpus | | ||
+ | | Cores | | ||
+ | | Tflops | ||
+ | | Gpus | | ||
+ | | Cores | | ||
+ | | Cores | | ||
+ | | Tflops | ||
+ | | Tflops | ||
+ | | $/ | ||
+ | ^ Per Node ^^^^^^^^^^^^ | ||
+ | | Chassis | ||
+ | | CPU | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | total| | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | | ||
+ | | | ||
+ | | | ||
+ | | Drives | ||
+ | | | ||
+ | | GPU | 8 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 | 10 | total| | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | Power | 2200 | 1600 | 1600 | 1600 | 1600 | 1600 | 1600 | 2200 | 1600 | 2000 | Watts| | ||
+ | | | ||
+ | | CentOS7 | ||
+ | | Nics | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | gigabit| | ||
+ | | Warranty | ||
+ | | | | ||
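
If the truncated "$/" row is price per delivered Tflops, a minimal sketch of that metric; the quote prices and GPU counts below are pure placeholders, and only the per-card fp32 Tflops figures (~13.4 for the RTX 2080 Ti, ~8.1 for the T4) are published numbers:

<code python>
# Hypothetical $/Tflops comparison -- quote_usd and gpu counts are placeholders.
options = {
    # option: (quote_usd, total_gpus, fp32_tflops_per_gpu)
    "rtx2080ti": (100_000, 40, 13.4),
    "t4":        (100_000, 20,  8.1),
}

for name, (usd, gpus, tflops_per_gpu) in options.items():
    total_tflops = gpus * tflops_per_gpu
    print(f"{name}: {total_tflops:.0f} Tflops, ${usd / total_tflops:,.0f}/Tflops")
</code>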
+ | |||
+ | * #1/#2 All GPU warranty requests will be filled by GPU maker. | ||
+ | * #7 up to 4 GPUs per node. Filling rack leaving 1U open between nodes, count=15 | ||
+ | * #8 fills intended rack with AC in rack. GPU Tower/4U rack mount. | ||
+ | * #8 includes NVLink connector (bridge kit). Up to 4 GPUs per node. | ||
+ | * Tariffs may affect all quotes when executed. | ||
+ | * S&H included (or estimated) | ||
+ | * More than 4-6 nodes would be lots of work if Warewulf/ | ||
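
For rack planning, the Power row times the node count gives a worst-case draw. A minimal sketch using option #7's count=15 and its 1600 W supplies (treating the PSU rating as the ceiling; actual draw under load is lower):

<code python>
# Worst-case rack power draw: node count x PSU rating.
nodes = 15          # option #7 rack fill, from the notes above
psu_watts = 1600    # option #7 column of the Power row
print(f"{nodes * psu_watts / 1000:.1f} kW")  # 24.0 kW
</code>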
+ | |||
+ | On the question of active versus passive cooling: | ||
**Exxactcorp**: