cluster:184 [2019/09/07 14:08] hmeij07 [Summary]
cluster:184 [2019/09/12 11:51] hmeij07
  * [[cluster:182|P100 vs RTX 6000 & T4]] page

  * All GPU cards are able to do single and double precision (fp32/fp64), "mixed mode"
  * Tensor cores are 4 single precision cores able to return double precision results
  * GPU card performance on double precision depends on the quantity of tensor cores
  * CPU model/type determines dpfp/cycle: Silver 16, Gold 32

Criteria for selection (points of discussion raised at the last meeting):
  - Continue with the current workload, just more of it (RTX 2080 Ti/RTX 4000)
  - Do the above, but add beginner-level intro Deep Learning (T4)
  - Do the above, but invest for future expansion into complex Deep Learning (RTX 6000)

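The "cpu dpfp" Tflops row in the quote table follows from the standard peak-FLOPS formula: nodes × cores per node × clock (GHz) × double-precision FLOPs per cycle (16 for Xeon Silver, 32 for Xeon Gold). A minimal Python sketch of that arithmetic, using options #1 and #3 from the table as inputs:

```python
# Back-of-the-envelope check of the "cpu dpfp" Tflops row.
# dpfp/cycle: 16 for Xeon Silver, 32 for Xeon Gold (per the notes above).

def cpu_peak_tflops(nodes, cores_per_node, ghz, dpfp_per_cycle):
    """Aggregate peak double-precision Tflops across all nodes."""
    return nodes * cores_per_node * ghz * dpfp_per_cycle / 1000.0

# Option #1: 6 nodes, 2x8-core Silver 4208 @ 2.1 GHz
opt1 = cpu_peak_tflops(6, 16, 2.1, 16)   # ~3.2 Tflops

# Option #3: 9 nodes, 2x10-core Gold 5115 @ 2.4 GHz
opt3 = cpu_peak_tflops(9, 20, 2.4, 32)   # ~13.8 Tflops

print(round(opt1, 1), round(opt3, 1))
```

Both values match the table rows, which is a quick sanity check on the quotes.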
//**Pick your option and put it in the shopping cart**// 8-)

^ Options ^^^^^^^^^^^ Notes ^
^ ^ #1 ^ #2 ^ #3 ^ #4 ^ #5 ^ #6 ^ #7 ^ #8 ^ #9 ^ #10 ^ ^
^ ^ rtx2080ti ^ rtx6000 ^ rtx2080ti ^ t4 ^ rtx6000 ^ rtx4000 ^ t4 ^ rtx6000 ^ t4 ^ rtx4000 ^ ^
| Nodes | 6 | 4 | 9 | 7 | 5 | 17 | 13 | 8 | 8 | 6 | total |
| Cpus | 12 | 8 | 18 | 14 | 10 | 34 | 26 | 16 | 16 | 12 | total |
| Cores | 96 | 64 | 180 | 140 | 100 | 272 | 208 | 192 | 128 | 72 | physical |
| Tflops | 3.2 | 2.2 | 13.8 | 10.7 | 7.7 | 9.2 | 7 | 13.5 | 4.3 | 2.5 | cpu dpfp |
| Gpus | 48 | 16 | 36 | 28 | 20 | 34 | 26 | 16 | 28 | 60 | total |
| Cores | 209 | 74 | 157 | 72 | 92 | 75 | 67 | 74 | 72 | 138 | cuda K |
| Cores | 26 | 9 | 20 | 16 | 11.5 | 10 | 8 | 9 | 9 | 17 | tensor K |
| Tflops | 21 | 13 | 16 | 7 | 10 | 7.5 | 6.5 | 13 | 7 | 13 | gpu dpfp |
| Tflops | 682 | 261 | 511 | 227 | 326 | 241 | 211 | 261 | 227 | 426 | gpu spfp |
| $/TFlop | 138 | 348 | 188 | 423 | 295 | 402 | 466 | 361 | 433 | 232 | gpu dp+sp |
^ Per Node ^^^^^^^^^^^^
| Chassis | 2U(12) | 2U(8) | 2U(18) | 2U(14) | 2U(10) | 1U(16) | 1U(13) | 4U(32) | 1U(8) | 4U(24) | rails? |
| CPU | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | total |
| | 4208 | 4208 | 5115 | 5115 | 5115 | 4208 | 4208 | 4214 | 4208 | 4208 | model |
| | silver | silver | gold | gold | gold | silver | silver | gold | silver | silver | type |
| | 2x8 | 2x8 | 2x10 | 2x10 | 2x10 | 2x8 | 2x8 | 2x12 | 2x8 | 2x8 | physical |
| | 2.1 | 2.1 | 2.4 | 2.4 | 2.4 | 2.1 | 2.1 | 2.2 | 2.1 | 2.1 | GHz |
| | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | Watts |
| DDR4 | 192 | 192 | 192 | 192 | 192 | 192 | 192 | 192 | 192 | 192 | GB mem |
| | 2933 | 2933 | 2666 | 2666 | 2666 | 2666 | 2666 | 2933 | 2933 | 2666 | MHz |
| Drives | 2x960 | 2x960 | 960 | 960 | 960 | 240 | 240 | 240 | 240 | 240 | GB |
| | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | SSD/HDD |
| GPU | 8 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 | 10 | total |
| | RTX | RTX | RTX | T | RTX | RTX | T | RTX | T | RTX | arch |
| | 2080ti | 6000 | 2080ti | 4 | 6000 | 4000 | 4 | 6000 | 4 | 4000 | model |
| | 11 | 24 | 11 | 16 | 24 | 8 | 16 | 24 | 16 | 8 | GB mem |
| | 250 | 295 | 250 | 70 | 295 | 160 | 70 | 295 | 70 | 160 | Watts |
| Power | 2200 | 1600 | 1600 | 1600 | 1600 | 1600 | 1600 | 2200 | 1600 | 2000 | Watts |
| | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1? | 2+2? | redundant |
| CentOS7 | n+n | n+n | y+? | y+? | y+? | y+y | y+y | y+y | n+n | n+n | +cuda? |
| Nics | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2? | 2? | gigabit |
| Warranty | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | standard |
| | -3 | -6 | -1 | -1 | -5.5 | 0 | +1.6 | 0 | +1.5 | -1 | diff |

  * #1/#2 All GPU warranty requests will be filled by the GPU maker.
  * #8 fills the intended rack, with AC in the rack. GPU tower/4U rack mount.
  * #8 includes an NVLink connector (bridge kit), allowing up to 4 GPUs per node with no cooling issues.
  * Tariffs may affect all quotes when executed.
  * S&H included (or estimated).

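The quote prices themselves are not listed on this page, but if the "$/TFlop gpu dp+sp" row is the quote price divided by the combined GPU throughput (gpu dpfp + gpu spfp Tflops), an implied price can be backed out per option. A rough Python sketch under that assumption (which is an interpretation, not a stated fact), using figures for options #1 and #2 from the table:

```python
# Back out an implied quote price from the "$/TFlop gpu dp+sp" row.
# ASSUMPTION: that column equals quote price / (gpu dpfp + gpu spfp) Tflops.
# Actual quote prices are not on this page; these are estimates only.

def implied_price(dollars_per_tflop, dpfp_tflops, spfp_tflops):
    """Implied total quote price in dollars, under the assumption above."""
    return dollars_per_tflop * (dpfp_tflops + spfp_tflops)

# Option #1 (rtx2080ti): 138 $/TFlop x (21 + 682) Tflops
print(implied_price(138, 21, 682))   # ~$97K
# Option #2 (rtx6000): 348 $/TFlop x (13 + 261) Tflops
print(implied_price(348, 13, 261))   # ~$95K
```

If that reading is right, the options land in a broadly similar price band despite very different $/TFlop efficiency.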
**Exxactcorp**: For the GPU discussion, 2 to 4 GPUs per node is fine. The T4 GPU is 100% fine, and the passive heatsink is better, not worse. The system needs to be one that supports passive Tesla cards, and the chassis fans would simply ramp up to cool the card properly, as in any passive Tesla situation. Titan RTX GPUs are what you should be worried about, and I would be hesitant to quote them. They are *NOT GOOD* for multi-GPU systems.