Differences

This shows you the differences between two versions of the page.

--- cluster:184 [2019/09/12 11:51]
hmeij07
+++ cluster:184 [2019/09/27 12:38]
hmeij07 [AWS deploys T4]
@@ Line 1: / Line 1: @@
 \\
 **[[cluster:0|Back]]**
+==== AWS deploys T4 ====
+  * https://www.hpcwire.com/2019/09/20/aws-makes-t4-gpu-instances-broadly-available-for-inferencing-graphics/
+Look at this: the smallest Elastic Compute Cloud (EC2) instance is **g4dn.xlarge**, with 4 vCPUs and 1x T4 GPU. The largest is **g4dn.16xlarge**, with 64 vCPUs and 1x T4 GPU. The smallest is priced at $0.526/hr, so running that card 24/7 for a year costs $4,607.76, meaning option #7 below with 26 GPUs would cost you a whopping $119,802. Annually! That's the low-water mark.
+The high-water mark? The largest instance is priced at $4.352/hr, and 26 of them to match option #7 would cost you nearly one million dollars per year.
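The yearly figures quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming on-demand pricing at the hourly rates given in the text, a 365-day year, and one instance per GPU (26 instances to match option #7). The function name `annual_cost` is hypothetical and only used for illustration.

```python
# Sketch: annual cost of running cloud GPU instances 24/7.
# Assumptions (not from AWS documentation): 365-day year, flat hourly rate,
# one T4 GPU per instance so 26 instances stand in for option #7's 26 GPUs.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_cost(hourly_rate, instances=1):
    """Return the yearly cost in dollars for `instances` running nonstop."""
    return hourly_rate * HOURS_PER_YEAR * instances


if __name__ == "__main__":
    print(f"g4dn.xlarge, 1 instance:     ${annual_cost(0.526):,.2f}")
    print(f"g4dn.xlarge, 26 instances:   ${annual_cost(0.526, 26):,.2f}")
    print(f"g4dn.16xlarge, 26 instances: ${annual_cost(4.352, 26):,.2f}")
```

Changing the `instances` argument gives the equivalent figure for any other option in the table, under the same one-GPU-per-instance assumption.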
@@ Line 7: / Line 15: @@
 Ok, we try this year. Here are some informational pages.
+  * [[cluster:168|2018 GPU Expansion]] page
   * [[cluster:175|P100 vs GTX & K20]] page
   * [[cluster:181|2019 GPU Models]] page
@@ Line 17: / Line 26: @@
   * CPU model/type determines dpfp/cycle; silver 16, gold 32.
-Criteria for selection (points of discussion raised at last meeting):
+Criteria for selection (points of discussion raised at the last meeting, 08/27/2019):
-  - Continue with current work load, just more of it (RXT2080ti/RXT4000)
+  - Continue with current work load, just more of it (RTX2080ti/RTX4000)
-  - Do above, but beginners level intro Deep Learning (T4)
+  - Do above, and enable a beginner-level introduction to Deep Learning (T4)
-  - Do above, but invest for future expansion into complex Deep Learning (RXT6000)
+  - Do above, but invest for future expansion into complex Deep Learning (RTX6000)
-//**Pick your option and put it in the shopping cart**//  8-)
+//**Pick your option and put it in the shopping cart**//  8-)\\
+The table is best read from the bottom up to assess differences.
 ^  Options  ^^^^^^^^^^^  Notes  ^
@@ Line 30: / Line 40: @@
 |  Cpus  |  12  |  8  |  18  |  14  |  10  |  34  |  26  |  16  |  16  |  12  | total|
 |  Cores  |  96  |  64  |  180  |  140  |  100  |  272  |  208  |  192  |  128  |  72  | physical|
-|  Tflops  |  3.2  |  2.2  |  13.8  |  10.7  |  7.7  |  9.2  |  7  |  13.5  |  4.3  |  2.5  | cpu dpfp|
+|  Tflops  |  3.2  |  2.2  |  13.8  |  10.7  |  7.7  |  9.2  |  7  |  6.8  |  4.3  |  2.5  | cpu dpfp|
 |  Gpus  |  48  |  16  |  36  |  28  |  20  |  34  |  26  |  16  |  28  |  60  | total|
 |  Cores  |  209  |  74  |  157  |  72  |  92  |  75  |  67  |  74  |  72  |  138  | cuda K|
@@ Line 38: / Line 48: @@
 |  $/TFlop  |  138  |  348  |  188  |  423  |  295  |  402  |  466  |  361  |  433  |  232  | gpu dp+sp|
 ^ Per Node  ^^^^^^^^^^^^
-|  Chassis  |  2U(12)  |  2U(8)  |  2U(18)  |  2U(14)  |  2U(10)  |  1U(16)  |  1U(13)  |  4U(32)  |   1U(8) |  4U(24)  | rails?|
+|  Chassis  |  2U(12)  |  2U(8)  |  2U(18)  |  2U(14)  |  2U(10)  |  1U(17)  |  1U(13)  |  4U(32)  |   1U(8) |  4U(24)  | rails?|
 |  CPU  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  | total|
 |    |  4208  |  4208  |  5115  |  5115  |  5115  |  4208  |  4208  |  4214  |  4208  |  4208  | model|
-|    |  silver  |  silver  |  gold  |  gold  |  gold  |  silver  |  silver  |  gold  |  silver  |  silver  | type|
+|    |  silver  |  silver  |  gold  |  gold  |  gold  |  silver  |  silver  |  silver  |  silver  |  silver  | type|
 |    |  2x8  |  2x8  |  2x10  |  2x10  |  2x10  |  2x8  |  2x8  |  2x12  |  2x8  |  2x8  | physical|
 |    |  2.1  |  2.1  |  2.4  |  2.4  |  2.4  |  2.1  |  2.1  |  2.2  |  2.1  |  2.1  | GHz|
@@ Line 55: / Line 65: @@
 |    |  250  |  295  |  250  |  70  |  295  |  160  |  70  |  295  |  70  |  160  | Watts|
 |  Power  |  2200  |  1600  |  1600  |  1600  |  1600  |  1600  |  1600  |  2200  |  1600  |  2000  | Watts|
-|    |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1?  |  2+2?  | redundant|
+|    |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  1+1  |  2+2  | redundant|
 |  CentOS7  |  n+n  |  n+n  |  y+?  |  y+?  |  y+?  |  y+y  |  y+y  |  y+y  |  n+n  |  n+n  | +cuda?|
-|  Nics  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2?  |  2?  | gigabit|
+|  Nics  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  |  2  | gigabit|
 |  Warranty  |  3  |  3  |  3  |  3  |  3  |  3  |  3  |  3  |  3  |  3  | standard|
-|    |  -3  |  -6  |  -1  |  -1  |  -5.5  |  0  |  +1.6  |  0  |  +1.5  |  -1  | diff|
+|    |  -3  |  -6  |  -1  |  -1  |  -5.5  |  0  |  +1.6  |  0  |  +1.5  |  -1  |  Δ  |
   * #1/#2 All GPU warranty requests will be filled by GPU maker.
+  * #7 allows up to 4 GPUs per node. Filling the rack with 1U left open between nodes gives count=15.
   * #8 fills intended rack with AC in rack. GPU Tower/4U rack mount.
-  * #8 includes NVLink connector (bridge kit). Allows up to 4 GPUs per node with no cooling issues.
+  * #8 includes NVLink connector (bridge kit). Up to 4 GPUs per node.
   * Tariffs may affect all quotes when executed.
   * S&H included (or estimated)
+  * More than 4-6 nodes would be a lot of work if Warewulf/CentOS7 imaging is not working.
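As a cross-check on the Tflops (cpu dpfp) row, the values appear consistent with the usual theoretical-peak estimate: physical cores times clock speed times double-precision flops per cycle (silver 16, gold 32, as noted on this page). The sketch below assumes that formula; it is not confirmed as the method used to build the table, and the function name `peak_tflops` is hypothetical. It reproduces options #1, #3 and #8 to one decimal place, including the change for #8 from 13.5 (gold) to 6.8 (silver). Some other columns differ by about 0.1, so treat it as an approximation.

```python
# Sketch: theoretical peak CPU double-precision TFlops.
# Assumption: peak = cores * GHz * dpfp-per-cycle, which is a common estimate
# but not necessarily how the table values were derived.

DPFP_PER_CYCLE = {"silver": 16, "gold": 32}  # from the page's note on CPU type


def peak_tflops(cores, ghz, cpu_type):
    """Return the theoretical peak in TFlops for a given CPU configuration."""
    gflops = cores * ghz * DPFP_PER_CYCLE[cpu_type]
    return gflops / 1000.0


if __name__ == "__main__":
    # (option, total physical cores, GHz, type) taken from the table rows
    for option, cores, ghz, cpu_type in [
        ("#1", 96, 2.1, "silver"),
        ("#3", 180, 2.4, "gold"),
        ("#8", 192, 2.2, "silver"),
    ]:
        print(f"{option}: {peak_tflops(cores, ghz, cpu_type):.1f} TFlops")
```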
+On the question of active versus passive cooling:
