cluster:221

Differences

This shows you the differences between two versions of the page.

cluster:221 [2023/03/01 11:36]
hmeij07
cluster:221 [2023/03/14 09:59] (current)
hmeij07
Line 4: Line 4:
 ==== Infiniband Monitoring ==== ==== Infiniband Monitoring ====
  
-The NVIDIA Firmware Tools (MFT) is a toolset for generating a standard or customized NVIDIA firmware image and for querying firmware information. It is required for ''ibswinfo'', which can monitor unmanaged Infiniband switches. Our new Infiniband switch is a SB7890 EDR. You will also need+The NVIDIA Firmware Tools (MFT) is a toolset for generating a standard or customized NVIDIA firmware image and for querying firmware information. It is required for ''ibswinfo'', which can monitor unmanaged Infiniband switches. Our new Infiniband switch is a **SB7890** EDR. You will also need
  
   * infiniband-diags   * infiniband-diags
Line 118: Line 118:
 </code> </code>
  
-Ok, so onwards to stage ''ibswinfo.sh'' from https://github.com/stanford-rc/ibswinfo+Ok, so onward to stage ''ibswinfo.sh'' from https://github.com/stanford-rc/ibswinfo
  
 Download the script and stage in ''/usr/bin'' Download the script and stage in ''/usr/bin''
Line 198: Line 198:
 </code> </code>
  
 +Under load with full power...
  
 +<code>
 +
 +ibswinfo -d /dev/mst/SW_MT53000_SwitchIB_lid-0x0003
 +=================================================
 +SwitchIB Mellanox Technologies
 +=================================================
 +part number        | MSB7890-ES2F
 +serial number      | MT2239XZ011W
 +product name       | Scorpion2 IB EDR Unmanaged
 +revision           | AC
 +ports              | 36
 +PSID               | MT_2640110032
 +GUID               | 0x900a840300ecde60
 +firmware version   | 15.2008.2102
 +-------------------------------------------------
 +uptime (d-h:m:s)   | 46d-21:14:46
 +-------------------------------------------------
 +PSU0 status        | OK
 +     P/N           | MTEF-PSF-AC-I
 +     S/N           | MT2238XZ0MYR
 +     DC power      | OK
 +     fan status    | OK
 +     power (W)     | 27    <--- 27+32=59 units rated typical 122, max 162
 +PSU1 status        | OK
 +     P/N           | MTEF-PSF-AC-I
 +     S/N           | MT2238XZ0MZ2
 +     DC power      | OK
 +     fan status    | OK
 +     power (W)     | 32
 +-------------------------------------------------
 +temperature (C)    | 39    <--- one degree higher
 +max temp (C)       | 45
 +-------------------------------------------------
 +fan status         | OK    <--- speeds about the same
 +fan#1 (rpm)        | 8337
 +fan#2 (rpm)        | 7194
 +fan#3 (rpm)        | 8287
 +fan#4 (rpm)        | 7045
 +fan#5 (rpm)        | 8389
 +fan#6 (rpm)        | 7194
 +fan#7 (rpm)        | 8441
 +fan#8 (rpm)        | 7156
 +-------------------------------------------------
 +
 +</code>
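The power annotation above (27+32=59 W against a rated typical of 122 W and max of 162 W) can be reproduced from the output itself. The following is a minimal sketch, not part of ''ibswinfo'': the function names ''psu_power_watts'' and ''summarize'' are hypothetical, the sample text is an excerpt of the listing above, and the rated values are simply the ones quoted in the annotation.

```python
import re

# Excerpt of the ibswinfo output shown above (PSU lines only).
SAMPLE = """\
PSU0 status        | OK
     DC power      | OK
     fan status    | OK
     power (W)     | 27
PSU1 status        | OK
     DC power      | OK
     fan status    | OK
     power (W)     | 32
"""

# Ratings taken from the annotation in the listing (watts).
RATED_TYPICAL_W = 122
RATED_MAX_W = 162


def psu_power_watts(text):
    """Return the per-PSU power readings (W) found in ibswinfo-style text."""
    readings = []
    for line in text.splitlines():
        # Matches only the "power (W) | <n>" rows, not "DC power | OK".
        match = re.match(r"\s*power \(W\)\s*\|\s*(\d+)", line)
        if match:
            readings.append(int(match.group(1)))
    return readings


def summarize(text):
    """Sum the PSU readings and relate the total to the rated values."""
    readings = psu_power_watts(text)
    total = sum(readings)
    return {
        "per_psu": readings,
        "total": total,
        "pct_of_typical": round(100 * total / RATED_TYPICAL_W, 1),
        "pct_of_max": round(100 * total / RATED_MAX_W, 1),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In practice the text would come from running ''ibswinfo'' against the switch device and capturing its standard output; trailing annotations such as ''<--- ...'' on a reading line do not affect the parse, since only the first integer after the separator is read.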
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
  
cluster/221.1677688572.txt.gz · Last modified: 2023/03/01 11:36 by hmeij07