cluster:221

Differences

This shows you the differences between two versions of the page.

cluster:221 [2023/03/01 11:32]
hmeij07 created
cluster:221 [2023/03/14 09:59] (current)
hmeij07
Line 4:
 ==== Infiniband Monitoring ====
 
-The NVIDIA Firmware Tools (MFT) is a toolset for generating a standard or customized NVIDIA firmware image and querying for firmware information. It is required for ''ibswinfo'', which can monitor unmanaged Infiniband switches. Our new Infiniband switch is an SB7890 EDR. You will also need
+The NVIDIA Firmware Tools (MFT) is a toolset for generating a standard or customized NVIDIA firmware image and querying for firmware information. It is required for ''ibswinfo'', which can monitor unmanaged Infiniband switches. Our new Infiniband switch is an **SB7890** EDR. You will also need
 
   * infiniband-diags
Line 58:
  python-srpm-macros             noarch          3-43.el8                     appstream           14 k
  python3-rpm-macros             noarch          3-43.el8                     appstream           14 k
-python3-rpm-generators-5-7.el8.noarch.rpm
+python3-rpm-generators          noarch          5-7.el8                      appstream           14k
  qt5-srpm-macros                noarch          5.15.3-1.el8                 appstream          9.5 k
  redhat-rpm-config              noarch          130-1.el8                    appstream           89 k
Line 118:
 </code>
 
-Ok, so onwards to stage''ibswinfo.sh'' from https://github.com/stanford-rc/ibswinfo
+Ok, so onward to stage ''ibswinfo.sh'' from https://github.com/stanford-rc/ibswinfo
 
 Download the script and stage in ''/usr/bin''
Line 198:
 </code>
  
 +Under load with full power...
  
 +<code>
 +
 +ibswinfo -d /dev/mst/SW_MT53000_SwitchIB_lid-0x0003
 +=================================================
 +SwitchIB Mellanox Technologies
 +=================================================
 +part number        | MSB7890-ES2F
 +serial number      | MT2239XZ011W
 +product name       | Scorpion2 IB EDR Unmanaged
 +revision           | AC
 +ports              | 36
 +PSID               | MT_2640110032
 +GUID               | 0x900a840300ecde60
 +firmware version   | 15.2008.2102
 +-------------------------------------------------
 +uptime (d-h:m:s)   | 46d-21:14:46
 +-------------------------------------------------
 +PSU0 status        | OK
 +     P/N           | MTEF-PSF-AC-I
 +     S/N           | MT2238XZ0MYR
 +     DC power      | OK
 +     fan status    | OK
 +     power (W)     | 27    <--- 27+32=59 W total; unit is rated typical 122 W, max 162 W
 +PSU1 status        | OK
 +     P/N           | MTEF-PSF-AC-I
 +     S/N           | MT2238XZ0MZ2
 +     DC power      | OK
 +     fan status    | OK
 +     power (W)     | 32
 +-------------------------------------------------
 +temperature (C)    | 39    <--- one degree higher
 +max temp (C)       | 45
 +-------------------------------------------------
 +fan status         | OK    <--- speeds about the same
 +fan#1 (rpm)        | 8337
 +fan#2 (rpm)        | 7194
 +fan#3 (rpm)        | 8287
 +fan#4 (rpm)        | 7045
 +fan#5 (rpm)        | 8389
 +fan#6 (rpm)        | 7194
 +fan#7 (rpm)        | 8441
 +fan#8 (rpm)        | 7156
 +-------------------------------------------------
 +
 +</code>
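
The 27+32=59 total annotated in the output above was added up by hand. A minimal sketch of how that sum could be automated is shown here, assuming the ''power (W)'' rows keep the layout seen in this listing. The function name ''psu_power_total'' is hypothetical and is not part of ''ibswinfo''.

```shell
#!/bin/sh
# Hypothetical helper (not part of ibswinfo): read ibswinfo output on stdin
# and print the sum of all "power (W)" rows.
# Assumption: rows look like "     power (W)     | 27    <--- optional note".
psu_power_total() {
    awk -F'|' '
        /power \(W\)/ {
            # $2 is e.g. " 27    <--- note"; the first token is the wattage
            split($2, f, " ")
            total += f[1]
        }
        END { print total + 0 }
    '
}

# Possible usage, with the device path taken from the listing above:
#   ibswinfo -d /dev/mst/SW_MT53000_SwitchIB_lid-0x0003 | psu_power_total
```

Matching on the literal ''power (W)'' label skips the ''DC power'' status rows, so only the wattage rows are summed.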
 \\
 **[[cluster:0|Back]]**
  
cluster/221.1677688333.txt.gz · Last modified: 2023/03/01 11:32 by hmeij07