



HPC Monitoring

We used to use Zenoss for health monitoring and alerting.

Because a research project needed quick insight into resource consumption on the compute nodes, we first installed Ganglia. It is no longer actively developed, but it is a great tool. You can quickly download CentOS 8 packages and grab CentOS 7 packages. For CentOS 7, which is end-of-life, you need to comment out the mirrorlist URLs in the yum repo files and uncomment the baseurl lines, pointing them at the vault:

 baseurl=http://vault.centos.org/centos/$releasever/os/$basearch/
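A minimal sketch of that repo edit in Python, assuming python3 is installed on the CentOS 7 node and the stock CentOS-Base.repo layout (active `mirrorlist=` lines, commented-out `#baseurl=http://mirror.centos.org/...` lines). The script name and the `.bak` backup naming are hypothetical choices for illustration, not part of our setup:

```python
import sys
from pathlib import Path


def point_repo_at_vault(text: str) -> str:
    """Rewrite CentOS 7 .repo text so yum fetches from vault.centos.org."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith("mirrorlist="):
            # mirrorlist.centos.org no longer serves CentOS 7: keep it only as a comment
            line = "#" + line
        elif line.startswith("#baseurl="):
            # re-enable the static baseurl that the stock file ships commented out
            line = line[1:]
        if line.startswith("baseurl="):
            # point the static URL at the vault instead of the retired mirror
            line = line.replace("mirror.centos.org", "vault.centos.org")
        out.append(line)
    return "".join(out)


def main(paths):
    for name in paths:
        path = Path(name)
        original = path.read_text()
        backup = path.with_name(path.name + ".bak")  # hypothetical backup naming
        if not backup.exists():
            backup.write_text(original)  # keep the pristine copy on re-runs
        path.write_text(point_repo_at_vault(original))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as root against the repo files you want to fix, e.g. `python3 fix_centos7_repos.py /etc/yum.repos.d/CentOS-Base.repo`, then `yum clean all`. The mirrorlist lines stay in the file as comments, so the change is easy to undo, and yum ignores the `.repo.bak` copies.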

Apart from the obviously required changes, the only setting I changed was making the gmond agent resend its metadata every 60 seconds (send_metadata_interval = 60 in the globals section of gmond.conf). I love abstract graphs like this: one view tells you everything is humming along. You can also collect GPU metrics (on CentOS 7 nodes); the templates can be found here.
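The stock gmond.conf ships this setting as `send_metadata_interval = 0`, with a note that it should be non-zero when not using multicast; otherwise graphs come up empty after the aggregating gmond restarts. A small Python sketch that applies the change in place, assuming the EPEL config path /etc/ganglia/gmond.conf (the path, script name, and function name are assumptions for illustration):

```python
import re
import sys
from pathlib import Path

# Active (uncommented) setting in gmond.conf's globals block. Group 1 keeps the
# indentation and the "name = " part; anything after the number (e.g. /*secs */)
# is left untouched.
_SETTING = re.compile(
    r"^([ \t]*send_metadata_interval[ \t]*=[ \t]*)\d+", re.MULTILINE
)


def set_send_metadata_interval(conf_text: str, seconds: int = 60) -> str:
    """Return gmond.conf text with send_metadata_interval set to `seconds`."""
    # A replacement function inserts the new value literally, with no
    # template parsing of backslashes or digits.
    new_text, count = _SETTING.subn(lambda m: m.group(1) + str(seconds), conf_text)
    if count == 0:
        raise ValueError("no active send_metadata_interval line found")
    return new_text


if __name__ == "__main__":
    # e.g. python3 set_gmond_metadata.py /etc/ganglia/gmond.conf (assumed EPEL path)
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(set_send_metadata_interval(path.read_text()))
```

Commented-out occurrences of the setting are left alone, and gmond has to be restarted to pick up the new value.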



cluster/227.1729020614.txt.gz · Last modified: 2024/10/15 19:30 by hmeij07