cluster:196
Netdata
We use Zenoss for monitoring and alerting across the whole HPC. The page can be found here: Zenoss
At the PEARC20 conference I became aware of Netdata, which seems a good tool for our “tails” (login and storage servers, for example). It provides lots of detailed information.
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
Then open port 19999 in the firewall for wesleyan.edu.
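As a rough sketch of that firewall step, assuming the host runs firewalld (the CentOS 7 default; a CentOS 6 host would use iptables instead). The function names are hypothetical, and the source range is passed in as an argument because the actual campus network range is not recorded on this page.

```shell
#!/usr/bin/env bash
# Sketch only: assumes firewalld; function names are made up for illustration.

# Build a firewalld rich rule that accepts TCP 19999 from one source range.
netdata_rich_rule() {
    local cidr="$1"
    printf 'rule family="ipv4" source address="%s" port port="19999" protocol="tcp" accept' "$cidr"
}

# Apply the rule permanently and reload firewalld (needs root).
open_netdata_port() {
    local cidr="$1"
    firewall-cmd --permanent --add-rich-rule="$(netdata_rich_rule "$cidr")"
    firewall-cmd --reload
}
```

Calling open_netdata_port with the campus CIDR as its only argument would add the rule; netdata_rich_rule alone just prints the rule text so it can be reviewed first.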
Gosh, it does alerting too. The original, unsilenced monitor scripts are kept on cottontail:/usr/lib/netdata/conf.d/health.d-orig
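Having the originals around makes it easy to see which alert recipients were changed. A minimal sketch, assuming the saved copy and the live health.d directory have the same file layout; compare_health_dirs is a hypothetical helper name, not part of Netdata.

```shell
#!/usr/bin/env bash
# Sketch only: helper name is hypothetical; both directories are passed as arguments.

# Print only the "to:" lines that differ between the saved originals and the live config.
compare_health_dirs() {
    local orig="$1" live="$2"
    # diff exits non-zero when files differ, which is the expected case here
    diff -ru "$orig" "$live" | grep -E '^[-+][[:space:]]*to:' || true
}
```

Run it with the health.d-orig copy as the first argument and the live health.d directory as the second; lines starting with "-" are the original recipients, lines starting with "+" are the current ones.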
- hpcmon Zenoss server
- cottontail Primary log in and scheduler server
- cottontail2 Backup scheduler, centos 6 compile env
- greentail52 NFS server /sanscratch, centos 7 compile env
- ringtail NFS server /home33
- sharptail
- mstore0 NFS server /mindstore
- mindstoresrv1 Replication target for mstore0
- petaltail Sandbox, Warewulf centos 6
- swallowtail Sandbox
- whitetail Openhpc Warewulf centos 7 (powered down)
- sharptail2dr Disaster recovery host for hpcstore (active users) /homesdr
Silence
cd /usr/lib/netdata/conf.d/health.d/
for i in *; do
  perl -pi -e "s/to: sysadmin/to: silent/g" "$i"
  perl -pi -e "s/to: webmaster/to: silent/g" "$i"
  perl -pi -e "s/to: dba/to: silent/g" "$i"
  perl -pi -e "s/to: sitemgr/to: silent/g" "$i"
  perl -pi -e "s/to: domainadmin/to: silent/g" "$i"
  perl -pi -e "s/to: proxyadmin/to: silent/g" "$i"
done
grep -i to: * | grep -v silent
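The same silencing can be sketched as one reusable function that handles all six roles in a single pass per file. This is an assumption-laden variant, not the procedure actually used: it relies on GNU sed (-i and -E), the function name silence_health_dir is hypothetical, and the "-orig" backup it creates is only a guess at how the saved copy was made.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed; function name and backup naming are hypothetical.

silence_health_dir() {
    local dir="${1%/}"
    local backup="${dir}-orig"
    local f

    # Keep an untouched copy of the directory, but only the first time.
    [ -d "$backup" ] || cp -a "$dir" "$backup"

    # Rewrite the six recipient roles to "silent" in every regular file.
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        sed -i -E 's/to: (sysadmin|webmaster|dba|sitemgr|domainadmin|proxyadmin)/to: silent/g' "$f"
    done

    # Show any recipient lines that are still not silenced.
    grep -i 'to:' "$dir"/* | grep -v silent || true
}
```

Called with the health.d directory as its argument, it prints whatever recipients remain unsilenced, which is the same check as the final grep in the loop version.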
cluster/196.txt · Last modified: by hmeij07
