\\
**[[cluster:0|Back]]**
==== Netdata ====
We use Zenoss to monitor the whole HPC cluster and to send alerts; see the [[cluster:183|Zenoss]] page.
At the PEARC20 conference I became aware of [[https://www.netdata.cloud/|Netdata]], which seems a good tool for our "tails" (the login and storage servers, for example). It provides lots of detailed information.
  bash <(curl -Ss https://my-netdata.io/kickstart.sh)
Then open port 19999 in the firewall for wesleyan.edu hosts.
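The firewall step could look like the sketch below. It assumes firewalld is in use (the CentOS 7 default; the CentOS 6 hosts would need an iptables rule instead), and ''CAMPUS_NET'' is a hypothetical placeholder that must be replaced with the real wesleyan.edu address range. By default it only prints the commands; set ''APPLY=1'' and run as root to apply them.

```shell
#!/bin/bash
# Hypothetical placeholder range: replace with the real campus network.
CAMPUS_NET="${CAMPUS_NET:-192.0.2.0/24}"
NETDATA_PORT=19999

# Build the firewalld rich rule that admits the campus range to the Netdata port.
netdata_rule() {
  printf 'rule family="ipv4" source address="%s" port port="%s" protocol="tcp" accept' \
    "$CAMPUS_NET" "$NETDATA_PORT"
}

# Dry run by default: print the commands; APPLY=1 runs them (root required).
if [ "${APPLY:-0}" = "1" ]; then
  firewall-cmd --permanent --add-rich-rule="$(netdata_rule)"
  firewall-cmd --reload
else
  echo "firewall-cmd --permanent --add-rich-rule='$(netdata_rule)'"
  echo "firewall-cmd --reload"
fi
```

Restricting the rule to a source range, rather than opening the port to everyone, keeps the dashboards reachable only from campus.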
Gosh, it does alerting too. The original alert definitions, unsilenced, are kept in ''cottontail:/usr/lib/netdata/conf.d/health.d-orig''
  * [[http://hpcmon.wesleyan.edu:19999|hpcmon]] Zenoss server
  * [[http://cottontail.wesleyan.edu:19999|cottontail]] Primary login and scheduler server
  * [[http://cottontail2.wesleyan.edu:19999|cottontail2]] Backup scheduler, CentOS 6 compile environment
  * [[http://greentail52.wesleyan.edu:19999|greentail52]] NFS server /sanscratch, CentOS 7 compile environment
  * [[http://ringtail.wesleyan.edu:19999|ringtail]] NFS server /home33
  * sharptail
  * [[http://mindstoresrv0.wesleyan.edu:19999|mstore0]] NFS server /mindstore
  * [[http://mindstoresrv1.wesleyan.edu:19999|mindstoresrv1]] Replication target for mstore0
  * [[http://petaltail.wesleyan.edu:19999|petaltail]] Sandbox, Warewulf CentOS 6
  * [[http://swallowtail.wesleyan.edu:19999|swallowtail]] Sandbox
  * [[http://whitetail.wesleyan.edu:19999|whitetail]] OpenHPC Warewulf CentOS 7 (powered down)
  * [[http://sharptail2dr.wesleyan.edu:19999|sharptail2dr]] Disaster recovery host for hpcstore (active users) /homesdr
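A quick way to confirm that the dashboards listed above are answering is to poll each one with ''curl''. This is only a sketch: the function names are made up for illustration, the ''/api/v1/info'' endpoint is assumed to be available on the installed Netdata version, and whitetail is left out because it is powered down.

```shell
#!/bin/bash
# Hosts taken from the list above (whitetail omitted: powered down).
NETDATA_HOSTS="hpcmon cottontail cottontail2 greentail52 ringtail
mindstoresrv0 mindstoresrv1 petaltail swallowtail sharptail2dr"

# Build the agent info URL for a short host name (hypothetical helper).
netdata_url() {
  printf 'http://%s.wesleyan.edu:19999/api/v1/info' "$1"
}

# Report UP/DOWN for every host; a host is UP if curl gets an HTTP success.
check_netdata_hosts() {
  for h in $NETDATA_HOSTS; do
    if curl -fsS --max-time 5 -o /dev/null "$(netdata_url "$h")"; then
      echo "UP   $h"
    else
      echo "DOWN $h"
    fi
  done
}
```

After sourcing the snippet, run ''check_netdata_hosts'' from a machine inside the wesleyan.edu network, since port 19999 is only opened for that range.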
==== Silence ====
To stop Netdata from sending notifications, point every alarm recipient role at ''silent'':
  cd /usr/lib/netdata/conf.d/health.d/
  # rewrite each recipient role to "silent" in every health config file
  for i in *; do
    perl -pi -e 's/to: (sysadmin|webmaster|dba|sitemgr|domainadmin|proxyadmin)/to: silent/g' "$i"
  done
  # list any recipients that are still not silenced
  grep -i 'to:' * | grep -v silent
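Before editing the files in place, it can help to preview what the substitution would change. The sketch below uses ''sed'' instead of ''perl'' purely for a read-only dry run; ''preview_silence'' is a hypothetical helper name, and the role list simply mirrors the roles silenced above.

```shell
#!/bin/bash
# Recipient roles to silence (same roles as in the loop above).
ROLES='sysadmin|webmaster|dba|sitemgr|domainadmin|proxyadmin'

# Print each matching line as it would look after silencing.
# Nothing is written back: sed runs without -i.
preview_silence() {
  sed -E -n "s/to: (${ROLES})/to: silent/gp" "$@"
}
```

For example, ''preview_silence /usr/lib/netdata/conf.d/health.d/*'' prints one rewritten line per recipient that would be silenced; lines that already read ''to: silent'' are not shown.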
\\
**[[cluster:0|Back]]**