cluster:227
Differences
This shows you the differences between two versions of the page.
| cluster:227 [2024/10/15 19:21] – hmeij07 | cluster:227 [2024/10/23 12:16] (current) – hmeij07 | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| We used to use Zenoss as our health and alerting monitor ([[cluster: | We used to use Zenoss as our health and alerting monitor ([[cluster: | ||
| - | Because a research project needed quick insight into resource consumption on compute nodes, we first quickly installed Ganglia. | + | Because a research project needed quick insight into resource consumption on compute nodes, we first quickly installed Ganglia. |
| < | < | ||
| | | ||
| </ | </ | ||
| + | |||
| + | The only change I made, beyond the obviously needed ones, was specifying that the agent '' | ||
| + | |||
| + | * https:// | ||
| + | |||
| + | Here is what it looks like (either select Grid > Wesleyan HPC > Server, or after selecting Wesleyan HPC scroll down the page to view all nodes and pick a metric). | ||
| + | |||
| + | * http:// | ||
| + | |||
| + | {{: | ||
| + | |||
| + | But Ganglia does not provide for alerting so we added **Zabbix**. | ||
| + | |||
| + | We set up agent monitoring using Zabbix Agent (on both version 7 and 8 nodes, CentOS or Rocky) and added the GPU templates from these links. The XML loads as a Template on the zabbix_server, | ||
| + | |||
| + | * setup data collection with Zabbix agent, setup monitoring with Zabbix agent | ||
| + | * enable discovery on both with 192.168.102.1-254 | ||
| + | * https:// | ||
| + | * https:// | ||
| + | |||
| + | And that looks like this: | ||
| + | |||
| + | * http:// | ||
| + | |||
| + | Log in as guest. Then you can go to " | ||
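
The discovery range given above (192.168.102.1-254) can be sanity-checked from the command line before enabling discovery in Zabbix. The following is only a minimal sketch, not part of the setup described on this page: the helper names `expand_range` and `agent_reachable` are hypothetical, and it assumes the agents listen on the Zabbix agent's default passive port, 10050. If the agents were configured with a different port, adjust the constant.

```python
import ipaddress
import socket

# Assumption: agents use the Zabbix agent's default passive-check port.
ZABBIX_AGENT_PORT = 10050


def expand_range(spec: str) -> list[str]:
    """Expand a range such as '192.168.102.1-254' into individual addresses.

    Only the last octet may carry a range; a plain address is returned as-is.
    """
    base, _, last = spec.partition("-")
    first_ip = ipaddress.IPv4Address(base)
    if not last:
        return [str(first_ip)]
    start = int(first_ip) & 0xFF  # last octet of the starting address
    end = int(last)
    if not start <= end <= 255:
        raise ValueError(f"invalid range: {spec}")
    network_part = int(first_ip) - start  # address with last octet zeroed
    return [
        str(ipaddress.IPv4Address(network_part + octet))
        for octet in range(start, end + 1)
    ]


def agent_reachable(host: str, port: int = ZABBIX_AGENT_PORT,
                    timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `agent_reachable(host)` for each entry of `expand_range("192.168.102.1-254")` gives a quick list of nodes whose agent port answers. A successful TCP connect only shows that something is listening on that port, not that the agent accepts checks from the server.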
cluster/227.1729020079.txt.gz · Last modified: by hmeij07
