cluster:68 [DokuWiki]

RTM

This is a collection of interesting graphs generated by the Real Time Monitoring (RTM) tool that Platform is developing. The data covers our evaluation period.

What is RTM? RTM monitors and graphs LSF resources (including networks, disks, and applications) in one or more clusters. In graph or table format, RTM displays resource-related information such as the number of jobs submitted, the details of individual jobs (such as load average, CPU usage, and job owner), and the hosts on which the jobs ran. RTM also displays global statistics per cluster, per user, per queue, and so on.

RTM Using Guide

To access RTM, log in as guest/guest.

Graph Tab

Tree: Cluster Swallowtail-> Leaf: Cluster Overview

<hi #dda0dd>Cluster Level Statistics</hi>

Viewing Graph 'LSF 62 - GRID Available Memory'

Viewing Graph 'LSF 62 - GRID IO Levels'

Viewing Graph 'LSF 62 - GRID Load Average'

Viewing Graph 'LSF 62 - GRID CPU Utilization'

Viewing Graph 'LSF 62 - GRID Job Statistics'

Viewing Graph 'LSF 62 - GridJobs Collection Stats'

Viewing Graph 'LSF 62 - Overall Job Efficiency'

Viewing Graph 'LSF 62 - Pending Jobs'

Tree: Cluster Swallowtail-> Leaf: CPU Capacity

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw_nodes - CPU Capacity'

Viewing Graph 'LSF 62 - elw_nodes - CPU Capacity'

Viewing Graph 'LSF 62 - emw_nodes - CPU Capacity'

Viewing Graph 'LSF 62 - ehw_nodes - CPU Capacity'

Viewing Graph 'LSF 62 - ehwfd_nodes - CPU Capacity'

Tree: Cluster Swallowtail-> Leaf: CPU Utilization

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw_nodes - CPU Utilization'

Viewing Graph 'LSF 62 - elw_nodes - CPU Utilization'

Viewing Graph 'LSF 62 - emw_nodes - CPU Utilization'

Viewing Graph 'LSF 62 - ehw_nodes - CPU Utilization'

Viewing Graph 'LSF 62 - ehwfd_nodes - CPU Utilization'

Tree: Cluster Swallowtail-> Leaf: Slot Utilization

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw_nodes - Slot Utilization'

Viewing Graph 'LSF 62 - elw_nodes - Slot Utilization'

Viewing Graph 'LSF 62 - emw_nodes - Slot Utilization'

Viewing Graph 'LSF 62 - ehw_nodes - Slot Utilization'

Viewing Graph 'LSF 62 - ehwfd_nodes - Slot Utilization'

Tree: Cluster Swallowtail-> Leaf: Job Info

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw - Job Details'

Viewing Graph 'LSF 62 - elw - Job Details'

Viewing Graph 'LSF 62 - emw - Job Details'

Viewing Graph 'LSF 62 - ehw - Job Details'

Viewing Graph 'LSF 62 - ehwfd - Job Details'

Tree: Cluster Swallowtail-> Leaf: Pending

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw - Queue Pending Times'

Viewing Graph 'LSF 62 - elw - Queue Pending Times'

Viewing Graph 'LSF 62 - emw - Queue Pending Times'

Viewing Graph 'LSF 62 - ehw - Queue Pending Times'

Viewing Graph 'LSF 62 - ehwfd - Queue Pending Times'

Tree: Hosts-> Host: head node

<hi #dda0dd>Single Computer Level Statistics</hi>

Viewing Graph 'head node - Memory Usage'

Viewing Graph 'head node - Load Average'

Viewing Graph 'head node - Processes'

Tree: Compute Hosts-> Host: "compute node name"

<hi #dda0dd>Single Computer Level Statistics</hi>

… example …

Viewing Graph 'nfs-2-2 - GRID Available Memory'

Viewing Graph 'nfs-2-2 - GRID CPU Utilization'

Viewing Graph 'nfs-2-2 - GRID IO Levels'

Viewing Graph 'nfs-2-2 - GRID Job Statistics'

Viewing Graph 'nfs-2-2 - GRID Load Average'

Grid Tab

These are examples of the tabular global statistics for currently running jobs.

Queue Level Stats

User Level Stats

Cluster Level Stats

Dashboard

That's it.

cluster/68.txt · Last modified: 2008/08/19 17:34 (external edit)