


RTM

This is a collection of interesting graphs generated by the Real Time Monitoring (RTM) tool that Platform is developing. The data covers our evaluation period.

What is RTM? RTM is used to monitor and graph LSF resources (including networks, disks, applications, etc.) in one or more clusters. In graph or table format, RTM displays resource-related information such as the number of jobs submitted, the details of individual jobs (like load average, CPU usage, and job owner), and the hosts on which the jobs ran. RTM also displays global statistics per cluster, per user, per queue, etc.
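To make "statistics per user, per queue" concrete, here is a minimal sketch of that kind of roll-up. It is not RTM's code or data model: the record fields (`queue`, `user`, `slots`, `status`), the user names, and the `summarize` helper are all assumptions made up for illustration. Only the queue names are taken from the graphs on this page.

```python
from collections import defaultdict

# Hypothetical job records; field names and values are invented for illustration.
JOBS = [
    {"queue": "imw", "user": "alice", "slots": 4, "status": "RUN"},
    {"queue": "imw", "user": "bob", "slots": 2, "status": "PEND"},
    {"queue": "elw", "user": "alice", "slots": 8, "status": "RUN"},
    {"queue": "ehw", "user": "carol", "slots": 1, "status": "RUN"},
]


def summarize(jobs, key):
    """Group jobs by `key` ("queue" or "user") and total up a few counters."""
    totals = defaultdict(lambda: {"jobs": 0, "running_slots": 0, "pending_jobs": 0})
    for job in jobs:
        row = totals[job[key]]
        row["jobs"] += 1
        if job["status"] == "RUN":
            # Only running jobs occupy slots.
            row["running_slots"] += job["slots"]
        elif job["status"] == "PEND":
            row["pending_jobs"] += 1
    return dict(totals)


per_queue = summarize(JOBS, "queue")
per_user = summarize(JOBS, "user")

for name, row in sorted(per_queue.items()):
    print(name, row)
```

The same function serves both views because only the grouping key changes, which is roughly how the per-queue and per-user tables relate to each other.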

RTM Using Guide

To access RTM, log in as guest/guest.

Graph Tab

Tree: Cluster Swallowtail-> Leaf: Cluster Overview

<hi #dda0dd>Cluster Level Statistics</hi>

Viewing Graph 'LSF 62 - GRID Available Memory'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - GRID IO Levels'
 Weekly (30 Minute Average)
Viewing Graph 'LSF 62 - GRID Load Average'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - GRID CPU Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - GRID Job Statistics'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - GridJobs Collection Stats'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - Overall Job Efficiency'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - Pending Jobs'
 Monthly (2 Hour Average)
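The labels "Monthly (2 Hour Average)" and "Weekly (30 Minute Average)" suggest that each plotted point is the mean of the raw samples in a fixed time window. The sketch below shows that idea only. How RTM actually consolidates its data is not described on this page, and the `bucket_average` helper and the sample values are assumptions.

```python
from collections import defaultdict

BUCKET_SECONDS = 2 * 60 * 60  # two-hour window, as in "2 Hour Average"


def bucket_average(samples, bucket_seconds=BUCKET_SECONDS):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket_start -> [total, count]
    for timestamp, value in samples:
        start = timestamp - (timestamp % bucket_seconds)
        sums[start][0] += value
        sums[start][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


# Made-up load samples: (seconds since start, value).
samples = [(0, 1.0), (1800, 3.0), (7200, 10.0), (9000, 20.0)]
averaged = bucket_average(samples)
print(averaged)
```

Passing `bucket_seconds=30 * 60` gives the 30-minute variant used by the weekly graph. Wider buckets smooth out short spikes, which is worth remembering when reading the monthly views.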

Tree: Cluster Swallowtail-> Leaf: CPU Capacity

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw_nodes - CPU Capacity'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - elw_nodes - CPU Capacity'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - emw_nodes - CPU Capacity'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehw_nodes - CPU Capacity'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehwfd_nodes - CPU Capacity'
 Monthly (2 Hour Average)

Tree: Cluster Swallowtail-> Leaf: CPU Utilization

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw_nodes - CPU Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - elw_nodes - CPU Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - emw_nodes - CPU Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehw_nodes - CPU Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehwfd_nodes - CPU Utilization'
 Monthly (2 Hour Average)

Tree: Cluster Swallowtail-> Leaf: Slot Utilization

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw_nodes - Slot Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - elw_nodes - Slot Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - emw_nodes - Slot Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehw_nodes - Slot Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehwfd_nodes - Slot Utilization'
 Monthly (2 Hour Average)

Tree: Cluster Swallowtail-> Leaf: Job Info

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw - Job Details'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - elw - Job Details'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - emw - Job Details'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehw - Job Details'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehwfd - Job Details'
 Monthly (2 Hour Average)

Tree: Cluster Swallowtail-> Leaf: Pending

<hi #dda0dd>Queue Level Statistics</hi>

Viewing Graph 'LSF 62 - imw - Queue Pending Times'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - elw - Queue Pending Times'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - emw - Queue Pending Times'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehw - Queue Pending Times'
 Monthly (2 Hour Average)
Viewing Graph 'LSF 62 - ehwfd - Queue Pending Times'
 Monthly (2 Hour Average)

Tree: Hosts-> Host: head node

<hi #dda0dd>Single Computer Level Statistics</hi>

Viewing Graph 'head node - Memory Usage'
 Monthly (2 Hour Average)
Viewing Graph 'head node - Load Average'
 Monthly (2 Hour Average)
Viewing Graph 'head node - Processes'
 Monthly (2 Hour Average)

Tree: Compute Hosts-> Host: "compute node name"

<hi #dda0dd>Single Computer Level Statistics</hi>

… example …

Viewing Graph 'nfs-2-2 - GRID Available Memory'
 Monthly (2 Hour Average)
Viewing Graph 'nfs-2-2 - GRID CPU Utilization'
 Monthly (2 Hour Average)
Viewing Graph 'nfs-2-2 - GRID IO Levels'
 Monthly (2 Hour Average)
Viewing Graph 'nfs-2-2 - GRID Job Statistics'
 Monthly (2 Hour Average)
Viewing Graph 'nfs-2-2 - GRID Load Average'
 Monthly (2 Hour Average)

Grid Tab

These are examples of the tabular global statistics for currently running jobs.

Queue Level Stats

User Level Stats

Cluster Level Stats

Dashboard

That's it.



cluster/68.txt · Last modified: 2008/08/19 13:34 (external edit)