AMI is instrumented extensively with Prometheus metrics, which are visualized in Grafana so that users can monitor AMI's performance. In addition, user graphs are stored in Prometheus and can be pulled and executed offline to investigate performance issues.

Prometheus Metrics

Name                     Type      Labels                   Description
ami_event_count          Counter   hutch, type, process     Counts occurrences of different types of events
ami_event_time_secs      Gauge     hutch, type, process     Measures elapsed time of different types of events
ami_event_size_bytes     Gauge     hutch, process           Measures size of data sent over ZMQ sockets
ami_event_latency_secs   Gauge     hutch, sender, process   Measures time it takes to send data over ZMQ
ami_graph                Info      hutch, name              JSON string of AMI client graphs
ami_graph_version        Gauge     hutch, name              Version number of graph
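
The metric names and labels in the table can be combined into PromQL expressions, for example when building a Grafana panel or an ad hoc query. The following is a minimal sketch, not part of AMI: the helper names `selector` and `per_second_rate` are hypothetical, and the label values used here (`local`, `1m`) are assumptions chosen for illustration.

```python
def selector(metric, **labels):
    """Build a PromQL selector such as name{key="value"} (hypothetical helper)."""
    if not labels:
        return metric
    # Sort the labels so the generated expression is deterministic.
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{body}}}"


def per_second_rate(metric, window="1m", **labels):
    """Wrap a counter selector in PromQL rate() over the given window."""
    return f"rate({selector(metric, **labels)}[{window}])"


# Gauge: read the current value directly.
print(selector("ami_event_time_secs", hutch="local"))
# Counter: a per-second rate is usually more useful than the raw count.
print(per_second_rate("ami_event_count", hutch="local"))
```

Counters such as ami_event_count only ever increase, so they are normally viewed through rate(), while gauges can be plotted as they are.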

Grafana Monitoring

  • Datagram Time: the average time it takes to process a single event in the heartbeat (only measured by workers)
  • Datagrams per second: the rate of datagrams seen by each worker.

...

  • Graph Version: increments when pushing a new version of the graph to AMI
  • Transitions: number of psana transitions seen by each process

Retrieving Client Graphs from Prometheus


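from prometheus_api_client import PrometheusConnect
import datetime as dt
import pandas as pd

# Connect to the Prometheus server over plain HTTP
prom = PrometheusConnect(url="http://psmetric03:9090", disable_ssl=True)
prom.all_metrics()  # optional: returns the names of all available metrics

# Select the graph by hutch and graph name
label = {'hutch': 'local', 'name': 'graph'}

# Fetch every ami_graph_info series recorded during the last 8 hours
end = dt.datetime.now()
data = prom.get_metric_range_data("ami_graph_info",
                                  label_config=label,
                                  start_time=end - dt.timedelta(hours=8),
                                  end_time=end)

# Each returned series carries its labels under the 'metric' key
metrics = [series['metric'] for series in data]
df = pd.DataFrame(metrics)

# Pick the desired graph version (label values are strings) and save the graph
row = df[df['version'] == '3']
with open('dump.fc', 'w') as f:
    f.writelines(row.graph)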
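
The version lookup in the last step can also be done without pandas. The sketch below assumes, as in the snippet above, that each returned series is a dict whose 'metric' entry holds the labels, including 'version' and 'graph'. The function name `select_graph` and the sample data are hypothetical and only illustrate the shape of the result.

```python
def select_graph(series, version):
    """Return the 'graph' label of the first series with a matching 'version' label.

    Returns None when no series matches. Prometheus label values are strings,
    so the requested version is converted before comparing.
    """
    wanted = str(version)
    for entry in series:
        labels = entry.get("metric", {})
        if labels.get("version") == wanted:
            return labels.get("graph")
    return None


# Hypothetical sample shaped like the list returned by get_metric_range_data.
sample = [
    {"metric": {"hutch": "local", "name": "graph", "version": "2", "graph": "{}"},
     "values": []},
    {"metric": {"hutch": "local", "name": "graph", "version": "3", "graph": '{"nodes": []}'},
     "values": []},
]

graph = select_graph(sample, 3)
if graph is not None:
    print(graph)
```

Returning None for a missing version makes it easy to tell "no such version in the time window" apart from an empty graph.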