In addition to the command line interface, the accelerator message logs can be queried from Grafana.


To begin, go to this link and sign in using your SLAC account: https://grafana.slac.stanford.edu


Viewing the Log

The following link will take you to Grafana's "explore" functionality with a query for viewing the message log prepopulated:

https://grafana.slac.stanford.edu/goto/UomQAWpSg?orgId=1

That link should look like the following:


Some items of note here: the drop-down in the top left has been set to "Loki". Loki is the log aggregation and storage system where the accelerator logs are stored and what we will be querying against. More information is available at the link if interested, though it is not necessary for understanding the rest of this page.

There is a "Live" button in the top right which will stream live data based on the query that has been written.

You'll also see the query itself built visually. The label job=accelerator_logs is the only required option; it filters the logs we are interested in out of everything else stored in Loki. Below that are further constraints on what will be returned: the line must contain the string "LCLS", and there are two regular expressions the line must not match. These exclude any results from the change log or the watcher, which can be noisy and drown out results you may be more interested in. If those lines are of interest, either constraint can be removed from the query by hitting the 'x' in its top right.

To further clean up the results, the deduplication option right above the log output can be used. Setting it to "Signature" rolls duplicate lines up into one, with an indication of how many times the line printed (two times in the example below):



You can also see the code backing the query below the visual builder, in this case {job="accelerator_logs"} |= "LCLS" !~ "([A-Z]{2,4}:[^ ]+ changed from)" !~ "(F2:WATCHER)". This is written in the LogQL query language. We still have job=accelerator_logs; the remaining elements use the |= operator, which keeps only lines containing the given string, and the !~ operator, which excludes lines matching the given regular expression. Descriptions of these operators, as well as a more in-depth look at creating queries, are available here: https://grafana.com/docs/loki/latest/query/log_queries/

So for this particular query, |= "LCLS" means we want any line which contains the string LCLS. If we were to change this to |= "FACET", we would only get FACET-related logs. The same regular expressions are excluded as before.
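Putting that swap together, the FACET variant of the query would look something like the following (a sketch; the job label and the two exclusions are unchanged from the query above, only the string filter differs):

```logql
{job="accelerator_logs"} |= "FACET" !~ "([A-Z]{2,4}:[^ ]+ changed from)" !~ "(F2:WATCHER)"
```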

Writing Your Own Queries

Continuing with our example above, let's say we are interested in figuring out what the most frequent log lines are from minute to minute so that we can reduce the amount of noise in our logs. Here is an example of building up one such query, with an explanation below it.



The first element is a json parser, which makes our data easier to work with in the following steps by letting us query based on the keys in our data, such as origin, facility, and text. We then filter out any parsing errors to ensure that our query will execute correctly. The next element is a "Count over time", which sets the look-back range for our query; setting it to 1m means we look at all lines stored in just the past minute. Using too large a value here will cause the query to fail because too much data would be returned. We are working on increasing the limits on what can be returned, but ideally cleanup on the log-generation side will also make this more manageable.
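In LogQL code form, the parser, error filter, and look-back range described above correspond to something like the following (a sketch reconstructed from the builder steps; the exact query in the screenshot may differ slightly):

```logql
count_over_time(
  {job="accelerator_logs"} |= "LCLS"
    | json            # parse each line's JSON so its keys become labels
    | __error__=""    # drop lines that failed to parse
  [1m]                # look-back range: count occurrences over the past minute
)
```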

Next we want a sum keyed on the unique text of each line. We use the "Sum by" option for this and give it the "text" label, which matches the json key in our log data. Other potential options for this label are accelerator, origin, user, facility, and severity.

Finally we give it a floor value of 5 so that we only see log lines whose exact text has printed more than 5 times in the past minute. Since we're interested in cleaning up potentially unnecessary noise, this filters out anything that printed 5 or fewer times.
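Assembled into LogQL, the full query might look like this (a sketch under the assumption that the builder's floor of 5 maps to a "> 5" comparison on the summed counts):

```logql
sum by (text) (          # aggregate the per-minute counts by the "text" key
  count_over_time(
    {job="accelerator_logs"} |= "LCLS"
      | json
      | __error__=""
    [1m]
  )
) > 5                    # keep only lines printed more than 5 times in the past minute
```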

Then we run the query; by default we'll see a plot in which each trace is associated with a line of text. It's not the clearest default for this data, but the visualization can be changed with the options available, and even more options are present if we make a dashboard out of this data, which we will get to in the next section.


One toggle that will make writing custom queries easier is "Explain query". It's also particularly handy if you are starting with someone else's query and want to understand what it is doing. Clicking on that will show information about what each part of the query is doing, similar to what was written above. And if you prefer working directly with the LogQL language rather than the visual builder, you can select the Code option instead and directly write the code for the underlying query. Both "Explain query" and "Code" were selected in the image below, but note that the "Explain query" functionality works just fine with the builder interface as well.


Saving Custom Queries as Dashboards


In order to save dashboards on SLAC's Grafana your user account must be granted the correct permissions. This request can be made by going to the #comp-sdf slack channel and asking for your account to be allowed to create dashboards on Grafana. Ensure you have signed into your account on Grafana at least once before making this request.


Now let's say we are happy with our query and would like to save it, both for our own use and to easily share it with others. The best way of doing this is to turn it into a dashboard, as this gives us more visualization options as well as the ability to add further functionality; for example, we could have automated alerts if a certain log line prints too many times in a given time period. To get a head start, we can click on the "Add to dashboard" button from our example query above, which gives us a new dashboard with the query already filled in for us.



Note that the resulting dashboard may not show any data, as it defaults to a larger time window than can currently be handled. Setting the dropdown in the top right to "Last 5 minutes" should fix that. From here, clicking the three dots in the top right of the panel and choosing Edit brings up a large number of customization options to look through. An overview of these options and of dashboard creation is beyond the scope of this page, but plenty of online documentation is available, including the official documentation here: https://grafana.com/docs/grafana/latest/dashboards/

When going through this process to create a dashboard that should actually be kept, save it into the accelerator logs folder, which has been set up for this purpose.


