New information (late 2022): Updating the L1 calibration database


Sometimes the instrument specialists give us new calibrations: usually just a dead strip list for TKR, and three files at a time (asym, MeV/DAC, peds) for CAL. They need to be put in a standard place, registered in the calibration DB, and set active. This requires picking a time when validity switches from the previous set to the new one. We want this to happen between runs. The gaps between non-SAA runs are short, and CALDB doesn't do leap seconds, so I put the transitions near the middle of an SAA passage: the first one after the last run that we have data for.
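
As a minimal sketch of the "middle of an SAA passage" choice, the snippet below computes the midpoint between an SAA entry and exit time and truncates it to the whole minute, in the same "YYYY-MM-DD HH:MM:SS" form as the vstart values shown later on this page. The function name and the entry/exit times are hypothetical; the real times have to be read off the mission planning timeline.

```python
from datetime import datetime


def saa_midpoint(saa_entry: datetime, saa_exit: datetime) -> datetime:
    """Return the midpoint of an SAA passage, truncated to the whole minute."""
    if saa_exit <= saa_entry:
        raise ValueError("SAA exit must be later than SAA entry")
    midpoint = saa_entry + (saa_exit - saa_entry) / 2
    # Drop seconds so the value is easy to type into the vstart field.
    return midpoint.replace(second=0, microsecond=0)


# Hypothetical entry and exit times, for illustration only.
entry = datetime(2021, 5, 4, 18, 18, 0)
exit_ = datetime(2021, 5, 4, 18, 42, 0)
vstart = saa_midpoint(entry, exit_)
print(vstart.strftime("%Y-%m-%d %H:%M:%S"))  # prints 2021-05-04 18:30:00
```

Truncating to the minute is only a convenience; any time well inside the passage serves the same purpose.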

If there is not already an existing issue for the update, start one in the JIRA PII tracker. Here is an example that can be used as a template: PII-454, based off the ones they use for on-board hot strip masking updates (e.g., OBCONF-204).

We have adopted the workflow that one person (the primary) will do the update and another (the secondary) will check it and sign off after verifying that the start time is valid and the file names in the database are correct. They can either work on it together or have the secondary check later, but before the calibration is scheduled to take effect. Hopefully, this will avoid past situations in which updates were done incorrectly (e.g., the file names were mistyped) and the processing had to be cleaned up afterwards.

In general, it is probably better to do the updates early in the week rather than later.

Step-by-step guide

  1. Make sure your environment is set up correctly:  
    1. Log into a SLAC machine. 
      1. If you are using ssh, you will need to have X11 forwarding enabled (the -X or -Y option if not enabled by default).
      2. Using the necessary GUI via X11 can be very slow. SLAC has a remote desktop service called FastX that should work better.  It has web and desktop clients. See their web page for setup and usage instructions. 
        1. The GUI will not run on a FastX server. From a terminal session on a FastX server, you will need to "ssh -Y rhel6-64".
        2. From within your terminal session on RHEL6-64, you can quickly/easily check/confirm X-window GUI sharing with (for example) the "xeyes" or "xclock" commands.
        3. NOTE: rdbGUI does not run on centos7
      3. The GUI does not run on fermilnx01 or fermilnx02.  It will fail with a message like: error while loading shared libraries.  If you get this, try running on a different machine. The rhel6-64 pool machines seem to work fine.
    2. If the group setup script is not already sourced in your .bashrc or .cshrc, then source it:
      1. bash: source /afs/slac.stanford.edu/g/glast/ground/scripts/group.sh
      2. (t)csh: source /afs/slac.stanford.edu/g/glast/ground/scripts/group.cshrc
    3. This will set environment variables like $GLASTROOT and the path to the GUI you will use below.
    4. Make sure that you have write permissions to the $LATCalibRoot directory.
      1. If not, ask Tom Glanzman.
  2. Copy files to the "normal" place (if the instrument people haven't already):
    1. TKR: $LATCalibRoot/TKR
    2. CAL: $LATCalibRoot/CAL/p7repro
  3. Figure out the transition time:
    1. Go to the data processing page.
      1. Find the most recent run. It will be at the top, unless deliveries have arrived out of order.
      2. Click on the run number.
      3. Note the start time.
    2. Go to the mission planning timeline
      1. Find the first SAA passage that is still in the future processing-wise.
      2. Pick a time that is in the middle of the SAA passage. That is when we will start the new calibration.
      3. You may also want to note the start time of the next run after the SAA.  You will want to send that information to the data monitors list (see below) when you inform them of the new calibration.
  4. Start the rdbGUI: type "rdbGUI" at the command prompt in your FastX terminal.
  5. Go to the "File" menu and select "Open DB Schema".
  6. Navigate to the directory /afs/slac/g/glast/applications/dbSchemas/calib
    1. You can paste this into the File Name box. NOTE: to "paste" from clipboard into a Qt GUI window, using a Mac: ctrl-click-v
    2. You can bookmark this location for future sessions by clicking on the small red flag icon, and retrieve the bookmark by clicking on the red flag icon.
  7. Open the calib.xml file
    1. select the file "calib.xml" and click on "OK".
    2. then click on metadata_v2r1 under the Tables box.
  8. Under the Session menu, click on "Open connection".  This will open a pop up box.
  9. Fill in the box like the image below.   The "profile name" and "profile description" boxes can be anything.  The password is the same as the user name but with an "8" in place of the second "a".  If you click "Save", then the connection information will be stored as a profile (e.g., "calibrator profile") that you can load next time from the box on the left instead of typing everything in.
  10. Click "OK". Then verify that the chosen database "calib" appears in the Database window in the top left corner of the GUI.
  11. Once connected, click on the drop down where it says "ser_no" (serial number) and set it to "instrument".  It will fill in "LAT" in the box on the right.
  12. Click on the "More" button at the bottom left corner of the upper box.  This will add another drop down menu row.
  13. Select "flavor" from the "ser_no" pull-down menu.
  14. In the right hand box (that may say "vanilla"), type in "L1current".  There is no menu choice for it.
  15. Click on the "More" button again.
  16. Select "calib_type" (calibration type) from the "ser_no" pull-down menu.
  17. In the right hand box, select the type, e.g., 'TKR_DeadChan'.
    1. If you don't know the type that corresponds to the file you have, you can pick an option from the menu and hit "Send" to see which current/previous calibration files are used for that type. 
    2. For easier reference, here's a table with some example files and the calib_type for them.

      file name example                                    | calib_type    | Notes
      -----------------------------------------------------+---------------+----------------------------------------------------------------
      fit_gcrhists_lkhd_568m_572m_bigsum.gcr_asym_hist.xml | CAL_Asym      | the "568m_572m" should increment by 6m for each new calibration
      fit_proton_calib_568m_572m_bigsum.calMPD.xml         | CAL_MevPerDac | the "568m_572m" should increment by 6m for each new calibration
      pedavr_568m_572m.xml                                 | CAL_Ped       | the "568m_572m" should increment by 6m for each new calibration
      LAT_BadStrips_45.xml                                 | TKR_DeadChan  | the "45" should increment by 1 for each new calibration
  18. Then click the "Send" button at the bottom right corner of the upper box, i.e. on the right hand middle of the main rdbGUI window.
  19. You will get something that looks like this:
  20. Add the new calibration file.
    1. Select the last row in the bottom box.
    2. Right-click on a field (not the row number) in the last row and select "Copy Latest" from the menu. NOTE: to right-click on a Mac laptop, do a 2-finger tap on the trackpad, or use a 3-button mouse
    3. In the pop-up box that comes up, change the "vstart" time to the middle of the SAA passage chosen earlier.
    4. Change the "data_ident" field to the file name for the XML file, in the same form as the existing rows (e.g., $(LATCalibRoot)/TKR/LAT_BadStrips_61.xml).
    5. Click "Send" on the tall and narrow "insert" pop-up and it will disappear.
    6. To show the new record in the calibration database that you have just added: re-click the "Send" button on the right hand middle of the main rdbGUI window.
  21. There should be a new last line.  Scroll over and check that the "vstart" time is correct and that the "vend" field of the previous line is the same.
  22. Do this for each of the calibration files that need to be updated.  You are basically done.
  23.  To quit from the rdbGUI, select "Quit" from the "File" pull-down menu at the top left of the GUI window.
  24. Send an email to whoever created the calibration files and cc the data monitor list (datamonlist@glast4.stanford.edu) to let them know when the calibration should take effect.
    1. Get the ID of the next run by using XTime or a similar tool to convert the start time of the next run after the SAA pass into MET.
    2. Include the start time and first run in the email, e.g., "vstart is 2018-09-28 03:43. First run should be 559799706."
  25. Update the JIRA issue with the same information. 
  26. Later check that the new calibration has been used.
    1. Go to the data processing page and click on the L1Proc bar for the run after the change should have happened.
    2. Go to the "Substreams" table in the middle of the page and click on one of the streams for the "doChunk" task.
    3. Go down to the "Substreams" table and click on any one of the "doCrumb" streams.
    4. Click on the log file link for the "recon" process.
    5. Search in the log for appropriate XML file(s):
      1. you can search for the string "BadStrips" in the log file, to find the TKR bad strips XML file.
      2. NOTE: the XML file(s) are selected by a command of the form:

        SELECT ser_no FROM metadata_v2r1 WHERE ((completion="OK") AND (instrument="LAT") AND (calib_type="TKR_DeadChan") AND (flavor="L1current")
        AND ("2021-05-22 12:57:20">=vstart) AND ("2021-05-22 12:57:20"<vend) AND ("PROD"=PROC_LEVEL)) ORDER BY update_time desc
        So, if the "vend" field of the previous record is not updated, more than one record can match and the XML file might be selected by its "update_time" value. You may therefore want to confirm that the desired new XML calibration file has the latest "update_time".

  27. Assuming everything is good, update and close the JIRA issue.
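
Step 24 asks for the start time of the next run converted into MET. The following is a rough sketch of that conversion, not a replacement for XTime. It assumes that MET is the number of elapsed seconds since 2001-01-01 00:00:00 UTC with leap seconds counted, and it hard-codes the leap seconds inserted between 2001 and the end of 2016; the function name and the example start time are hypothetical. Check the result against XTime before putting it in the email.

```python
from datetime import datetime

# Assumed MET reference epoch (UTC).
MET_EPOCH = datetime(2001, 1, 1)

# UTC instants from which each leap second inserted since 2001 is in effect.
# Assumed complete through the end of 2016; extend if new ones are announced.
LEAP_SECOND_DATES = [
    datetime(2006, 1, 1),
    datetime(2009, 1, 1),
    datetime(2012, 7, 1),
    datetime(2015, 7, 1),
    datetime(2017, 1, 1),
]


def utc_to_met(utc: datetime) -> int:
    """Convert a naive UTC datetime to whole seconds of mission elapsed time."""
    if utc < MET_EPOCH:
        raise ValueError("time is before the MET epoch")
    elapsed = int((utc - MET_EPOCH).total_seconds())
    leap_seconds = sum(1 for start in LEAP_SECOND_DATES if utc >= start)
    return elapsed + leap_seconds


# Hypothetical start time (UTC) of the first run after the SAA passage.
next_run_start = datetime(2018, 9, 28, 3, 55, 1)
print(utc_to_met(next_run_start))
```

The leap-second table is the fragile part of this sketch, which is why a maintained tool such as XTime remains the reference.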

Troubleshooting

This section is to help troubleshoot any errors.

L1Proc failing

If the L1Proc deliveries start to fail after the calibration change takes effect, look at the logs mentioned above and search for any errors.   If you see something like:

XmlBaseCnv FATAL Unable to parse document $(LATCalibRoot)/CAL/p7repro/fit_proton_calib_634m_638m_bigsum.calMPD.xml aka /afs/slac/g/glast/ground/releases/calibrations//CAL/p7repro/fit_proton_calib_634m_638m_bigsum.calMPD.xml

then that calibration file was not actually copied to the TKR or CAL directories described above.  Copy it there and then either contact someone to roll back the process or do it yourself following the instructions in Things to know while on-call for Data Processing.  In this case, a command line rollback of delivery 210427007:

/afs/slac.stanford.edu/u/gl/glast/pipeline-II/prod/pipeline -m PROD rollbackStream --minimum 'L1Proc[210427007]'

was all that was needed to successfully process the delivery.
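
This failure can be caught before the calibration takes effect by checking that the registered data_ident actually points at a file. Below is a minimal sketch, assuming the data_ident values always start with the literal "$(LATCalibRoot)" placeholder as in the examples on this page, and that the LATCalibRoot environment variable is set by the group script. The helper names are hypothetical.

```python
import os

PLACEHOLDER = "$(LATCalibRoot)"


def resolve_data_ident(data_ident: str, calib_root: str) -> str:
    """Expand the $(LATCalibRoot) placeholder into a path under calib_root."""
    if not data_ident.startswith(PLACEHOLDER):
        raise ValueError("data_ident does not start with " + PLACEHOLDER)
    relative = data_ident[len(PLACEHOLDER):].lstrip("/")
    return os.path.join(calib_root, relative)


def calibration_file_exists(data_ident: str, calib_root: str) -> bool:
    """Return True if the file named by data_ident exists under calib_root."""
    return os.path.isfile(resolve_data_ident(data_ident, calib_root))


# File name taken from the error message above; the root comes from the environment.
root = os.environ.get("LATCalibRoot", "")
ident = "$(LATCalibRoot)/CAL/p7repro/fit_proton_calib_634m_638m_bigsum.calMPD.xml"
status = "found" if calibration_file_exists(ident, root) else "MISSING"
print(ident, "->", status)
```

Running a check like this for each new row, as part of the secondary's sign-off, would also catch mistyped file names.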

Duplicate rows

During one update, step 20 was done twice, creating two rows in the database, and the vend value was not updated for the "old" calibration file.  The calibrator user does not have permission to delete rows.  What you can do is manually edit the values in the table, either using the rdbGUI (like in step 20) or by logging into the database on the command line. 

Here's what was done on the command line.  First, log into the mysql database.  I think this works from pretty much any SLAC machine.  The password is the same as the one used for rdbGUI given above:

mysql -h glastCalibDB.slac.stanford.edu -u calibrator -p

Then change to the calib database and look at the table.  Below are the last few entries for the TKR calibration files:

MySQL [(none)]> use calib;
MySQL [calib]> select ser_no,flavor,data_ident,vstart,vend,update_time from metadata_v2r1 where calib_type = 'TKR_DeadChan' and ser_no > 1256;
+--------+-----------+------------------------------------------+---------------------+---------------------+---------------------+
| ser_no | flavor    | data_ident                               | vstart              | vend                | update_time         |
+--------+-----------+------------------------------------------+---------------------+---------------------+---------------------+
|   1263 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_58.xml | 2020-08-03 15:30:00 | 2020-11-05 04:23:00 | 2020-11-05 02:28:20 |
|   1267 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_59.xml | 2020-11-05 04:23:00 | 2021-02-03 15:41:00 | 2021-02-03 20:28:28 |
|   1271 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_60.xml | 2021-02-03 15:41:00 | 2037-01-01 00:00:00 | 2021-02-03 20:28:28 |
|   1278 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_61.xml | 2021-05-04 18:29:00 | 2037-01-01 00:00:00 | 2021-05-04 22:09:54 |
|   1279 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_61.xml | 2021-05-04 18:30:00 | 2037-01-01 00:00:00 | 2021-05-04 21:45:51 |
+--------+-----------+------------------------------------------+---------------------+---------------------+---------------------+


I then updated the vend times of the "old" one and the initial duplicate.  For good measure, I also changed the flavor of the 1278 row to "test" from "L1current" since the pipeline selects only "L1current" files (see the note in step 26.5).

MySQL [calib]> update metadata_v2r1 set vend = '2021-05-04 18:30:00' where ser_no = 1271;
MySQL [calib]> update metadata_v2r1 set vend = '2021-05-04 18:30:00' where ser_no = 1278;
MySQL [calib]> update metadata_v2r1 set flavor='test' where ser_no = 1278;
MySQL [calib]> select ser_no,flavor,data_ident,vstart,vend,update_time from metadata_v2r1 where calib_type = 'TKR_DeadChan' and ser_no > 1256;
+--------+-----------+------------------------------------------+---------------------+---------------------+---------------------+
| ser_no | flavor    | data_ident                               | vstart              | vend                | update_time         |
+--------+-----------+------------------------------------------+---------------------+---------------------+---------------------+
|   1263 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_58.xml | 2020-08-03 15:30:00 | 2020-11-05 04:23:00 | 2020-11-05 02:28:20 |
|   1267 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_59.xml | 2020-11-05 04:23:00 | 2021-02-03 15:41:00 | 2021-02-03 20:28:28 |
|   1271 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_60.xml | 2021-02-03 15:41:00 | 2021-05-04 18:30:00 | 2021-05-25 07:00:51 |
|   1278 | test      | $(LATCalibRoot)/TKR/LAT_BadStrips_61.xml | 2021-05-04 18:29:00 | 2021-05-04 18:30:00 | 2021-05-25 07:01:55 |
|   1279 | L1current | $(LATCalibRoot)/TKR/LAT_BadStrips_61.xml | 2021-05-04 18:30:00 | 2037-01-01 00:00:00 | 2021-05-04 21:45:51 |
+--------+-----------+------------------------------------------+---------------------+---------------------+---------------------+

A later check of the pipeline confirmed that it is using the 1279 entry.
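
To see why the duplicate mattered, the sketch below replays the selection query from the recon log against the rows before and after the fix. It uses an in-memory SQLite table as a stand-in for the MySQL metadata_v2r1 table, with only the columns needed here; the table and function names are hypothetical. That the pipeline uses the first row returned is an assumption.

```python
import sqlite3

# (ser_no, flavor, vstart, vend, update_time), copied from the tables above.
ROWS_BEFORE = [
    (1271, "L1current", "2021-02-03 15:41:00", "2037-01-01 00:00:00", "2021-02-03 20:28:28"),
    (1278, "L1current", "2021-05-04 18:29:00", "2037-01-01 00:00:00", "2021-05-04 22:09:54"),
    (1279, "L1current", "2021-05-04 18:30:00", "2037-01-01 00:00:00", "2021-05-04 21:45:51"),
]
ROWS_AFTER = [
    (1271, "L1current", "2021-02-03 15:41:00", "2021-05-04 18:30:00", "2021-05-25 07:00:51"),
    (1278, "test", "2021-05-04 18:29:00", "2021-05-04 18:30:00", "2021-05-25 07:01:55"),
    (1279, "L1current", "2021-05-04 18:30:00", "2037-01-01 00:00:00", "2021-05-04 21:45:51"),
]


def matching_rows(rows, event_time, flavor="L1current"):
    """Return the ser_no values valid at event_time, newest update_time first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (ser_no INTEGER, flavor TEXT, "
        "vstart TEXT, vend TEXT, update_time TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT ser_no FROM metadata "
        "WHERE flavor = ? AND ? >= vstart AND ? < vend "
        "ORDER BY update_time DESC",
        (flavor, event_time, event_time),
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result


print(matching_rows(ROWS_BEFORE, "2021-05-22 12:57:20"))  # prints [1278, 1279, 1271]
print(matching_rows(ROWS_AFTER, "2021-05-22 12:57:20"))   # prints [1279]
```

Before the fix, three rows are valid at the same time and the ordering by update_time decides which one comes first; after the fix, exactly one row matches.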


Further troubleshooting instructions will be added as issues come up.








3 Comments

  1. Note: when following these instructions while logged in to a centos7 machine, I was not able to launch rdbGUI; I had to log in to a rhel6-64 machine for that.

    1. Yeah, rdbGUI needs to be replaced.  Joanne thinks it would be difficult to get it working on newer systems.  I have a JIRA issue about it (PII-469).  I've been working on and off on a Python script that will automate a lot of what rdbGUI is used for.  It's pretty much ready.  Maybe now that the latest Fermitools release is out, I can circle back to this.  I basically need to have you guys start testing it and give feedback. I've already created a draft Confluence page with a description of it and updated instructions, if you want to take a look. I need to get a testing and transition plan together.

      1. To maybe add some helpful info, though maybe you are aware, I tried using the singularity container, but ran into a problem.

        commands:

        set myimage=/gpfs/slac/atlas/fs1/sw/singularity/slac-fermi.img.ext3
        singularity shell -B /nfs:/nfs -B /afs:/afs -B /gpfs:/gpfs $myimage

        source /afs/slac.stanford.edu/g/glast/ground/scripts/group.sh


        I still could not run rdbGUI.  While logged in to a rhel6-64 machine, I sourced the group script to find out where rdbGUI was:

        which rdbGUI:

        /afs/slac.stanford.edu/g/glast/applications/install/@sys/usr/bin/rdbGUI


        While in the singularity container:

        ls /afs/slac.stanford.edu/g/glast/applications/install/@sys/usr/

        bin  i386_rhel40  i386_rhel50  include  lib


        however if I try to look in the bin directory:

        ls /afs/slac.stanford.edu/g/glast/applications/install/@sys/usr/bin/
        ls: cannot access /afs/slac.stanford.edu/g/glast/applications/install/@sys/usr/bin/: No such file or directory