The notes on the L1 calibration procedure given here have been fleshed out on the page Updating L1 calibrations.  You should probably go there if that's what you're looking for.  You can still view the videos here if you want.

These are notes from meeting and tutorial sessions with Warren F. during the September 2017 Software Week. My (brief) notes are below, along with the recordings I made of the Zoom sessions.









  • Implementing a new Calibration
    • perform during SAA pass in the future (as far as processing is concerned)
    • log into "ssh -XY"
    • had to edit my .cshrc file to get some environment stuff enabled
    • bring up window with "$ rdbGUI &" command
    • File -> Open DB Schema
    •   file name: /afs/slac/g/glast/applications/dbSchemas/calib
    •   make a bookmark for this location (GUI button)
    •   double-click "calib.xml"
  • Session -> Open connection
    •   got info from Warren to fill in fields
  • ongoing infrastructure issue still seems to be preventing us from progressing
  • cd $LATCalibRoot/TKR
  • latest file is LAT_BadStrips_44.xml, expecting "45"
    • file not copied to dir
    • file copied to appropriate place, continuing
  • "Open connection" password: calibr8tor
    •  select: instrument
    •  click on "more"
    •  select: flavor
    •  3rd field: type "L1current"
    •  click on "more"
    •  select: calib_type
    •  3rd field: select "TKR_DeadChan"
  • cp /afs/slac/g/glast/users/lsrea/badStrips/xml/LAT_BadStrips_45.xml /afs/slac/g/glast/applications/dbSchemas/calib
    • check that you have write permissions to destination first
  • select last row at bottom of rdbGUI
    • right-click, "copy latest option"
    • change date to current date
    • change time to middle of SAA passage chosen earlier
    • change data_ident to filename of .xml file above
    • click "send"
    • check "vstart" time of the new last row
  • fixing error in HalfPipe
    • look at "Fermi LAT Data Processing" page on the portal site
    • this issue likely caused by an infrastructure problem from the previous night
    • clicked on a doChunks stream, then "messages", saw a "read timed out" message, indicating network trouble
    • use "bjobs" command to test connection to LSF server
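The version check and the copy step in the notes above can be sketched as two small shell helpers. This is only a minimal sketch, not the procedure Warren demonstrated: the function names `latest_badstrips_version` and `install_calib_file` are hypothetical, and it assumes the files follow the LAT_BadStrips_N.xml naming seen in $LATCalibRoot/TKR.

```shell
# Hypothetical helper: print the highest N among LAT_BadStrips_N.xml
# files in the given directory (prints nothing if there are none).
latest_badstrips_version() {
    bs_dir="$1"
    bs_latest=""
    for bs_file in "$bs_dir"/LAT_BadStrips_*.xml; do
        [ -e "$bs_file" ] || continue      # glob matched nothing
        bs_n="${bs_file##*_}"              # e.g. "44.xml"
        bs_n="${bs_n%.xml}"                # e.g. "44"
        case "$bs_n" in
            ''|*[!0-9]*) continue ;;       # skip non-numeric suffixes
        esac
        if [ -z "$bs_latest" ] || [ "$bs_n" -gt "$bs_latest" ]; then
            bs_latest="$bs_n"
        fi
    done
    if [ -n "$bs_latest" ]; then
        echo "$bs_latest"
    fi
}

# Hypothetical helper: copy a new calibration file into a directory,
# checking read and write permissions first, as the notes recommend.
install_calib_file() {
    ic_src="$1"
    ic_dest="$2"
    if [ ! -r "$ic_src" ]; then
        echo "cannot read $ic_src" >&2
        return 1
    fi
    if [ ! -d "$ic_dest" ] || [ ! -w "$ic_dest" ]; then
        echo "no write permission on $ic_dest" >&2
        return 1
    fi
    cp "$ic_src" "$ic_dest"/
}

# Example usage (not run here):
#   latest_badstrips_version "$LATCalibRoot/TKR"
#   install_calib_file <new .xml file> <destination directory>
```

If the number printed is lower than the one you expect (44 instead of 45 in the session above), the new file has not been copied into place yet.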

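Setting the time to the middle of the chosen SAA passage is simple arithmetic on the start and end times of the pass. The sketch below is one way to compute it; it assumes GNU `date`, assumes both times are given in UTC, and the function name `saa_midpoint` and the output format are hypothetical (check the format rdbGUI shows in existing rows before typing the value in).

```shell
# Hypothetical helper: print the midpoint between two UTC timestamps.
# Requires GNU date (for the -d option).
saa_midpoint() {
    saa_start=$(date -u -d "$1" +%s) || return 1   # start of SAA pass, epoch seconds
    saa_end=$(date -u -d "$2" +%s) || return 1     # end of SAA pass, epoch seconds
    saa_mid=$(( saa_start + (saa_end - saa_start) / 2 ))
    date -u -d "@$saa_mid" '+%Y-%m-%d %H:%M:%S'
}

# Example usage (made-up times, not run here):
#   saa_midpoint "2017-09-20 12:00:00" "2017-09-20 12:20:00"
```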

  • Monitoring batch farm
    • need to be on a machine with a batch client (all public machines do)
    • e.g. rhel6-64n
  • script in "Things to Know" page Warren created to monitor batch system
    • shouldn't run too many things at once, otherwise the system is overloaded
    • excess jobs are stored in subdirs
    • shouldn't be more than a couple hours old
    • look for pending jobs, find lock files
    • look at pipeline page
      • look at the "summary" link
      • look at the "flagFT2" link
      • look at the "Show streams" link
      • saw a "TERMINATED" job, clicked on it
      • saw a "LOCK_RUN" job, clicked on it
      • log file never created, dead-end
    • go to dir that contains log file
      • tail logfile: lots of "permission denied" messages due to expired AFS tokens
    • back at pipeline page, click "View messages" link
      • put 10000 in "last ___ minutes" box
      • indicates problem between pipeline and batch system
  • only way to rectify a failed job is to bkill it
    • (in terminal)
      • jobs="288191 288152"
      • bjobs $jobs
    • takes an hour for the pipeline to realize the job is dead
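The terminal steps above stop at `bjobs $jobs`; the follow-up `bkill` is implied by the note that killing is the only remedy. A minimal sketch of the two steps together is below. The function names `run_cmd` and `kill_stuck_jobs` and the `DRY_RUN` switch are hypothetical additions for illustration; `bjobs` and `bkill` are the standard LSF commands and both accept a list of job IDs.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: check the given LSF jobs, then kill them.
kill_stuck_jobs() {
    if [ "$#" -eq 0 ]; then
        echo "usage: kill_stuck_jobs JOBID [JOBID ...]" >&2
        return 1
    fi
    # Look at the jobs first; stop here if bjobs itself fails
    # (for example, if the LSF server cannot be reached).
    run_cmd bjobs "$@" || return 1
    run_cmd bkill "$@"
}

# Example usage with the job IDs from the notes (not run here):
#   DRY_RUN=1 kill_stuck_jobs 288191 288152   # preview only
#   kill_stuck_jobs 288191 288152             # actually kill
```

Previewing with `DRY_RUN=1` first is a cheap guard against killing the wrong jobs, since the pipeline takes about an hour to notice a dead job.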