Notes from Closeout Session: https://docs.google.com/document/d/1Ka4CkBBdhAkowJEzxd8_LyQCbSI2po41P4Tsrp6xviM

DOE Commitment:

  • 1.5 FTEs after 2018, down from 3 FTEs. Needs to be complete by end of FY ’18. Gradual ramp-up and transition desired.
  • Hardware purchasing? Server infrastructure, databases, etc.
  • IFC still funding hardware purchases
  • Hardware will come to stay at SLAC
  • Oracle server migration soon

...

  • Went through externals and determined the status/necessity of each package
  • Also determined whether we needed to provide packages ourselves or whether they were available in conda
  • ScienceTools Package Author Review

 

Other Notes About Science Tools:

  • Many pieces of analysis rely on ST being backwards-compatible.
  • Is the FSSC going to be doing more than just bug-fixes?
  • Will there be more development?
  • Fermipy unit tests catch a lot of issues
    • Advisable to run after Likelihood updates


Data Flow

Data Pipeline:

  • Data arrives from GSFC
  • L0 ingest triggered by FastCopy post-processing script (into database)
  • Halfpipe keeps track of what has been delivered
  • Once tidy & ordered, hands off to L1
  • L1 registers in data catalog & sends FITS to GSFC

Pipeline Monitoring (Warren & Michael Kuss): 

  • Current shifting scheme:
    •  Warren & Michael each take responsibility for half a day.
    •  Maria Elena covers when Warren is on vacation.
    •  No one covers for Michael.
  • Large disorganized document 
    https://confluence.slac.stanford.edu/display/ds/Things+to+know+while+on-call+for+Data+Processing
    •  Needs to be split into child pages and updated
    •  As new person gets trained, would be good to start working on reorganization
    •  Some work is done via the command line to interact with LSF (batch submission).
    •  Most (95%) done via the web interface.
    •  Luca L. was looking at adding pipeline monitoring for young Torino folks.

 

Halfpipe (Steve Tether, maybe Gregg?):

Halfpipe has a specific Operations Task - gap reporting? (from Rob)

Warren: No; gap reporting occurs before the Halfpipe, during FastCopy post-processing

 

Fast copy: https://confluence.slac.stanford.edu/display/ISOC/FASTCopy+processing+chain

processsci.py (?? I didn't really catch this)

  •  Launches gap finder to look for missing pieces
  •  Divides delivery into chunks and writes xml file
  •  Up to 20 chunks/delivery. Usually ~12 or so
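
A minimal sketch of the gap-finding and chunking steps described above (the function names, chunking rule, and manifest schema are all assumptions for illustration, not the actual processsci.py logic):

```python
import xml.etree.ElementTree as ET

MAX_CHUNKS = 20  # "Up to 20 chunks/delivery"

def find_gaps(sequence_numbers):
    """Return missing sequence numbers in a delivery (toy gap finder)."""
    present = set(sequence_numbers)
    lo, hi = min(present), max(present)
    return [n for n in range(lo, hi + 1) if n not in present]

def chunk_delivery(datagrams, max_chunks=MAX_CHUNKS):
    """Divide a delivery into at most max_chunks roughly equal chunks."""
    if not datagrams:
        return []
    n = min(max_chunks, len(datagrams))
    size = -(-len(datagrams) // n)  # ceiling division
    return [datagrams[i:i + size] for i in range(0, len(datagrams), size)]

def write_manifest(chunks, path="delivery.xml"):
    """Write an XML manifest listing each chunk and its datagram count
    (schema invented here for illustration)."""
    root = ET.Element("delivery")
    for i, chunk in enumerate(chunks):
        ET.SubElement(root, "chunk", id=str(i), ndatagrams=str(len(chunk)))
    ET.ElementTree(root).write(path)
```

With ~100 datagrams this yields 20 chunks of 5, consistent with the "up to 20 chunks, usually ~12" note for typical smaller deliveries.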

Halfpipe reads chunked files:

  •  Extracts events from the datagrams
    •  Requires exactly two EPUs; otherwise the delivery is rejected (in mergeEvents)
    •  i.e. if numEPUs != 2, stop
  •  Merges data from EPUs into a single time-ordered stream
    •  Necessary for down-stream processing
  •  Launches Level 1 pipeline
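
The exactly-two-EPUs rule and the time-ordered merge could be sketched as follows (a toy illustration; the real mergeEvents operates on datagrams, and all names and data layouts here are assumptions):

```python
import heapq

class MergeError(Exception):
    """Raised when a delivery does not satisfy the exactly-two-EPUs rule."""

def merge_events(streams_by_epu):
    """Merge per-EPU event streams into a single time-ordered stream.

    streams_by_epu: dict mapping an EPU id to a list of (timestamp, event)
    pairs, each list already time-ordered. Mirrors the numEPUs != 2 -> stop
    rule by rejecting the delivery unless exactly two EPUs are present.
    """
    if len(streams_by_epu) != 2:
        raise MergeError(f"expected exactly 2 EPUs, got {len(streams_by_epu)}")
    # heapq.merge lazily interleaves the already-sorted streams by timestamp,
    # which is what downstream processing needs (a single time-ordered stream)
    return list(heapq.merge(*streams_by_epu.values()))
```

The merge itself is cheap because each per-EPU stream is already ordered; only the interleaving has to be computed.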

 

Question: What issues arise with the Halfpipe that need to be addressed?

  •  Datagrams being omitted due to an issue with the merging logic.
    •  Runs broken up between deliveries; datagrams fall through the cracks.
    •  Data has to be repiped.
 


 

ISOC Software Monitoring: 

...

/afs/slac/g/glast/ground/releases/calibrations/CAL/p7repro//

/afs/slac/g/glast/ground/releases/calibrations/TKR/

ACD calibration code location: /nfs/farm/afs/slac/g/glast/groundtak/releases/calibrations/TKR/GR-20-09-10.



ASP Discussion (Jim Chiang):

...

  • Needs someone he can show the pipeline code to and train to do the heavy lifting when it comes to kicking the pipeline
  • Docker containers for something like the batch system may cause some problems, since for something like the L1 pipeline a number of images would need to be launched simultaneously
  •  Would the size of the software cause problems with deployment?
  • We would need a system where you restrict loading images to the batch farm to prevent collisions/problems
  • There is probably a precedent for this, however, Matt has no experience deploying on this scale 
  • An image size of ~1 GB is best; a few GB is manageable for production.
  • IT dept supportive of docker@SLAC. There is 1 machine with RHEL7
  • Lyon is a much larger computing center - likely they will upgrade to Docker first
    • Now full support for Docker at Lyon (Fred)

Infrastructure:

  • Last purchase went into dev cluster
    • many nodes @RHEL6, upgrade to RHEL7 and doing docker with this
    • Still getting NFS/AFS sorted out with RHEL7. GPFS?
  • It's good to come up with a plan because of security implications if NFS underneath. 
    • Use the right Docker setup (UID issues w/ security)
  • SLAC will give us a few nodes for testing docker. Fall back way to install on user machines. (Brian)
    • AFS on RHEL6 docker
    • read files if world readable. 
    • NFS is hardest. 
  • Timeline for RHEL7, 12mo? 2018? (Matt)
    • RHEL7 support is dodgy. 
    • Configuration stuff is hard part

Flight Software:

...


  •  Julie: No path to having anyone other than SLAC supporting flight software

LAT On-board Configuration:

 If we desired to change the on-board configuration, what happens?

  •  Jim Panetta knew the most about that. Took knowledge with him. But see this link.
  •  Gregg Thayer can do that nominally.
  •  Handshake in ISOC and GlastRelease has to be done as well
  •  MOOT/MOOD Table where the key is stored before it’s transmitted to flight software.
  •  Seems like it takes a while for the ground system to catch up before we can use it on the instrument.

From Gregg:

  •  Forming the input to the configuration build system is the least remembered part
  •  System for turning the crank and building configurations is fine
  •  Instructions for turning crank may need work
  •  Then need to check content of product before upload
  •  May 2010: Halt in data processing due to MOOT key mismatch with MPT
    •  Do we know how to handle the MPT?
    •  Gregg…yes.

Mission Planning/Flight Operations

Actions:

  • Finalize and document list of needed permissions on the ISOC Mission Planning page
  • Get Fermi Sharepoint access for NASA people (SLAC windows account is not enough)
  • Robin/Elizabeth/Jerry to propose round robin schedule for weekly mission planning
  • document support cron scripts for SAA checks and planning product generation
  • document occasional mission-week-boundary LAT command scheduling problem
  • NASA planners to take over LCI calibrations planning, effective immediately
  • FSSC to consider having LAT operations workstation/tool installation

September 2017 Actions:

  • any?

Routine Flight Operations Tasks

  • Monitoring, Trending, Reporting: for Weekly LAT Science reports and Quarterly LAT Science reports

Actions:

  • improve documentation in Confluence
  • make use of API to Google sheets for auto-updating LAT SSR usage and LAT trigger histories
  • move LAT monitoring tools and data files from /afs/slac/u/gl/rac/LATmetrics/ to /afs/slac/g/glast/isoc/flightOps/LATmetrics/ for easier shared use and support
    • Fix known bugs in agepoly.pl and x4saa.pl
  • re-discover use of the LAT Configuration GUI tool, and document it
  • migrate other Excel spreadsheet usage to non-Excel implementations
    • TKR on-board and ground bad strip history trending: need a solution (ipython notebook?) that makes time plots, and also 4x4 grid-based info output
    • CAL light output history trending
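
A possible starting point for the TKR bad-strip trending item above: a matplotlib sketch that makes one time plot per tower in a 4x4 grid (16 LAT towers). The data layout, function name, and output file are assumptions for illustration, using synthetic data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, e.g. for batch/notebook export use
import matplotlib.pyplot as plt
import numpy as np

def plot_badstrip_grid(history, outfile="tkr_badstrips.png"):
    """Plot bad-strip count histories in a 4x4 tower grid.

    history: dict mapping a (row, col) tower position to an array of
    bad-strip counts over time (hypothetical layout for this sketch).
    """
    fig, axes = plt.subplots(4, 4, figsize=(12, 10), sharex=True)
    for (r, c), counts in history.items():
        ax = axes[r][c]
        ax.plot(counts)
        ax.set_title(f"tower ({r},{c})", fontsize=8)
    fig.savefig(outfile)
    return fig

# synthetic example data: 16 towers, 30 time samples each
rng = np.random.default_rng(0)
demo = {(r, c): rng.integers(0, 50, size=30).cumsum()
        for r in range(4) for c in range(4)}
fig = plot_badstrip_grid(demo)
```

The same grid function could be reused for the CAL light-output trending, swapping in the relevant history arrays.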


...

Mission Planning notes from Elizabeth:

...