Blog from July, 2011

Deploy MPT-ComponentManager to Flightops Production

Reason for Change

Final remediation for open SOAR S-Fermi-0293 "Temporary Loss of LAT Thermal Control". This change introduces a new daughter GUI to the Mission Planning tool which facilitates the uploading and removal of logically connected file sets as "components".

Test Procedure

Run the Mission Planning GUI on isoc-ops4 or isoc-ops5 using the Flightops TEST environment. Verify that PROCs and ATSes are produced correctly for uploads and removals respectively.

Related JIRAs

SSC-297@JIRA
IFO-91@JIRA

Documentation on new functionality

New documentation for the ComponentManager

Changes

New GUI in the MPT. New database entry in component table in the PROD db.

Reason for change

This is the version needed for Pass7. The switchover is planned for Monday, August 1st.

This is the version for PASS7_V6:
- adding electronMerit, electronFT1, extendedFT1, extendedLS1
- new GR, new monitoring tags, new cuts, new diffRsp
- removing Livetime Cubes, CompareDigiFastMon

Test Procedure

We have processed LPA and LCI runs in the DEV pipeline with this version of L1Proc.

We fully reprocessed a few hours of data from January 2011: http://glast-ground.slac.stanford.edu/Pipeline-II/si.jsp?stream=1877613&dataSourceMode=Dev

Rollback procedure

We can easily switch back to the previous version of L1Proc before the FSSC switches over to the new photon database (we probably have a few hours to roll back, if needed). This would have to be coordinated with ASP.

CCB Jira

SSC-295@JIRA

Details

L1Pipeline: L1Pipeline-02-09-00
- adding electronMerit, electronFT1, extendedFT1, extendedLS1
- new GR, new monitoring tags, new cuts, new diffRsp
- removing Livetime Cubes, CompareDigiFastMon

ScienceTools: ScienceTools-09-24-00
- several bugfixes taking care of the problem with gtdiffrsp

GlastRelease: GlastRelease-v17r35p23
- the transition from the v15r47 series is very significant
- here's the complete list of changes
- we think we understand the most relevant differences in systest

evtClassDefs: evtClassDefs-00-19-04
- adding cut files for P7V6 to be used in L1Proc

dataMonitoring/AlarmsCfg: AlarmsCfg-06-00-01
- a couple of small changes to the limits on the normalized rates for the p7 release.
- two alarms on the diffuse rates removed (they're not there anymore in Pass7).

dataMonitoring/Common: Common-06-10-04
- minor change to avoid a crash when a given branch in a trending tree is not found.

dataMonitoring/DigiReconCalMeritCfg: DigiReconCalMeritCfg-01-20-00
- modifications in the configuration files to deal with the new P7 variables GDQMQ-355@JIRA
- changes in config files to monitor the LLE events vs RA and DEC GDQMQ-352@JIRA
- change in config file to add hit rates, normrates and GEM veto for tile 63 GDQMQ-356@JIRA
- compute Pass7 output monitoring variables in a different way; this fixes a problem when reading pass7 variables
- variables with zenith cut 60 deg were removed; variables with zenith cut 100 deg were added; naming convention was changed to keep consistency
- renaming the LLE quantities; added zenith angle cut to the LLE BinsRaDec quantities

svac/Monitor: Monitor-01-09-01
- changes in code to deal with the new variable rates with P7
- changes in code to monitor the LLE events vs RA and DEC
- modifications in the code in order to properly retrieve and compute some of the Pass7 monitoring variables.

Complete set of tags for L1Proc 2.9

GlastRelease (sim/recon): GlastRelease-v17r35p23
ScienceTools (Level 2): ScienceTools-09-24-00
svac/L1Pipeline: L1Pipeline-02-09-00

calibTkrUtil: v2r9p1
calibGenTKR: v4r5

dataMonitoring/AlarmsCfg: AlarmsCfg-06-00-01
dataMonitoring/Common: Common-06-10-04
dataMonitoring/DigiReconCalMeritCfg: DigiReconCalMeritCfg-01-20-00
dataMonitoring/FastMon: FastMon-05-02-01
dataMonitoring/FastMonCfg: FastMonCfg-02-01-01
dataMonitoring/IGRF: IGRF-02-01-00

svac/Monitor: Monitor-01-09-01
svac/EngineeringModelRoot: v4r4
svac/TestReport: TestReport-11-04-00

users/richard/pipelineDatasets: v0r6
ft2Util: v1r2p31
evtClassDefs: evtClassDefs-00-19-04
GPLtools: GPLtools-02-00-00

Reason for the change

  • In April, the computing division at IN2P3 informed us that they are moving from their current batch system, BQS, to Sun Grid Engine (also referred to as SGE, GE, or gridEngine).
  • This change will effectively replace all current BQS workers with SGE by the end of 2011. To accommodate it, the jobControl daemon has to be updated.
  • Details for the current BQS implementation are provided here: P2 Architecture at Lyon. The details regarding CCIN2P3 here

Urgency

  • High: Lyon will switch fully to SGE sooner rather than later, and we will lose our computing resources if we do not comply with the changes.

Details

  • A new GridEngineJobControlService module has been added to the current org-glast-jobcontrol package. This class is largely a duplicate of the existing BQSJobControlService module, except that the submission parameters are changed to accommodate SGE. All changes to this code are committed to CVS.
  • In addition, SGE does not provide the same commands for querying job status. Therefore a Python wrapper, ge-qselect, has been written that uses native SGE commands but produces output identical to that of the BQS qselect command. This code is under SVN version control, and its latest version lives on ccglast.in2p3.fr in /glast_data/Pipeline2/gridEngine/ge-qselect
  • The new daemon is loaded together with the current BQS implementation through the procedure described in the CCIN2P3 Pipeline pages (see link above). I have created a small wrapper script called bsub-all.sh that starts the daemon and registers it along with the BQS service. In addition, Lyon has opened port 1097 for communication with our pipeline infrastructure.
  • To accommodate the changes from BQS to SGE without rewriting major parts of the pipeline wrapper scripts, a number of SGE variables need to be mapped onto the old BQS variable names. See the taskconfig.xml of LYON-TEST-AG-GR-v17r35p14 for details:
    taskconfig.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <pipeline
       xmlns="http://glast-ground.slac.stanford.edu/pipeline"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline http://glast-ground.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
    
       <task name="LYON-TEST-AG-GR-v17r35p14"
             version="1.1"
             type="LYON">
    
          <notation>
               more elaborate test case for GridEngine JobControl Daemon
          </notation>
    
          <variables>
    <!-- IN2P3 version -->
    
                <var name="GPL_SITE">IN2P3</var>
                <var name="BATCHSYSTEM">SGE</var>
                <var name="GPL_TASKROOT">/sps/glast/Pipeline2/MC-tasks/${pipeline.task}</var>
                <var name="GLASTROOT">/afs/in2p3.fr/group/glast/glastpro</var>
                <var name="GPL_SCRIPTS">${GLASTROOT}/ground/PipelineConfig/GPL/python</var>
                <var name="GPL2">${GLASTROOT}/ground/PipelineConfig/GPLtools/prod/</var>
                <var name="GPL_XROOTD_DIR">/afs/in2p3.fr/group/glast/glastpro/xroot/bin</var>
                <var name="GPL2_MESSAGELVL">DEBUG</var>
    
                <var name="GPL_BATCHCPULIMIT">1000</var> <!-- cputime in actual seconds... -->
                <var name="GPL_BATCHVMLIMIT">4000</var> <!-- that is the memory max for now -->
                <var name="GPL_BATCHSCRATCHLIMIT">4096M</var>            <!-- need scratch space in order to define TMPBATCH -->
                <var name="PIPELINE_LOGFILE">logFile.txt</var>
          <process name="runMonteCarlo">
                  <job batchOptions=" -l fsize=${GPL_BATCHSCRATCHLIMIT}" maxCPU="${GPL_BATCHCPULIMIT}" maxMemory="${GPL_BATCHVMLIMIT}" >
                    <!-- Need to redefine old BQS variables that are different in GE to work with wrapper scripts -->
                    echo "CAST OLD ENV VARS TO BQS FAKE"
                    export QSUB_HOME=${SGE_CWD_PATH}
                    export QSUB_HOST=${SGE_CELL}
                    export QSUB_SHELL=${SGE_O_CSHELL}
                    export QSUB_USER=${SGE_O_LOGNAME}
                    export QSUB_WORKDIR=${SGE_O_WORKDIR}
                    export QSUB_REQNAME=${JOB_ID}
                    export QSUB_REQID=${SGE_O_HOST}
                    export TMPBATCH=${TMPDIR}
                    ### cool that's what we needed i guess SZ
    ...
    </process>
    </task>
    </pipeline>
    
  • In addition, the runMonteCarlo.py script needs a minor modification to support the logscan. See the aforementioned task for details.
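
The variable mapping in the job body above can also be read as a lookup table, which makes it easier to see which SGE variable feeds which legacy BQS name. The following is a minimal sketch in Python, not part of the pipeline: the helper name bqs_compat_env is hypothetical, and the name pairs are copied from the export lines in taskconfig.xml.

```python
import os

# Legacy BQS name -> SGE variable, copied from the export lines
# in the taskconfig.xml job body.
SGE_TO_BQS = {
    "QSUB_HOME": "SGE_CWD_PATH",
    "QSUB_HOST": "SGE_CELL",
    "QSUB_SHELL": "SGE_O_CSHELL",
    "QSUB_USER": "SGE_O_LOGNAME",
    "QSUB_WORKDIR": "SGE_O_WORKDIR",
    "QSUB_REQNAME": "JOB_ID",
    "QSUB_REQID": "SGE_O_HOST",
    "TMPBATCH": "TMPDIR",
}


def bqs_compat_env(env=None):
    """Return a copy of env with the legacy BQS names filled in.

    Hypothetical helper for illustration only. An SGE variable that is
    unset yields an empty string, which is what the shell export lines
    produce as well.
    """
    result = dict(os.environ if env is None else env)
    for bqs_name, sge_name in SGE_TO_BQS.items():
        result[bqs_name] = result.get(sge_name, "")
    return result
```

A wrapper script could pass the returned dictionary as the env argument of subprocess.Popen so that unchanged BQS-era scripts still find the names they expect.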

Test Conditions

  • On glastlnx12 the current package org-glast-jobcontrol-1.10-SNAPSHOT.jar is available and can be added to the CLASSPATH in order to run the jython test script in the glast-home directory. The sample test does a few counts and returns an email.
    GridEngine.jy
    glast@glastlnx12 $ setenv CLASSPATH org-glast-jobcontrol-1.10-SNAPSHOT.jar
    glast@glastlnx12 $ jython GridEngineTest.jy
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/u/gl/glast/org-glast-jobcontrol-1.10-SNAPSHOT.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/u/gl/glast/org-glast-jobcontrol-1.10-SNAPSHOT.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/resources.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/resources.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/rt.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/rt.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/jsse.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/jsse.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/jce.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/jce.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/charsets.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/charsets.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/sunjce_provider.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/sunjce_provider.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/sunpkcs11.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/sunpkcs11.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/dnsns.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/dnsns.jar'
    *sys-package-mgr*: processing new jar, '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/localedata.jar'
    *sys-package-mgr*: can't write cache file for '/afs/slac.stanford.edu/package/java/i386_linux2/jdk1.6.0_26/jre/lib/ext/localedata.jar'
    *sys-package-mgr*: can't write index file
    cycle  0  status:  Job 2217825 PENDING null  sleep for 15 seconds
    cycle  1  status:  Job 2217825 PENDING null  sleep for 15 seconds
    cycle  2  status:  Job 2217825 PENDING null  sleep for 15 seconds
    cycle  3  status:  Job 2217825 PENDING null  sleep for 15 seconds
    cycle  4  status:  Job 2217825 RUNNING ccwsge0467.in2p3.fr  sleep for 15 seconds
    cycle  5  status:  Job 2217825 RUNNING ccwsge0467.in2p3.fr  sleep for 15 seconds
    cycle  6  status:  Job 2217825 RUNNING ccwsge0467.in2p3.fr  sleep for 15 seconds
    cycle  7  status:  Job 2217825 RUNNING ccwsge0467.in2p3.fr  sleep for 15 seconds
    cycle  8  status:  Job 2217825 DONE ccwsge0467.in2p3.fr  sleep for 15 seconds
    cycle  9  status:  Job 2217825 DONE ccwsge0467.in2p3.fr  sleep for 15 seconds
    done.
    
  • In addition, Tony has updated the development pipeline with the new Java class, and tests can be run using its web interface.
  • Currently there are two tasks which are identical clones of task AG-GR-v17r35p14-IRFS76BK-allE: BQS-TEST-AG-GR-v17r35p14 and LYON-TEST-AG-GR-v17r35p14. Both tests live in the development version of the pipeline but are stored in the usual MC-Tasks directory, i.e. on /nfs/farm/g/glast/u44/IN2P3/MC-Tasks/
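
The status polling seen in the test output above can be sketched as a simple loop. This is an illustration in plain Python, not the actual GridEngineTest.jy script: poll_job and the get_status callable are hypothetical names, and the real test talks to the job control service rather than to a stub. Unlike the log above, which keeps cycling after the job reports DONE, this sketch stops at the first DONE.

```python
import time


def poll_job(get_status, job_id, interval=15, max_cycles=10, sleep=time.sleep):
    """Poll a job and print one line per cycle, in the style of the log.

    get_status is a hypothetical callable: job_id -> (state, host).
    Returns the list of states seen, ending at the first DONE if any.
    """
    history = []
    for cycle in range(max_cycles):
        state, host = get_status(job_id)
        history.append(state)
        print("cycle ", cycle, " status:  Job", job_id, state, host,
              " sleep for", interval, "seconds")
        if state == "DONE":
            break
        sleep(interval)  # injectable so tests need not actually wait
    return history
```

Passing the sleep function as a parameter keeps the loop testable without waiting 15 seconds per cycle.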

Reasons for Change

  • Switch to Pass 7
  • Switch to redhat5-x86_64-64bit-gcc41 builds
  • Switch to ScienceTools-09-24-00

Test Procedure

Tested in dev on reprocessed Pass 7 data in /ASP/P7V6_P120 and on data in /ASP/TestSims2

Rollback Procedure

Since this entails a switch to Pass 7, a rollback of ASP independent of L1 isn't possible. If L1 rolls back to Pass 6 processing, then ASP would need to roll back to ASP-04-00-00.

CCB Jira

SSC-293@JIRA

Details

  • Asp_containerSettings (ASP-05-00-00)*
    • add ST syspfiles to PFILES env var
  • AspLauncher-02-00-00*
    • update xml model files to point to the P7V6 diffuse model
    • update xml model to use redhat5-x86_64-64bit-gcc41 builds
  • BayesianBlocks-03-00-00
  • asp_pgwave-02-00-00*
    • update xml model files to point to the P7V6 diffuse model
    • update xml model to use redhat5-x86_64-64bit-gcc41 builds
    • modify event selection to use P7SOURCE_V6
    • set matplotlib backend to 'Agg' explicitly to avoid tkAgg requirements on DISPLAY env var
  • drpMonitoring-02-00-00*
    • update xml model files to point to the P7V6 diffuse model
    • update xml model to use redhat5-x86_64-64bit-gcc41 builds
  • grbASP-05-00-00*
    • update xml model files to point to the P7V6 diffuse model
    • update xml model to use redhat5-x86_64-64bit-gcc41 builds
    • update xml model to use EXTENDEDFT1 group data
    • apply Pass 7 event selections to transient data
    • set matplotlib backend to 'Agg' explicitly to avoid tkAgg requirements on DISPLAY env var
  • pyASP-04-00-00*
    • add CLHEP:: namespace qualifiers
    • add scripts and xml task def to register P7V6_P120 test data
  • asp_healpix-02-02-05*
    • updates for gcc4.4
  • asp_skymaps-01-13-06*
    • add CLHEP:: namespace qualifiers
  • asp_pointlike-06-14-04*
    • add CLHEP:: namespace qualifiers