P115 Reprocessing

status: Complete
last update: 15 July 2010

This page is a record of the configuration for the P115 reprocessing project, motivated by the Pass 7.2 event classification. This project involves reprocessing with Pass7 classification trees and (ultimately) new IRFs. The name "P115" derives from the word "processing" and the initial file version to be used for the output data products, e.g., r0123456789_v115_merit.root.

  • P115-LEO-MERIT - Reclassify data from P106-LEO (full reprocessing of Earth limb data from the L&EO period) rather than from Level 1, producing MERIT, FILTEREDMERIT and ELECTRONMERIT
  • P115-LEO-FT1 - From P115-LEO-MERIT, generate FT1, LS1, LS3, ELECTRONFT1 products, including the calculation of the diffuse response.

Datafile names, versions and locations

Data file version numbers for this reprocessing will begin with v115.

XROOT location and file naming

Location template:

/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>

Locations for P115:

/glast/Data/Flight/Reprocess/P115-LEO/merit
/glast/Data/Flight/Reprocess/P115-LEO/filteredmerit
/glast/Data/Flight/Reprocess/P115-LEO/electronmerit
/glast/Data/Flight/Reprocess/P115-LEO/ft1
/glast/Data/Flight/Reprocess/P115-LEO/electronft1
/glast/Data/Flight/Reprocess/P115-LEO/ls1
/glast/Data/Flight/Reprocess/P115-LEO/ls3

File naming:

| Data Type | Send to FSSC | Naming template |
|---|---|---|
| MERIT | No | r<run#>_<version>_<dataType>.root |
| FILTEREDMERIT | No | r<run#>_<version>_<dataType>.root |
| ELECTRONMERIT | No | r<run#>_<version>_<dataType>.root |
| ELECTRONFT1 | No | r<run#>_<version>_<dataType>.fit |
| FT1 | No | gll_ph_r<run#>_<version>.fit |
| LS1 | No | gll_ev_r<run#>_<version>.fit |
| LS3 | No | gll_lt_r<run#>_<version>.fit |
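
The naming templates and the XROOT location template can be expanded mechanically. The following is a minimal sketch, not the code used by the reprocessing task: the helper names are hypothetical, and it assumes that run numbers are zero-padded to 10 digits and that <dataType> is lower-cased, as suggested by the example r0123456789_v115_merit.root.

```python
# Hypothetical helpers; the templates are copied from the file naming list above.
# Assumptions: run numbers are zero-padded to 10 digits and <dataType> is
# lower-cased in file and directory names.
NAME_TEMPLATES = {
    "MERIT": "r{run:010d}_{version}_{dtype}.root",
    "FILTEREDMERIT": "r{run:010d}_{version}_{dtype}.root",
    "ELECTRONMERIT": "r{run:010d}_{version}_{dtype}.root",
    "ELECTRONFT1": "r{run:010d}_{version}_{dtype}.fit",
    "FT1": "gll_ph_r{run:010d}_{version}.fit",
    "LS1": "gll_ev_r{run:010d}_{version}.fit",
    "LS3": "gll_lt_r{run:010d}_{version}.fit",
}


def product_file_name(data_type, run, version="v115"):
    """Expand the naming template for one data product."""
    template = NAME_TEMPLATES[data_type.upper()]
    # str.format ignores keyword arguments a template does not use.
    return template.format(run=run, version=version, dtype=data_type.lower())


def xroot_dir(reprocess_name, data_type):
    """Expand /glast/Data/Flight/Reprocess/<reprocessName>/<dataType>."""
    return f"/glast/Data/Flight/Reprocess/{reprocess_name}/{data_type.lower()}"


print(xroot_dir("P115-LEO", "FT1"))
print(product_file_name("FT1", 239557414))
```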

DataCatalog location and naming

Logical directory and group template:

Data/Flight/Reprocess/<reprocessName>:<dataType>

Note that the <dataType> field (following the colon) is a DataCatalog 'group' name. All file names are of the form r<run#> as indicated by the examples below.

Logical directories for P115:

Data/Flight/Reprocess/P115-LEO:MERIT r0239557414
Data/Flight/Reprocess/P115-LEO:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P115-LEO:ELECTRONMERIT r0239557414
Data/Flight/Reprocess/P115-LEO:FT1 r0239557414
Data/Flight/Reprocess/P115-LEO:ELECTRONFT1 r0239557414
Data/Flight/Reprocess/P115-LEO:LS1 r0239557414
Data/Flight/Reprocess/P115-LEO:LS3 r0239557414
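
To make the folder:group convention concrete, the sketch below splits one of the example entries above into its logical folder, group name and file name. It is plain string handling for illustration only and does not use the DataCatalog API; the function name is hypothetical.

```python
def parse_catalog_entry(entry):
    """Split '<folder>:<group> <name>' into (folder, group, name, run number)."""
    location, name = entry.split()
    # The group name follows the last colon of the logical location.
    folder, _, group = location.rpartition(":")
    run = int(name[1:])  # drop the leading 'r'; leading zeros are discarded
    return folder, group, name, run


folder, group, name, run = parse_catalog_entry(
    "Data/Flight/Reprocess/P115-LEO:MERIT r0239557414"
)
print(folder, group, name, run)
```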

Data Sample

The currently defined data sample for P115-LEO reprocessing includes:

| P115-LEO | (MET) | (UTC) |
|---|---|---|
| First run | 237783740 | 2008-07-15 03:02:20 |
| Last run | 244406327 | 2008-09-29 18:38:47 |
| Total runs | 200 | |
| Total MERIT events | 488,288,751 | |
| Total photon events | 139,475,840 | |
| Total electron events | 736,950 | |

Note that the L&EO data represent a discontiguous set of runs.
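
The MET and UTC values quoted for the first and last runs can be cross-checked with a short conversion. This sketch counts MET seconds from the mission epoch of 2001-01-01 00:00:00 UTC and, as a simplifying assumption, ignores leap seconds; under that assumption it reproduces the UTC times listed above.

```python
from datetime import datetime, timedelta, timezone

# Mission elapsed time (MET) epoch. Leap seconds are ignored in this sketch.
MET_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def met_to_utc(met_seconds):
    """Convert MET seconds to a UTC datetime (no leap-second correction)."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


for label, met in (("First run", 237783740), ("Last run", 244406327)):
    print(label, met, met_to_utc(met).strftime("%Y-%m-%d %H:%M:%S"))
```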

Bookkeeping

  1. (This page): Define ingredients of reprocessing (processing code/configuration changes)
  2. Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P115-LEO
    1. List of all reprocessings
    2. List of all data runs reprocessed
    3. Pointers to all input data files (-> dataCatalog)
    4. Pointers to associated task processes (-> Pipeline II status)
  3. Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
    1. Lists of and pointers to all output data files
    2. Meta data associated with each output data product

P115-LEO-MERIT

Status chronology

  • 8 Jul 2010 - Construct task
  • 9 Jul 2010 - Task complete

Configuration

Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P115-LEO-MERIT
Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
GlastRelease: v17r35p1gr02
Skimmer: 07-07-00
Input Data Selection: Set of 200 runs selected by Anders Borgland
Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P115-LEO-MERIT/config/runFile.txt
photonFilter: evtClassDefs v0r6p1 CTBParticleType==0 && CTBClassLevel>0
electronFilter: CTBParticleType==1
jobOpts: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P115-LEO-MERIT/config/reClassify.txt
Output Data Products: MERIT, FILTEREDMERIT, ELECTRONMERIT

Timing and Scaling

The processClump job step averaged 20-30 hequ-min or 70-80 fell-min.
The mergeClumps job step averaged 5-15 hequ-min or 8-20 fell-min.

xroot became heavily stressed once the number of processClump jobs reached ~1000, and the stress increased when mergeClumps jobs joined the mix. A pipeline throttle is needed.
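
As an illustration of what a throttle does, the sketch below caps the number of jobs that run at the same time and reports the peak concurrency reached. It is a generic thread-pool example, not the Pipeline II throttling mechanism; the function name and the cap of 8 are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, max_concurrent):
    """Run callables with at most max_concurrent active; return (results, peak)."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def wrapped(job):
        # Track how many jobs are active so the cap can be verified.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return job()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the throttle: extra jobs wait until a worker is free.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(wrapped, jobs))
    return results, state["peak"]


jobs = [lambda i=i: i * i for i in range(50)]  # stand-ins for batch jobs
results, peak = run_throttled(jobs, max_concurrent=8)
print(len(results), "jobs done, peak concurrency", peak)
```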

P115-LEO-FT1

Status chronology

  • 9 Jul 2010 - Assemble task
  • 12 Jul 2010 - After a 20-run test, begin production processing
  • 15 Jul 2010 - Task complete

Configuration

Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P115-LEO-FT1
Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
Input Data Selection: MERIT (from P106-LEO-MERIT), FT2 (from P110-FT2)
Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P115-LEO-FT1/config/runFile.txt
evtClassDefs: 00-16-00
meritFilter: pass7_FSW_cuts, (FswGamState==0||FswGamState==3) && (CTBCORE>0) && (CTBBestEnergyProb>0) && (CTBBestEnergy>10) && (CTBBestEnergyRatio<5) && (CTBClassLevel>0)
eventClassifier: Pass7_Classifier.py
ScienceTools: 09-17-00 (SCons build)
Code Variant: redhat4-i686-32bit-gcc34 or redhat5-i686-32bit-gcc41
Diffuse Model: /afs/slac.stanford.edu/g/glast/ground/releases/analysisFiles/diffuse/v2/source_model_v02.xml [Ref]
Diffuse Response IRFs: P7_v2_diff, P7_v2_extrad, P7_v2_datac
IRFs: implemented as 'custom irf', files in /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/IRFS/Pass7.2
Output Data Products: FT1, LS1, LS3, ELECTRONFT1
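
The meritFilter cut in the configuration above can be read as a plain boolean predicate on the MERIT variables. The sketch below restates it in Python for a single event held in a dictionary. It illustrates the logic only and is not how the task applies the cut; the function name and the sample event values are hypothetical.

```python
def passes_merit_filter(evt):
    """Evaluate the meritFilter expression for one event (dict of MERIT variables)."""
    return (
        evt["FswGamState"] in (0, 3)  # FswGamState==0 || FswGamState==3
        and evt["CTBCORE"] > 0
        and evt["CTBBestEnergyProb"] > 0
        and evt["CTBBestEnergy"] > 10
        and evt["CTBBestEnergyRatio"] < 5
        and evt["CTBClassLevel"] > 0
    )


# Hypothetical event values, chosen only to exercise the cut.
event = {
    "FswGamState": 0,
    "CTBCORE": 0.4,
    "CTBBestEnergyProb": 0.6,
    "CTBBestEnergy": 150.0,
    "CTBBestEnergyRatio": 1.2,
    "CTBClassLevel": 2,
}
print(passes_merit_filter(event))
```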

Processing chain for FITS data products

| Data Product | makeFT1 | gtdiffrsp | gtmktime | gtltcube |
|---|---|---|---|---|
| FT1 | true | true for evclsmin==8,9,10 | true | false |
| LS1 | true | false | true | false |
| LS3 | false | false | false | true |
| ELECTRONFT1 | true | false | true | false |

Note on diffuse response calculation: gtdiffrsp is called three times in succession: first with IRF P7_v2_diff and evclsmin==8, then with IRF P7_v2_extrad and evclsmin==9, and finally with IRF P7_v2_datac and evclsmin==10. The resulting FT1 file has six columns of diffuse response: two (galactic and extragalactic response) for each of the three IRFs. This makes the FT1 file non-standard by FSSC standards, which expect only five diffuse response columns.
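
The column count follows directly from the three passes. The sketch below enumerates the (IRF, evclsmin, component) combinations described in the note. It does not call gtdiffrsp or reproduce the actual FT1 column names, and the variable and function names are hypothetical.

```python
# (IRF, evclsmin) pairs in the order gtdiffrsp is invoked, per the note above.
DIFFUSE_PASSES = [
    ("P7_v2_diff", 8),
    ("P7_v2_extrad", 9),
    ("P7_v2_datac", 10),
]
# Each pass contributes one response column per diffuse component.
COMPONENTS = ("galactic", "extragalactic")


def diffuse_response_plan():
    """List one (irf, evclsmin, component) entry per diffuse response column."""
    return [
        (irf, evclsmin, component)
        for irf, evclsmin in DIFFUSE_PASSES
        for component in COMPONENTS
    ]


plan = diffuse_response_plan()
for entry in plan:
    print(entry)
print("diffuse response columns:", len(plan))
```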

Timing and Scaling

Timing is dominated by the gtdiffrsp steps in the mergeClumps job step. There is a wide range of processing times due to: different classes of batch machines; different numbers of events to process; batch jobs being temporarily put into system suspend (SSUSP) states [due to pre-emptive queues]; or, possibly, due to processing dependencies in the data. Elapsed processing time for a single run ranges from 4 to 12+ hours (assuming no shortage of batch machines). The average CPU time per clump is 360 min +/- 160 min.

Thirteen of 200 jobs required the xxl batch queue and then took >30 hours to complete.
