P120 Reprocessing
status: In Progress
last update: 28 July 2010
This page is a record of the configuration for the P120 reprocessing project, event reclassification using Pass 7.3. This project involves reprocessing with Pass7 classification trees and (ultimately) new IRFs. The name "P120" derives from the word "processing" and the initial file version to be used for the output data products, e.g., r0123456789_v120_merit.root.
- P120-MERIT - this task reads DIGI+RECON+MERIT and produces reprocessed MERIT + FILTEREDMERIT (photons) + ELECTRONMERIT
- P120-FITS - this task will (eventually) read FILTEREDMERIT and produce FT1 (photons) + LS1 (merit-like FITS file for photons) + electron FITS file + LS3 (live-time cube)
Datafile names, versions and locations
Data file version numbers for this reprocessing will begin with v120.
XROOT location and file naming
Location template:
/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>
Locations for P120:
/glast/Data/Flight/Reprocess/P120/merit
/glast/Data/Flight/Reprocess/P120/filteredmerit
/glast/Data/Flight/Reprocess/P120/electronmerit
/glast/Data/Flight/Reprocess/P120/ft1
/glast/Data/Flight/Reprocess/P120/electronft1
/glast/Data/Flight/Reprocess/P120/ls1
/glast/Data/Flight/Reprocess/P120/ls3
File naming:
Data Type | Send to FSSC | Naming template |
---|---|---|
MERIT | No | r<run#>_<version>_<dataType>.root |
FILTEREDMERIT | No | r<run#>_<version>_<dataType>.root |
ELECTRONMERIT | No | r<run#>_<version>_<dataType>.root |
ELECTRONFT1 | No | r<run#>_<version>_<dataType>.fit |
FT1 | Yes | gll_ph_r<run#>_<version>.fit |
LS1 | Yes | gll_ev_r<run#>_<version>.fit |
LS3 | Maybe | gll_lt_r<run#>_<version>.fit |
Example:
/glast/Data/Flight/Reprocess/P120/merit/r0239557414_v120_merit.root
/glast/Data/Flight/Reprocess/P120/filteredmerit/r0239557414_v120_filteredmerit.root
/glast/Data/Flight/Reprocess/P120/electronmerit/r0239557414_v120_electronmerit.root
/glast/Data/Flight/Reprocess/P120/ft1/gll_ph_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/electronft1/r0239557414_v120_electronft1.fit
/glast/Data/Flight/Reprocess/P120/ls1/gll_ev_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/ls3/gll_lt_r0239559565_v120.fit
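The location and naming templates can be combined mechanically. The following is a minimal sketch, not part of the P120 task code: the function name `xroot_path` and the dictionary `TEMPLATES` are hypothetical, and the zero-padding of the run number to ten digits is an assumption inferred from the examples above.

```python
XROOT_BASE = "/glast/Data/Flight/Reprocess"

# Naming templates transcribed from the file-naming table.
# The run number is zero-padded to 10 digits (assumption based on the examples).
TEMPLATES = {
    "merit": "r{run:010d}_{version}_merit.root",
    "filteredmerit": "r{run:010d}_{version}_filteredmerit.root",
    "electronmerit": "r{run:010d}_{version}_electronmerit.root",
    "electronft1": "r{run:010d}_{version}_electronft1.fit",
    "ft1": "gll_ph_r{run:010d}_{version}.fit",
    "ls1": "gll_ev_r{run:010d}_{version}.fit",
    "ls3": "gll_lt_r{run:010d}_{version}.fit",
}


def xroot_path(run, data_type, version="v120", reprocess_name="P120"):
    """Build the full xroot path for one run and data type (hypothetical helper)."""
    data_type = data_type.lower()
    file_name = TEMPLATES[data_type].format(run=run, version=version)
    return f"{XROOT_BASE}/{reprocess_name}/{data_type}/{file_name}"


print(xroot_path(239557414, "MERIT"))
print(xroot_path(239559565, "FT1"))
```

The two calls reproduce the MERIT and FT1 example paths listed above.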
DataCatalog location and naming
Logical directory and group template:
Data/Flight/Reprocess/<reprocessName>:<dataType>
Note that the <dataType> field (following the colon) is a DataCatalog 'group' name, and file names are of the form r<run#>.
Naming examples:
Data/Flight/Reprocess/P120:MERIT r0239557414
Data/Flight/Reprocess/P120:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P120:FT1 r0239557414
Data/Flight/Reprocess/P120:LS1 r0239557414
Data/Flight/Reprocess/P120:LS3 r0239557414
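For scripting against the DataCatalog, the logical name splits into a folder, a group and a file name. A small sketch of that split follows; `parse_catalog_entry` is a hypothetical helper and does not call any DataCatalog API.

```python
def parse_catalog_entry(entry):
    """Split 'folder:GROUP rNNNNNNNNNN' into (folder, group, run number)."""
    logical, file_name = entry.split()
    folder, group = logical.split(":")
    # File names have the form r<run#>; strip the leading 'r' to get the run.
    run = int(file_name[1:])
    return folder, group, run


print(parse_catalog_entry("Data/Flight/Reprocess/P120:MERIT r0239557414"))
```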
Data Sample
The currently defined data sample for P120 reprocessing includes:
First run | 239557414 (MET), 2008-08-04 15:43:34 (UTC) |
Last run | 301071924 (MET), 2010-07-17 15:05:24 (UTC) |
Total runs | 10697 |
Total MERIT events | 23,273,219,670 |
Total FT1 events | n/a |
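The run numbers are Mission Elapsed Time (MET) values, counted in seconds from 2001-01-01 00:00:00 UTC. The sketch below converts the first and last run numbers to the UTC times quoted in the table. It assumes a plain offset with no leap-second correction, which is what reproduces the quoted timestamps; `met_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# MET reference epoch; leap seconds are deliberately ignored (assumption).
MET_EPOCH = datetime(2001, 1, 1, 0, 0, 0)


def met_to_utc(met_seconds):
    """Convert a MET value in seconds to a naive UTC datetime."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


print(met_to_utc(239557414).isoformat(sep=" "))  # 2008-08-04 15:43:34
print(met_to_utc(301071924).isoformat(sep=" "))  # 2010-07-17 15:05:24
```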
Bookkeeping
- (This page): Define ingredients of reprocessing (processing code/configuration changes)
- Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P120
  - List of all reprocessings
  - List of all data runs reprocessed
  - Pointers to all input data files (-> dataCatalog)
  - Pointers to associated task processes (-> Pipeline II status)
- Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
  - Lists of and pointers to all output data files
  - Meta data associated with each output data product
P120-MERIT
Status chronology
- 7/28/2010 - block 1 re-reprocessing complete
- 7/27/2010 - New GlastRelease (v17r35p10) containing new evtUtils, "to make the FT1EventClass bits compatible with the ScienceTools". Cleanup, including removing all files created last week during the first attempt.
- 7/21/2010 - block 1 reprocessing complete
- 7/20/2010 - agree upon 'pilot block' of runs (239557417 - 243220241), 637 runs. Begin...
- 7/19/2010 - submit first test run. success. await feedback
Configuration
Task Location | /nfs/farm/g/glast/u38/Reprocess-tasks/P120-MERIT |
Task Status | http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp |
GlastRelease | v17r35p8, then v17r35p10 (from 7/27/2010) |
Input Data Selection | "standard" from https://confluence.slac.stanford.edu/display/SCIGRPS/LAT+Dataset+Definitions along with "&& (RunQuality != "Bad" || is_null(RunQuality))" |
Input Run List | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-MERIT/config/runFile.txt |
photonFilter | CTBParticleType==0 && CTBClassLevel>0 |
electronFilter | CTBParticleType==1 |
jobOpts | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-MERIT/config/reClassify.txt |
Output Data Products | |
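To make the selection logic in the table concrete, the sketch below restates the run-quality condition and the two filters as Python predicates. This is an illustration only: the real cuts are applied to MERIT data by the task, and the dictionary-based event representation and all function names here are hypothetical.

```python
def run_selected(run_quality):
    """RunQuality != "Bad" || is_null(RunQuality); None stands in for null."""
    return run_quality is None or run_quality != "Bad"


def is_photon(event):
    """photonFilter: CTBParticleType==0 && CTBClassLevel>0"""
    return event["CTBParticleType"] == 0 and event["CTBClassLevel"] > 0


def is_electron(event):
    """electronFilter: CTBParticleType==1"""
    return event["CTBParticleType"] == 1


def split_events(events):
    """Return (photons, electrons), mirroring FILTEREDMERIT and ELECTRONMERIT."""
    photons = [e for e in events if is_photon(e)]
    electrons = [e for e in events if is_electron(e)]
    return photons, electrons


# Made-up events for illustration; not real data.
sample = [
    {"CTBParticleType": 0, "CTBClassLevel": 2},
    {"CTBParticleType": 0, "CTBClassLevel": 0},
    {"CTBParticleType": 1, "CTBClassLevel": 1},
]
photons, electrons = split_events(sample)
print(len(photons), len(electrons))
```

Note that an event with CTBParticleType==0 and CTBClassLevel==0 lands in neither output stream.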
Timing and Scaling
- (block 1 results) The processClump step is taking ~40 hequ-minutes (or ~65 fell-minutes). With >500 simultaneous jobs running, there is little noticeable strain on xroot. There are five servers in the yellow-orange load range and they are claiming ~110-130 MB/s I/O rate.
- The mergeClumps step is taking ~5 hequ-minutes
- It was observed that submitting 70 runs at once put a strain on /u30, home of GlastRelease. Some 93 of ~540 jobs failed with I/O error, but succeeded upon rollback.
Load balancing
Introduce a new trickleStreams.py script to (partially) assess pipeline activity and submit only the number of jobs considered appropriate based on available data.
maxProcessClumps = 600 ## prevent overload of xroot
maxMergeClumps = 20 ## prevent overload of xroot (inactive)
maxStreamsPerCycle = 20 ## prevent overload of /u30 on startup
timePerCycle = 900 ## 15 minutes: allow time for dust to settle
With these parameters, it took ~5 hours to reach a point where fewer than 20 jobs per cycle were regularly submitted, and another 4.5 hours for the task to complete. On average, one run generated 7.5 processClump batch jobs.
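The throttling idea can be sketched as a per-cycle decision. The actual logic of trickleStreams.py is not recorded on this page, so the function below is only one plausible reading: it assumes the script knows the number of active processClump jobs and estimates the load of a new run from the observed average of 7.5 processClump jobs per run. The name `streams_to_submit` and its arguments are hypothetical.

```python
MAX_PROCESS_CLUMPS = 600      # maxProcessClumps: prevent overload of xroot
MAX_STREAMS_PER_CYCLE = 20    # maxStreamsPerCycle: prevent overload of /u30
CLUMPS_PER_RUN = 7.5          # observed average from block 1


def streams_to_submit(active_process_clumps, runs_waiting):
    """Number of new run streams to submit in one cycle (hypothetical rule)."""
    headroom = MAX_PROCESS_CLUMPS - active_process_clumps
    if headroom <= 0:
        return 0
    # How many additional runs fit under the processClump ceiling.
    by_headroom = int(headroom // CLUMPS_PER_RUN)
    return min(MAX_STREAMS_PER_CYCLE, by_headroom, runs_waiting)


print(streams_to_submit(0, 637))    # idle pipeline: capped by the per-cycle limit
print(streams_to_submit(540, 100))  # nearly full: capped by remaining headroom
```

In a driver loop, this function would be called once every timePerCycle seconds (900 s in the configuration above).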
P120-FT1
Status chronology
Configuration
WARNING: NOT UP TO DATE!
Task Location | /nfs/farm/g/glast/u38/Reprocess-tasks/P120-FT1 |
Task Status | http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp |
Input Data Selection | MERIT (from P120-MERIT), FT2 (from P100-FT2 and Level1) |
Input Run List | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-FT1/config/runFile.txt |
evtClassDefs | 00-16-00 |
meritFilter | pass7_FSW_cuts, |
eventClassifier | Pass7_Classifier.py |
ScienceTools | 09-15-05 (SCons build) |
Code Variant | forced to redhat4-i686-32bit-gcc34 |
Diffuse Model | /afs/slac.stanford.edu/g/glast/ground/releases/analysisFiles/diffuse/v2/source_model_v02.xml |
Diffuse Response IRFs | P7_v2_diff, P7_v2_extrad, P7_v2_datac |
IRFs | implemented as 'custom irf', files in /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/IRFS/Pass7.2 |
Output Data Products | |
Processing chain for FITS data products
Data Product | makeFT1 | gtdiffrsp | gtmktime | gtltcube |
---|---|---|---|---|
FT1 | true | true for | true | false |
LS1 | true | false | true | false |
LS3 | false | false | false | true |
ELECTRONFT1 | true | false | true | false |
Note on 'Code Variant': The SLAC batch farm contains a mixture of architectures, both hardware (Intel/AMD and 32-/64-bit) and software (RedHat Enterprise Linux 3, 4 and 5, gcc 3.2, 3.4, 4.1, etc.). GLAST/Fermi code builds on many of the newer combinations, but is not yet validated on them.
Note on diffuse response calculation: gtdiffrsp is called three times in succession. The first time with IRF P7_v2_diff and evclsmin==8, followed by IRF P7_v2_extrad and evclsmin==9, and finally IRF P7_v2_datac and evclsmin==10. The resulting FT1 file has six columns of diffuse response, two columns (galactic and extragalactic response) for each of the three IRFs. This creates a non-standard FT1 file by FSSC standards as they expect only five diffuse response columns.
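The three successive gtdiffrsp passes and the resulting column count can be tabulated as follows. This sketch only enumerates the (IRF, evclsmin) pairs named in the note and does not invoke gtdiffrsp. The column label format `<IRF>::<component>` is a placeholder chosen for illustration, not the actual FT1 column naming.

```python
# One entry per gtdiffrsp call, in the order given in the note above.
DIFFUSE_PASSES = [
    {"irf": "P7_v2_diff", "evclsmin": 8},
    {"irf": "P7_v2_extrad", "evclsmin": 9},
    {"irf": "P7_v2_datac", "evclsmin": 10},
]

# Each pass adds one galactic and one extragalactic response column.
COMPONENTS = ["galactic", "extragalactic"]


def diffuse_columns(passes=DIFFUSE_PASSES, components=COMPONENTS):
    """List placeholder labels for the diffuse response columns."""
    return [f"{p['irf']}::{c}" for p in passes for c in components]


columns = diffuse_columns()
print(len(columns))  # 6
for name in columns:
    print(name)
```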
Timing
n/a