status: Running
last update: 5 April 2012
This page records the configuration and execution of the P202 reprocessing project: a full reprocessing from DIGIs using Pass7 analysis code, including Pass7 classification trees and new IRFs. The task reads DIGI files and emits RECON, MERIT, GCR and CAL ROOT files, plus the standard array of FITS files. It is a CPU-intensive and storage-intensive enterprise requiring months of elapsed time and on the order of 1 PB of storage. When the task begins, there will be about 20,000 Fermi science runs (3.5 years of accumulated data).
To avoid occupying a new 1 PB of disk space, the plan is to remove old RECON files once they have been reprocessed. This is a shell game that involves some amount of buffer space and then waiting until the new RECON file has been created and (to some extent) validated before removal. The old RECON files will be retained on tape in the HPSS system and they will be available via xroot (but with some delay as these large files are staged in).
The name "P202" derives from the word "processing" and the initial file version to be used for the output data products, e.g., r0123456789_v202_merit.root.
"New generation" tasks (using SCons, OO and common python scripts, etc.)
- P202-ROOT - This task reads DIGI and produces reprocessed RECON + CAL + GCR + MERIT + FILTEREDMERIT (photons) + ELECTRONMERIT
- P202-FITS - This task reads MERIT and produces FT1 (photons) + EXTENDEDFT1 + LS1 (merit-like FITS file for photons) + EXTENDEDLS1 + ELECTRONFITS file
Datafile names, versions and locations
Data file version numbers for this reprocessing will begin with v202.
XROOT location and file naming
Location template:
/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>
Locations for P202:
/glast/Data/Flight/Reprocess/P202/recon
/glast/Data/Flight/Reprocess/P202/cal
/glast/Data/Flight/Reprocess/P202/gcr
/glast/Data/Flight/Reprocess/P202/merit
/glast/Data/Flight/Reprocess/P202/filteredmerit
/glast/Data/Flight/Reprocess/P202/electronmerit
/glast/Data/Flight/Reprocess/P202/ft1
/glast/Data/Flight/Reprocess/P202/extendedft1
/glast/Data/Flight/Reprocess/P202/electronft1
/glast/Data/Flight/Reprocess/P202/ls1
/glast/Data/Flight/Reprocess/P202/extendedls1
File naming:
Data Type | aka | Send to FSSC | Naming template |
---|---|---|---|
RECON | | No | r<run#>_<version>_<dataType>.root |
CAL | | No | r<run#>_<version>_<dataType>.root |
GCR | | No | r<run#>_<version>_<dataType>.root |
MERIT | | No | r<run#>_<version>_<dataType>.root |
FILTEREDMERIT | | No | r<run#>_<version>_<dataType>.root |
ELECTRONMERIT | | No | r<run#>_<version>_<dataType>.root |
ELECTRONFT1 | | No | gll_el_p<procVer>_r<run#>_<version>.fit |
EXTENDEDFT1 | | No | gll_xp_p<procVer>_r<run#>_<version>.fit |
FT1 | LS-002 | Yes | gll_ph_p<procVer>_r<run#>_<version>.fit |
EXTENDEDLS1 | | No | gll_xe_p<procVer>_r<run#>_<version>.fit |
LS1 | LS-001 | Yes | gll_ev_p<procVer>_r<run#>_<version>.fit |
Note: 'procVer' is a field in the file name (also written as the keyword "PROC_VER" in the primary header) that was added to the FFD on 5/12/2010. Ref: http://fermi.gsfc.nasa.gov/ssc/dev/current_documents/Science_DP_FFD_RevA.pdf
Examples:
/glast/Data/Flight/Reprocess/P202/recon/r0239557414_v202_recon.root
/glast/Data/Flight/Reprocess/P202/cal/r0239557414_v202_cal.root
/glast/Data/Flight/Reprocess/P202/gcr/r0239557414_v202_gcr.root
/glast/Data/Flight/Reprocess/P202/merit/r0239557414_v202_merit.root
/glast/Data/Flight/Reprocess/P202/filteredmerit/r0239557414_v202_filteredmerit.root
/glast/Data/Flight/Reprocess/P202/electronmerit/r0239557414_v202_electronmerit.root
/glast/Data/Flight/Reprocess/P202/extendedft1/gll_xp_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/ft1/gll_ph_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/electronft1/gll_el_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/extendedls1/gll_xe_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/ls1/gll_ev_p202_r0239559565_v202.fit
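The location and naming templates above can be sketched in code. This is a minimal illustration, not the pipeline's actual implementation: the function names `root_file_path` and `fits_file_path` are hypothetical, and the zero-padding of run numbers to ten digits is an assumption inferred from the example file names.

```python
# Hypothetical helpers; the names are illustrative, not taken from the P202 task code.
BASE = "/glast/Data/Flight/Reprocess"

# FITS file-name prefixes copied from the naming table above.
FITS_PREFIX = {
    "electronft1": "el",
    "extendedft1": "xp",
    "ft1": "ph",
    "extendedls1": "xe",
    "ls1": "ev",
}


def root_file_path(reprocess, data_type, run, version):
    """Build <location>/r<run#>_<version>_<dataType>.root."""
    # Assumption: run numbers are zero-padded to 10 digits, as in the examples.
    name = f"r{run:010d}_v{version}_{data_type}.root"
    return f"{BASE}/{reprocess}/{data_type}/{name}"


def fits_file_path(reprocess, data_type, run, version, proc_ver):
    """Build <location>/gll_<prefix>_p<procVer>_r<run#>_<version>.fit."""
    prefix = FITS_PREFIX[data_type]
    name = f"gll_{prefix}_p{proc_ver}_r{run:010d}_v{version}.fit"
    return f"{BASE}/{reprocess}/{data_type}/{name}"


print(root_file_path("P202", "merit", 239557414, 202))
print(fits_file_path("P202", "ft1", 239559565, 202, 202))
```

Keeping the prefix table in one dictionary means an unknown FITS data type fails loudly with a `KeyError` instead of producing a malformed name.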
DataCatalog location and naming
Logical directory and group template:
Data/Flight/Reprocess/<reprocessName>:<dataType>
Note that the <dataType> field (following the colon) is a DataCatalog 'group' name, and file names are of the form r<run#>.
Naming examples:
Data/Flight/Reprocess/P202:RECON r0239557414
Data/Flight/Reprocess/P202:CAL r0239557414
Data/Flight/Reprocess/P202:GCR r0239557414
Data/Flight/Reprocess/P202:MERIT r0239557414
Data/Flight/Reprocess/P202:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P202:EXTENDEDFT1 r0239557414
Data/Flight/Reprocess/P202:FT1 r0239557414
Data/Flight/Reprocess/P202:ELECTRONFT1 r0239557414
Data/Flight/Reprocess/P202:EXTENDEDLS1 r0239557414
Data/Flight/Reprocess/P202:LS1 r0239557414
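As a small illustration of the `<folder>:<group>` convention, the sketch below splits a logical DataCatalog path into its folder and group and formats a run number as a catalog file name. Both function names are hypothetical, and the ten-digit zero-padding is an assumption taken from the examples above. No DataCatalog client API is used or implied.

```python
def split_catalog_path(path):
    """Split '<folder>:<group>' into (folder, group)."""
    folder, sep, group = path.partition(":")
    if not sep or not group:
        raise ValueError(f"expected '<folder>:<group>', got {path!r}")
    return folder, group


def catalog_file_name(run):
    # Assumption: zero-padded to 10 digits, matching the naming examples.
    return f"r{run:010d}"


folder, group = split_catalog_path("Data/Flight/Reprocess/P202:RECON")
print(folder, group, catalog_file_name(239557414))
```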
Data Sample
The currently defined data sample for P202 reprocessing includes:
Quantity | Value | Notes |
---|---|---|
First run | 239557414 (MET), 2008-08-04 15:43:34 (UTC) | |
Last run | 348951073 (MET), 2012-01-22 18:51:13 (UTC) | |
Total runs | 19,172 | |
Total input DIGI events | 41,856,513,685 | |
Total RECON events | | |
Total CAL events | | |
Total GCR events | | |
Total MERIT events | | all "events" |
Total EXTENDEDFT1/LS1 events | | all photon event classes |
Total LS1 (FSSC selection) events | | event classes (bits) 0,2,3,4 (transient, source, clean, ultraclean) |
Total FT1 (FSSC selection) events | | event classes (bits) 2,3,4 (source, clean, ultraclean) |
Total disk space used | N/A | |
NOTE: One run, 242429468, of type TrigTest was declared 'good for science' and has been included.
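Run numbers are mission elapsed time (MET) in seconds, counted from 2001-01-01 00:00:00 UTC. The sketch below converts MET to UTC with plain calendar arithmetic. It deliberately ignores leap seconds, so it is an approximation that can differ from a leap-second-aware conversion by a few seconds; the function name `met_to_utc_naive` is hypothetical. With that caveat, it reproduces the first-run and last-run timestamps quoted for the P202 data sample.

```python
from datetime import datetime, timedelta, timezone

# Fermi MET reference epoch.
MET_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def met_to_utc_naive(met_seconds):
    """Convert MET to UTC, ignoring leap seconds (approximation)."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


first = met_to_utc_naive(239557414)
last = met_to_utc_naive(348951073)
print(first.strftime("%Y-%m-%d %H:%M:%S"))  # 2008-08-04 15:43:34
print(last.strftime("%Y-%m-%d %H:%M:%S"))   # 2012-01-22 18:51:13
```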
Progress at the 1-year mark:
Quantity | Value | Size | Notes |
---|---|---|---|
First run | 239557414 (MET), 2008-08-04 15:43:34 (UTC) | | |
Last run | 271999199 (MET), 2009-08-15 03:19:57 (UTC) | | |
Total runs | 5600 | | |
Total input DIGI events | 11,928,911,465 | | |
Total RECON events | 11,928,911,465 | 161.4 TB | |
Total CAL events | 11,928,911,465 | 36.1 TB | |
Total GCR events | 11,928,911,465 | 260.5 GB | |
Total MERIT events | 11,928,911,465 | 9.6 TB | all triggered events |
Total FILTEREDMERIT events | 1,572,783,868 | 1.3 TB | all photon event classes |
Total EXTENDEDFT1 | 1,572,783,826 | 143.7 GB | all photon event classes |
Total LS1 events | 1,572,783,826 | 255.0 GB | all photon event classes |
Total LS1 (FSSC selection) events | 271,923,333 | 44.2 GB | event classes (bits) 0,2,3,4 (transient, source, clean, ultraclean) |
Total FT1 (FSSC selection) events | 24,261,962 | 2.4 GB | event classes (bits) 2,3,4 (source, clean, ultraclean) |
[to be continued...]
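The 1-year numbers support a few quick consistency checks and a rough storage projection. The sketch below uses only figures copied from the two tables above. The projection assumes RECON size scales linearly with the number of input events, which is an assumption for illustration and not a measured result.

```python
# Figures copied from the 1-year progress table.
digi_events_1yr = 11_928_911_465
filteredmerit_events = 1_572_783_868
extendedft1_events = 1_572_783_826
recon_tb_1yr = 161.4

# Figure copied from the full P202 data sample table.
digi_events_full = 41_856_513_685

# FILTEREDMERIT and EXTENDEDFT1 both hold "all photon event classes",
# so any nonzero difference is a discrepancy worth tracking.
discrepancy = filteredmerit_events - extendedft1_events

# Fraction of triggered events that pass the photon filter.
photon_fraction = filteredmerit_events / digi_events_1yr

# Assumption: RECON volume scales linearly with the number of events.
recon_tb_full_estimate = recon_tb_1yr * digi_events_full / digi_events_1yr

print(discrepancy)
print(f"{photon_fraction:.1%}")
print(f"{recon_tb_full_estimate:.0f} TB")
```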
Bookkeeping
- (This page): Define ingredients of reprocessing (processing code/configuration changes)
- Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P202
- List of all reprocessings
- List of all data runs reprocessed
- Pointers to all input data files (-> dataCatalog)
- Pointers to associated task processes (-> Pipeline II status)
- Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
- Lists of and pointers to all output data files
- Meta data associated with each output data product
P202-ROOT
Status chronology
- 2/13/2012 - begin trials with final calibration and alignments from Leon; 5 runs reprocessed
- 2/14/2012 - trials continue with blocks of 15, 20, 25 and 50 runs reprocessed (each run generates ~20 batch jobs)
- 2/16/2012 - begin trickleStream production. Initial config:
    ===============================================================================
    TRICKLE PARMS
    ===============================================================================
    task = P202-ROOT
    maxRuns = 19172
    firstStep = setupRun
    steps = [['/processRun processClump', 1500, 20], ['mergeClumps', 70, 1]]
    maxStreamsPerCycle = 20
    timePerCycle = 300
    ===============================================================================
- 2/21/2012 - One clump reprocessed with pointer to new mySQL DB (stream 710.0)
- 2/22/2012 - 776 runs complete. Pausing task.
- 3/15/2012 - resume task. New goal is 1-year of data (~5600 runs)
- 3/31/2012 - 1-year complete (5600 runs). There have been a few nasty problems which need to be fixed before continuing.
- 4/10/2012 - Unknown 'glitch' may have caused a few hundred jobs to crash and take sulky46 along with them.
- 4/11/2012 - 10:40pm lightning strikes SLAC power lines. Site-wide power outage. Stream 7795 was the last stream submitted prior to the outage.
S/W component | bug fix | status |
---|---|---|
New ROOT version | 5-min 'transaction timeout' triggered by xroot data server reboot | done 4/3/2012 |
New GlastRelease | 1) include new ROOT version (above); 2) exit with non-zero RC on ROOT write error | done 4/5/2012, GR 17-35-24-rp04 |
New GPL_TOOLS | check size/checksum of file written to xroot with known size/checksum | pending |
Tuned xroot on new Dell servers | silent file truncation when volume fills up JIRA | done 4/4/2012 (100 MB min space limit -> 100 GB; file system space check cadence changed from 10 min to 2 min) |
New xroot client tools | complain when xroot data server fails on write | done 4/3/2012, v3.1.1 |
New TSkim | 1) new ROOT version (above); 2) complain on ROOT write errors | done 4/5/2012, v08-02-01 |
New xroot redirector | required step toward enabling HPSS staging | done 4/3/2012, v3.1.1 |
Note also that the FILTEREDMERIT files contain 42 more events than the EXTENDEDFT1 files; they should be identical.
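The pending GPL_TOOLS fix in the table above is a size/checksum comparison after writing a file. A minimal sketch of that idea follows. It is not the GPL_TOOLS implementation: the function names are hypothetical, Adler-32 is an assumed checksum choice, and ordinary local files stand in for files on xroot. The second file simulates the silent truncation described in the table.

```python
import os
import tempfile
import zlib


def file_size_and_adler32(path, chunk_size=1 << 20):
    """Return (size in bytes, Adler-32 checksum), reading in chunks."""
    size = 0
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            checksum = zlib.adler32(chunk, checksum)
    return size, checksum & 0xFFFFFFFF


def verify_copy(path, expected_size, expected_checksum):
    """True only if both size and checksum match the known values."""
    size, checksum = file_size_and_adler32(path)
    return size == expected_size and checksum == expected_checksum


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")
    dst = os.path.join(tmp, "copy.bin")
    payload = b"merit" * 1000
    # The copy is deliberately cut short to simulate a truncated write.
    for p, data in ((src, payload), (dst, payload[:-10])):
        with open(p, "wb") as f:
            f.write(data)
    expected = file_size_and_adler32(src)
    source_ok = verify_copy(src, *expected)
    truncated_ok = verify_copy(dst, *expected)

print(source_ok, truncated_ok)
```

Checking the size first is cheap and already catches truncation; the checksum additionally catches corruption that leaves the length unchanged.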
- 4/5/2012 - resume task. New goal is entire science dataset.
- 4/12/2012 - due to possible overload of sulky46/u18 writing a lot of core files, have introduced one change to processClumps.py: prepend "ulimit -c 0;" to gleam command to disable all core file generation. This starts approx with run 7605 (+/-).
- 5/9/2012 - major pipeline issue...shut down pipeline and allow to drain (due to tomorrow's major outage)
- 5/10/2012 - 13:40 outage over.
- Update GR from 17-35-24-rp04 to 17-35-24-rp07 in which the only change is replacing the 5-minute xroot time-out with 8 hours. This change effective with stream 14314 and previously failed pieces of four other runs: 14247.6, 14273.23, 14274.8, 14231.9.
- Leon advises that as of today, calibrations are valid only thru ~15 Dec 2011 (run 345574915) - which is somewhere around stream 18,400. He asks Sasha to produce more up-to-date calibs.
- 5/28/2012 - 15:30 Complete (through 31 March 2012)
- Data Catalog summary:
There are discrepancies to track down!
Name | Type | Files | Events | Size | Created (UTC) | Links |
---|---|---|---|---|---|---|
 | Group | 20229 | 44,125,599,595 | 128.7 TB | 25-Jan-2012 00:53:31 | |
 | Group | 5600 | 0 | 2.5 GB | 02-Mar-2012 00:06:07 | |
 | Group | 20229 | 90,904,582 | 205.7 GB | 25-Jan-2012 00:53:32 | |
 | Group | 5600 | 1,572,783,826 | 143.7 GB | 02-Mar-2012 00:06:09 | |
 | Group | 5600 | 1,572,783,826 | 255.0 GB | 02-Mar-2012 00:06:09 | |
 | Group | 20229 | 6,291,396,710 | 5.3 TB | 25-Jan-2012 00:53:29 | |
 | Group | 5600 | 24,261,962 | 2.4 GB | 02-Mar-2012 00:06:06 | |
 | Group | 20229 | 44,123,014,456 | 942.7 GB | 25-Jan-2012 00:53:31 | |
 | Group | 5600 | 271,923,333 | 44.2 GB | 02-Mar-2012 00:06:08 | |
 | Group | 20229 | 44,125,679,961 | 35.4 TB | 25-Jan-2012 00:53:30 | |
 | Group | 20229 | 44,123,612,977 | 590.0 TB | 25-Jan-2012 00:53:33 | |
- Final trickleStream configuration:
    ===============================================================================
    TRICKLE PARMS
    ===============================================================================
    task = P202-ROOT
    maxRuns = 20229
    firstStep = setupRun
    steps = [['/processRun processClump', 2000, 21], ['mergeClumps', 200, 1]]
    maxStreamsPerCycle = 20
    timePerCycle = 300
    ------DEBUG----------------
    maxCycles = 0
    chatter = False
    dryRun = False
    ===============================================================================
- Data Catalog summary:
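Returning to the 4/12/2012 chronology entry: core-file generation was disabled by prepending "ulimit -c 0;" to the gleam command. The sketch below shows that string manipulation only. The gleam command line here is hypothetical (the real one is not shown on this page), and the helper name `without_core_dumps` is illustrative rather than taken from processClumps.py.

```python
import shlex


def without_core_dumps(command):
    """Prefix a shell command so the shell sets the core-file size limit to 0."""
    return "ulimit -c 0; " + command


# Hypothetical gleam invocation, for illustration only.
gleam_cmd = "gleam " + shlex.quote("jobOptions.txt")
wrapped = without_core_dumps(gleam_cmd)
print(wrapped)  # ulimit -c 0; gleam jobOptions.txt
```

The prefix only takes effect when the string is run through a shell. When launching the process directly from Python on Unix, calling `resource.setrlimit(resource.RLIMIT_CORE, (0, 0))` in the child process is an alternative way to get the same limit.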
Configuration
Parameter | Value |
---|---|
Task Location | /nfs/farm/g/glast/u38/Reprocess-tasks/P202-ROOT |
Task Status | |
GlastRelease | 17-35-24-gr17 (SCons RHEL4-32 build) |
Run Selection | based on a modified "standard" selection, see https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets |
s/c data | "standard" Public Release 2 https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets |
Input Run List | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/runList.txt |
photonFilter | CTBParticleType==1 && ((FT1EventClass & 0x00003EFF)!=0) |
electronFilter | CTBParticleType==1 |
Code Variants | redhat4-i686-32bit-gcc34 (Optimized) |
jobOpts | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/doRecon.txt |
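The photonFilter and electronFilter expressions in the configuration are cut strings applied to MERIT variables. The sketch below restates them literally in Python to make the bit mask explicit. The function names are hypothetical, and this is not how the task evaluates the cuts; it only shows what the expressions compute for a single event.

```python
PHOTON_CLASS_MASK = 0x00003EFF


def passes_photon_filter(ctb_particle_type, ft1_event_class):
    """Literal translation of the photonFilter expression."""
    return ctb_particle_type == 1 and (ft1_event_class & PHOTON_CLASS_MASK) != 0


def passes_electron_filter(ctb_particle_type):
    """Literal translation of the electronFilter expression."""
    return ctb_particle_type == 1


# Bit positions covered by the mask: 0-7 and 9-13 (bit 8 is excluded).
mask_bits = [b for b in range(32) if (PHOTON_CLASS_MASK >> b) & 1]
print(mask_bits)

print(passes_photon_filter(1, 1 << 2))  # bit 2 set: passes
print(passes_photon_filter(1, 1 << 8))  # only bit 8 set: rejected
```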
Output Data Products |
Timing and Scaling
- processClump
- with 1300 jobs completed, the average time to run varies by processor type from 220 min (hequ) to 370 min (boer).
- with nearly 10,000 runs complete, the plots appear below:
- mergeClumps
- with 42 jobs completed, the average time to run varies by processor type from 5-30 minutes.
Load balancing
trickleStream parameters:
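The trickleStream configurations quoted in the P202-ROOT status chronology are simple `key = value` listings. The sketch below parses the initial configuration into a dictionary. The parser is hypothetical and is not part of trickleStream. The submission-rate figure at the end assumes that timePerCycle is in seconds and that maxStreamsPerCycle is a hard cap per cycle; neither is stated on this page, and the meaning of the numbers in `steps` is likewise left uninterpreted.

```python
import ast

CONFIG_TEXT = """\
===============================================================================
TRICKLE PARMS
===============================================================================
task = P202-ROOT
maxRuns = 19172
firstStep = setupRun
steps = [['/processRun processClump', 1500, 20], ['mergeClumps', 70, 1]]
maxStreamsPerCycle = 20
timePerCycle = 300
===============================================================================
"""


def parse_trickle_parms(text):
    """Parse 'key = value' lines; banner and title lines are skipped."""
    parms = {}
    for line in text.splitlines():
        if "=" not in line or line.startswith("="):
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        try:
            parms[key.strip()] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Bare words such as P202-ROOT are kept as strings.
            parms[key.strip()] = value
    return parms


parms = parse_trickle_parms(CONFIG_TEXT)

# Assumption: timePerCycle is in seconds and the per-cycle cap is always reached.
max_streams_per_hour = parms["maxStreamsPerCycle"] * 3600 / parms["timePerCycle"]

print(parms["steps"])
print(max_streams_per_hour)
```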
P202-FITS
This task generates all desired FITS data products.
Status chronology
- 3/2/2012 - Define block 1 as the 776 runs in P202-ROOT block 1. Configure trickleStream and begin (14:08)
- 3/31/2012 - Define block 2 as 5600 runs. Reconfig trickleStream and begin (18:05)
- 4/01/2012 - Block 2 complete (most of the 4824 jobs completed in about six hours w/1000 job limit).
Configuration
Parameter | Value |
---|---|
Task Location | /nfs/farm/g/glast/u38/Reprocess-tasks/P202-FITS |
Task Status | http://glast-ground.slac.stanford.edu/Pipeline-II/task.jsp?task=75031156 |
Input Data | MERIT (direct from P202-ROOT) |
spacecraft data | same as P202-ROOT |
Input Run List | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-FITS/config/runList.txt |
evtClassDefs | 00-19-04 (6/23/2011) |
eventClassMap | EvtClassDefs_P7V6.xml |
ScienceTools | 09-27-01 (2/15/2012) |
Code Variants | redhat5-i686-32bit-gcc41 (Optimized) |
Diffuse Model | based on contents of /afs/slac.stanford.edu/g/glast/ground/GLAST_EXT/diffuseModels/v2r0 |
Diffuse Response | 'source' using P7SOURCE_V6 IRF |
IRFs | P6V7, contained within ScienceTools release |
Output Data Products |
Generation of output data products:
Data Product | destination | data content [1] | event selection [1] | makeFT1 | gtselect | gtdiffrsp | gtmktime |
---|---|---|---|---|---|---|---|
EXTENDEDFT1 | SLAC | FT1variables | ((FT1EventClass & 0x00003EFF)!=0) | | | | |
FT1 | FSSC+SLAC | FT1variables | 'source' and above | | | (inherited) | |
EXTENDEDLS1 | SLAC | LS1variables | ((FT1EventClass & 0x00003EFF)!=0) | | | | |
LS1 | FSSC+SLAC | LS1variables | 'transient' and above | | | (inherited) | |
ELECTRONFT1 | SLAC | FT1variables | CTBParticleType==1 | | | | |
[1] /afs/slac/g/glast/ground/releases/volume04/evtClassDefs/00-19-04/data
Note that diffuse response is calculated for 'source' and 'clean' event classes only.
Note on 'Code Variant': The SLAC batch farm contains a mixture of architectures, both hardware (Intel/AMD 64-bit) and software (RHEL5-64, gcc v4.1, etc.). At this time, GlastRelease builds only on RHEL4-32, while ScienceTools builds for RHEL5-32 and RHEL5-64.
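The FSSC selections ('transient' and above for LS1, 'source' and above for FT1) correspond to the event class bits listed in the Data Sample section: 0, 2, 3, 4 and 2, 3, 4 respectively. The sketch below builds the matching bit masks. It is an illustration of the bit arithmetic only; whether makeFT1 or gtselect express the cut this way is not stated on this page, and the names used here are hypothetical.

```python
# Bit positions as listed in the Data Sample section.
CLASS_BITS = {"transient": 0, "source": 2, "clean": 3, "ultraclean": 4}


def mask_for(*class_names):
    """OR together the bits for the named event classes."""
    mask = 0
    for name in class_names:
        mask |= 1 << CLASS_BITS[name]
    return mask


LS1_MASK = mask_for("transient", "source", "clean", "ultraclean")
FT1_MASK = mask_for("source", "clean", "ultraclean")


def in_selection(ft1_event_class, mask):
    """True if the event belongs to at least one class in the mask."""
    return (ft1_event_class & mask) != 0


print(hex(LS1_MASK), hex(FT1_MASK))

# An event flagged only as 'transient' is kept in LS1 but not in FT1.
transient_only = 1 << CLASS_BITS["transient"]
print(in_selection(transient_only, LS1_MASK), in_selection(transient_only, FT1_MASK))
```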