...
First run | 239557414 (MET), 2008-08-04 15:43:34 (UTC) |
Last run | 354923690 (MET), 2012-03-31 21:54:48 (UTC) |
Total runs | 20,229 |
Total input DIGI events | 44,125,679,961 |
Total RECON events | 44,125,679,961 |
Total CAL events | 44,125,679,961 |
Total GCR events | 44,125,679,961 |
Total MERIT events | 44,125,679,961 | all "events" |
Total FILTEREDMERIT events | 6,291,396,711 | all photon event classes |
Total ELECTRONMERIT events | 90,904,582 | all electron events |
Generation of FITS files is a second step in the reprocessing and has only been run on the first year of data. Stay tuned...
Total EXTENDEDFT1/LS1 events | | selected | all photon event classes |
Total LS1 (FSSC selection) events | | event classes (bits) 0,2,3,4 (transient, source, clean, ultraclean) |
Total FT1 (FSSC selection) events | | event classes (bits) 2,3,4 (source, clean, ultraclean) |
Total disk space used | N/A |
NOTE: One run, 242429468, of type TrigTest was declared 'good for science' and has been included.
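The run numbers quoted in the table are Mission Elapsed Time (MET) values, i.e. seconds counted from 2001-01-01 00:00:00 UTC. A minimal sketch of the conversion follows; the function names are illustrative and not part of any pipeline tool. It ignores leap seconds, so results may differ by a second or two from the UTC times quoted on this page.

```python
from datetime import datetime, timedelta, timezone

# MET reference epoch: 2001-01-01 00:00:00 UTC.
MET_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def met_to_utc(met_seconds):
    """Convert MET seconds to a UTC datetime (leap seconds are ignored)."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


def utc_to_met(when):
    """Convert a timezone-aware datetime to whole MET seconds (leap seconds are ignored)."""
    return int((when - MET_EPOCH).total_seconds())


if __name__ == "__main__":
    # First run of the reprocessed dataset.
    print(met_to_utc(239557414).isoformat())
```

Because run IDs are start times in MET, the same helper can be used to estimate which run range covers a given calendar date.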
Progress at the 1-year mark:
First run | 239557414 (MET), 2008-08-04 15:43:34 (UTC) |
Last run | 271999199 (MET), 2009-08-15 03:19:57 (UTC) |
Total runs | 5600 |
Total input DIGI events | 11,928,911,465 |
Total RECON events | 11,928,911,465 | 161.4 TB | |
Total CAL events | 11,928,911,465 | 36.1 TB | |
Total GCR events | 11,928,911,465 | 260.5 GB | |
Total MERIT events | 11,928,911,465 | 9.6 TB | all triggered events |
Total FILTEREDMERIT events | 1,572,783,868 | 1.3 TB | all photon event classes |
Total EXTENDEDFT1 events | 1,572,783,826 | 143.7 GB | all photon event classes |
Total LS1 events | 1,572,783,826 | 255.0 GB | all photon event classes |
Total LS1 (FSSC selection) events | 271,923,333 | 44.2 GB | event classes (bits) 0,2,3,4 (transient, source, clean, ultraclean) |
Total FT1 (FSSC selection) events | 24,261,962 | 2.4 GB | event classes (bits) 2,3,4 (source, clean, ultraclean) |
[to be continued...]
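The 1-year table invites a quick consistency check: products derived from the same events should report the same count. The sketch below uses the counts from the table; the grouping in `EXPECTED_EQUAL` is an assumption (this page only states explicitly that FILTEREDMERIT and EXTENDEDFT1 should be identical), and the function name is illustrative.

```python
# Event counts copied from the 1-year table.
COUNTS = {
    "DIGI": 11_928_911_465,
    "RECON": 11_928_911_465,
    "CAL": 11_928_911_465,
    "GCR": 11_928_911_465,
    "MERIT": 11_928_911_465,
    "FILTEREDMERIT": 1_572_783_868,
    "EXTENDEDFT1": 1_572_783_826,
    "LS1": 1_572_783_826,
}

# Assumed groups of products that should hold identical event counts.
EXPECTED_EQUAL = [
    ("DIGI", "RECON", "CAL", "GCR", "MERIT"),
    ("FILTEREDMERIT", "EXTENDEDFT1", "LS1"),
]


def find_mismatches(counts, groups):
    """Return (reference, product, difference) for each product whose
    count differs from the first product in its group."""
    mismatches = []
    for group in groups:
        reference = group[0]
        for name in group[1:]:
            difference = counts[name] - counts[reference]
            if difference != 0:
                mismatches.append((reference, name, difference))
    return mismatches


if __name__ == "__main__":
    for reference, name, difference in find_mismatches(COUNTS, EXPECTED_EQUAL):
        print(f"{name} differs from {reference} by {difference} events")
```

With these numbers the check flags EXTENDEDFT1 and LS1 as 42 events short of FILTEREDMERIT.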
Bookkeeping
- (This page): Define ingredients of reprocessing (processing code/configuration changes)
- Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P202
  - List of all reprocessings
  - List of all data runs reprocessed
  - Pointers to all input data files (-> dataCatalog)
  - Pointers to associated task processes (-> Pipeline II status)
- Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
  - Lists of and pointers to all output data files
  - Metadata associated with each output data product
...
P202-ROOT
Status chronology
- 2/13/2012 - begin trials with final calibration and alignments from Leon; 5 runs reprocessed
- 2/14/2012 - trials continue with blocks of 15, 20, 25 and 50 runs reprocessed (each run generates ~20 batch jobs)
- 2/16/2012 - begin trickleStream production. Initial config:
    ===============================================================================
    TRICKLE PARMS
    ===============================================================================
    task = P202-ROOT
    maxRuns = 19172
    firstStep = setupRun
    steps = [['/processRun processClump', 1500, 20], ['mergeClumps', 70, 1]]
    maxStreamsPerCycle = 20
    timePerCycle = 300
    ===============================================================================
- 2/21/2012 - One clump reprocessed with pointer to new MySQL DB (stream 710.0)
- 2/22/2012 - 776 runs complete. Pausing task.
- 3/15/2012 - resume task. New goal is one year of data (~5600 runs)
- 3/31/2012 - 1-year complete (5600 runs). There have been a few nasty problems which need to be fixed before continuing; they are listed in the table that follows. Note also that the FILTEREDMERIT files contain 42 more events than the EXTENDEDFT1 files; the counts should be identical.

S/W component | bug fix | status |
---|---|---|
New ROOT version | 5-min 'transaction timeout' triggered by xroot data server reboot | done 4/3/2012 |
New GlastRelease | 1) include new ROOT version (above); 2) exit with non-zero RC on ROOT write error | done 4/5/2012, GR 17-35-24-rp04 |
New GPL_TOOLS | check size/checksum of file written to xroot against known size/checksum | pending |
Tuned xroot on new Dell servers | silent file truncation when volume fills up (JIRA) | done 4/4/2012 (100 MB min space limit -> 100 GB; file system space check cadence changed from 10 min to 2 min) |
New xroot client tools | complain when xroot data server fails on write | done 4/3/2012, v3.1.1 |
New TSkim | 1) new ROOT version (above); 2) complain on ROOT write errors | done 4/5/2012, v08-02-01 |
New xroot redirector | required step toward enabling HPSS staging | done 4/3/2012, v3.1.1 |
- 4/5/2012 - resume task. New goal is entire science dataset.
- 4/10/2012 - Unknown 'glitch' may have caused a few hundred jobs to crash and take sulky46 along with them.
- 4/11/2012 - 10:40pm lightning strikes SLAC power lines. Site-wide power outage. Stream 7795 was the last stream submitted prior to the outage.
- 4/12/2012 - due to possible overload of sulky46/u18 from writing many core files, one change has been made to processClumps.py: prepend "ulimit -c 0;" to the gleam command to disable all core file generation. This takes effect at approximately run 7605.
- 5/9/2012 - major pipeline issue...shut down pipeline and allow to drain (due to tomorrow's major outage)
- 5/10/2012 - 13:40 outage over.
- Update GR from 17-35-24-rp04 to 17-35-24-rp07 in which the only change is replacing the 5-minute xroot time-out with 8 hours. This change effective with stream 14314 and previously failed pieces of four other runs: 14247.6, 14273.23, 14274.8, 14231.9.
- Leon advises that as of today, calibrations are valid only through ~15 Dec 2011 (run 345574915), which is somewhere around stream 18,400. He asks Sasha to produce more up-to-date calibrations.
- 5/18/2012 - all calibrations now valid through 6 May 2012. No need to pause P202 task.
- 5/28/2012 - 15:30 Complete (through 31 March 2012)
- Data Catalog summary: There are discrepancies to track down!

Name | Type | Files | Events | Size | Created (UTC) | Links |
---|---|---|---|---|---|---|
Group | 20229 | 44,125,599,595 | 128.7 TB | 25-Jan-2012 00:53:31 | ||
Group | 5600 | 0 | 2.5 GB | 02-Mar-2012 00:06:07 | ||
Group | 20229 | 90,904,582 | 205.7 GB | 25-Jan-2012 00:53:32 | ||
Group | 5600 | 1,572,783,826 | 143.7 GB | 02-Mar-2012 00:06:09 | ||
Group | 5600 | 1,572,783,826 | 255.0 GB | 02-Mar-2012 00:06:09 | ||
Group | 20229 | 6,291,396,710 | 5.3 TB | 25-Jan-2012 00:53:29 | ||
Group | 5600 | 24,261,962 | 2.4 GB | 02-Mar-2012 00:06:06 | ||
Group | 20229 | 44,123,014,456 | 942.7 GB | 25-Jan-2012 00:53:31 | ||
Group | 5600 | 271,923,333 | 44.2 GB | 02-Mar-2012 00:06:08 | ||
Group | 20229 | 44,125,679,961 | 35.4 TB | 25-Jan-2012 00:53:30 | ||
Group | 20229 | 44,123,612,977 | 590.0 TB | 25-Jan-2012 00:53:33 | ||
Turns out to be three problematic runs/streams:
- 272707024/5723 - I/O problem, corrupt files, entire stream rolled back
- 279108810/6847 - transient xroot access problem, re-registered in dataCat
- 284813327/7848 - transient xroot access problem, re-registered in dataCat
- Final trickleStream configuration:
    ===============================================================================
    TRICKLE PARMS
    ===============================================================================
    task = P202-ROOT
    maxRuns = 20229
    firstStep = setupRun
    steps = [['/processRun processClump', 2000, 21], ['mergeClumps', 200, 1]]
    maxStreamsPerCycle = 20
    timePerCycle = 300
    ------DEBUG----------------
    maxCycles = 0
    chatter = False
    dryRun = False
    ===============================================================================
- 5/31/2012 - Rolling back all or part of the three runs above resolved the discrepancies in event counts. The new dataCatalog tally looks like this:
Name | Type | Files | Events | Size | Created (UTC) | Links |
---|---|---|---|---|---|---|
Group | 20229 | 44,125,679,961 | 128.7 TB | 25-Jan-2012 00:53:31 | ||
Group | 20229 | 90,904,582 | 205.7 GB | 25-Jan-2012 00:53:32 | ||
Group | 20229 | 6,291,396,711 | 5.3 TB | 25-Jan-2012 00:53:29 | ||
Group | 20229 | 44,125,679,961 | 942.7 GB | 25-Jan-2012 00:53:31 | ||
Group | 5600 | 271,923,333 | 44.2 GB | 02-Mar-2012 00:06:08 | ||
Group | 20229 | 44,125,679,961 | 35.4 TB | 25-Jan-2012 00:53:30 | ||
Group | 20229 | 44,125,679,961 | 590.0 TB | 25-Jan-2012 00:53:33 |
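Several entries in the chronology trace back to files that were silently truncated or corrupted on write, and the fix listed for GPL_TOOLS is to check the size and checksum of each written file against known values. The sketch below illustrates that kind of check on a locally readable file. It is not the GPL_TOOLS implementation: the function names, the choice of Adler-32, and the use of plain file access instead of xroot are all assumptions made for illustration.

```python
import os
import zlib


def adler32_of(path, chunk_size=1 << 20):
    """Compute the Adler-32 checksum of a file, reading it in chunks."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF


def verify_copy(path, expected_size, expected_adler32):
    """Raise IOError if the file does not match the expected size or checksum."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise IOError(
            f"{path}: size {actual_size} != expected {expected_size} (truncated?)"
        )
    actual_checksum = adler32_of(path)
    if actual_checksum != expected_adler32:
        raise IOError(
            f"{path}: adler32 {actual_checksum:08x} != expected {expected_adler32:08x}"
        )
```

Checking the size first is cheap and catches truncation immediately; the checksum pass then catches corruption that leaves the length intact. A job wrapper could call `verify_copy` after each write and exit with a non-zero return code on failure.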
Configuration
Task Location | /nfs/farm/g/glast/u38/Reprocess-tasks/P202-ROOT |
Task Status | |
GlastRelease | 17-35-24-gr17 (SCons RHEL4-32 build) |
Run Selection | based on a modified "standard" selection, see https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets |
s/c data | "standard" Public Release 2 https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets |
Input Run List | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/runList.txt |
photonFilter | CTBParticleType==1 && ((FT1EventClass & 0x00003EFF)!=0) |
electronFilter | CTBParticleType==1 |
Code Variants | redhat4-i686-32bit-gcc34 (Optimized) |
jobOpts | ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/doRecon.txt |
Output Data Products |
...
Data Product | destination | data content [1] | event selection [1] | makeFT1 | gtselect | gtdiffrsp | gtmktime |
---|---|---|---|---|---|---|---|
EXTENDEDFT1 | SLAC | FT1variables | ((FT1EventClass & 0x00003EFF)!=0) | | | | | ||
FT1 | FSSC+SLAC | FT1variables | 'source' and above | | | (inherited) | | ||
EXTENDEDLS1 | SLAC | LS1variables | ((FT1EventClass & 0x00003EFF)!=0) | | | | | ||
LS1 | FSSC+SLAC | LS1variables | 'transient' and above | | | (inherited) | | ||
ELECTRONFT1 | SLAC | FT1variables | CTBParticleType==1 | | | | |
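The selection `((FT1EventClass & 0x00003EFF)!=0)` keeps an event if any of the bits in the mask is set in its event-class word. The sketch below decodes the mask and builds masks from bit lists such as the FSSC selections quoted on this page (bits 2,3,4 for FT1 and bits 0,2,3,4 for LS1). Treating `FT1EventClass` as a bitfield in which bit n marks class n is an assumption here, and the helper names are illustrative.

```python
PHOTON_MASK = 0x00003EFF  # mask used in the photonFilter / EXTENDEDFT1 selection


def bits_set(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def mask_from_bits(bits):
    """Build an integer mask with the given bit positions set."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def passes(event_class, mask):
    """True if the event-class word shares at least one set bit with mask."""
    return (event_class & mask) != 0


if __name__ == "__main__":
    print(bits_set(PHOTON_MASK))
    print(hex(mask_from_bits([2, 3, 4])))     # FT1 (FSSC selection)
    print(hex(mask_from_bits([0, 2, 3, 4])))  # LS1 (FSSC selection)
```

Decoding the mask shows that bit 8 is the only bit below 14 that the photon selection leaves out.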
...