status: Complete
last update: 07 Nov 2013

This page is a record of the configuration and execution of the P202 reprocessing project: a full reprocessing from DIGIs using Pass7 analysis code, Pass7 classification trees, and up-to-date alignment/calibration data. The task will read DIGI files and emit RECON, MERIT, GCR and CAL ROOT files, plus the standard array of FITS files. It will be a CPU-intensive and storage-intensive enterprise requiring months of elapsed time and of order 0.7 PB of storage. When the task begins, there will be about 20,000 science runs in Fermi (3.5 years of accumulation).

To avoid occupying a new 0.7 PB of disk space, the plan is to remove old RECON files once they have been reprocessed. This is a shell game that involves some amount of buffer space and then waiting until the new RECON file has been created and (to some extent) validated before removal. The old RECON files will be retained on tape in the HPSS system and they will be available via xroot (but with some delay as these large files are staged in). In addition, old CAL files will be removed from disk without being stored on tape.

The name "P202" derives from the word "processing" and the initial file version to be used for the output data products, e.g., r0123456789_v202_merit.root.

"New generation" tasks (using SCons builds, rewritten task scripts, common python scripts, etc.)

Datafile names, versions and locations

Data file version numbers for this reprocessing will begin with v202.

XROOT location and file naming

Location template:

/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>

Locations for P202:

/glast/Data/Flight/Reprocess/P202/recon
/glast/Data/Flight/Reprocess/P202/cal
/glast/Data/Flight/Reprocess/P202/gcr
/glast/Data/Flight/Reprocess/P202/merit
/glast/Data/Flight/Reprocess/P202/filteredmerit
/glast/Data/Flight/Reprocess/P202/electronmerit
/glast/Data/Flight/Reprocess/P202/ft1
/glast/Data/Flight/Reprocess/P202/extendedft1
/glast/Data/Flight/Reprocess/P202/electronft1
/glast/Data/Flight/Reprocess/P202/ls1
/glast/Data/Flight/Reprocess/P202/extendedls1

File naming:

Data Type      aka     Send to FSSC  Naming template
RECON          -       No            r<run#>_<version>_<dataType>.root
CAL            -       No            r<run#>_<version>_<dataType>.root
GCR            -       No            r<run#>_<version>_<dataType>.root
MERIT          -       No            r<run#>_<version>_<dataType>.root
FILTEREDMERIT  -       No            r<run#>_<version>_<dataType>.root
ELECTRONMERIT  -       No            r<run#>_<version>_<dataType>.root
ELECTRONFT1    -       No            gll_el_p<procVer>_r<run#>_<version>.fit
EXTENDEDFT1    -       No            gll_xp_p<procVer>_r<run#>_<version>.fit
FT1            LS-002  Yes           gll_ph_p<procVer>_r<run#>_<version>.fit
EXTENDEDLS1    -       No            gll_xe_p<procVer>_r<run#>_<version>.fit
LS1            LS-001  Yes           gll_ev_p<procVer>_r<run#>_<version>.fit

Note: 'procVer' is a field in the file name (with a matching keyword "PROC_VER" in the primary header) that was added to the FFD on 5/12/2010. Ref: http://fermi.gsfc.nasa.gov/ssc/dev/current_documents/Science_DP_FFD_RevA.pdf

Examples:

/glast/Data/Flight/Reprocess/P202/recon/r0239557414_v202_recon.root
/glast/Data/Flight/Reprocess/P202/cal/r0239557414_v202_cal.root
/glast/Data/Flight/Reprocess/P202/gcr/r0239557414_v202_gcr.root
/glast/Data/Flight/Reprocess/P202/merit/r0239557414_v202_merit.root
/glast/Data/Flight/Reprocess/P202/filteredmerit/r0239557414_v202_filteredmerit.root
/glast/Data/Flight/Reprocess/P202/electronmerit/r0239557414_v202_electronmerit.root
/glast/Data/Flight/Reprocess/P202/extendedft1/gll_xp_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/ft1/gll_ph_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/electronft1/gll_el_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/extendedls1/gll_xe_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/ls1/gll_ev_p202_r0239559565_v202.fit
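
The location and naming templates above can be combined mechanically. The following is a minimal sketch, not part of the reprocessing task code: the function name product_path and the lookup tables are hypothetical, and the 10-digit zero padding of the run number is inferred from the example file names rather than stated on this page.

```python
# Sketch only: names below are illustrative, not taken from the P202 task scripts.
XROOT_BASE = "/glast/Data/Flight/Reprocess"

# ROOT products share one naming template.
ROOT_TYPES = {"recon", "cal", "gcr", "merit", "filteredmerit", "electronmerit"}

# FITS products use a two-letter code in the gll_<code>_... template.
FITS_CODES = {
    "ft1": "ph",
    "extendedft1": "xp",
    "electronft1": "el",
    "ls1": "ev",
    "extendedls1": "xe",
}


def product_path(reprocess_name, data_type, run, version, proc_ver=None):
    """Build the xroot path for one output file of one run."""
    data_type = data_type.lower()
    # Assumption: run numbers are zero-padded to 10 digits, as in the examples.
    run_field = f"r{int(run):010d}"
    if data_type in ROOT_TYPES:
        name = f"{run_field}_v{version}_{data_type}.root"
    elif data_type in FITS_CODES:
        if proc_ver is None:
            raise ValueError("FITS file names need a procVer value")
        name = f"gll_{FITS_CODES[data_type]}_p{proc_ver}_{run_field}_v{version}.fit"
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return f"{XROOT_BASE}/{reprocess_name}/{data_type}/{name}"


print(product_path("P202", "merit", 239557414, 202))
print(product_path("P202", "ft1", 239559565, 202, proc_ver=202))
```

Keeping the FITS codes in a single table means a new data type needs only one new entry rather than a new code path.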

DataCatalog location and naming

Logical directory and group template:

Data/Flight/Reprocess/<reprocessName>:<dataType>

Note that the <dataType> field (following the colon) is a DataCatalog 'group' name, and file names are of the form r<run#>.

Naming examples:

Data/Flight/Reprocess/P202:RECON r0239557414
Data/Flight/Reprocess/P202:CAL r0239557414
Data/Flight/Reprocess/P202:GCR r0239557414
Data/Flight/Reprocess/P202:MERIT r0239557414
Data/Flight/Reprocess/P202:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P202:EXTENDEDFT1 r0239557414
Data/Flight/Reprocess/P202:FT1 r0239557414
Data/Flight/Reprocess/P202:ELECTRONFT1 r0239557414
Data/Flight/Reprocess/P202:EXTENDEDLS1 r0239557414
Data/Flight/Reprocess/P202:LS1 r0239557414
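
For scripts that consume listings in the form shown above, the folder, group and run can be separated with ordinary string handling. This is a sketch under the assumption that each entry is written exactly as "<folder>:<group> r<run#>"; the CatalogEntry type and function names are hypothetical and do not refer to any DataCatalog client API.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    folder: str  # logical directory, e.g. Data/Flight/Reprocess/P202
    group: str   # DataCatalog group name, e.g. RECON
    name: str    # dataset name, e.g. r0239557414


def parse_catalog_entry(text):
    """Split '<folder>:<group> r<run#>' into its three parts."""
    location, name = text.split()
    folder, group = location.rsplit(":", 1)
    return CatalogEntry(folder, group, name)


def run_number(entry):
    """Return the integer run number encoded in the dataset name."""
    if not entry.name.startswith("r"):
        raise ValueError(f"unexpected dataset name: {entry.name}")
    return int(entry.name[1:])


entry = parse_catalog_entry("Data/Flight/Reprocess/P202:RECON r0239557414")
print(entry.group, run_number(entry))
```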

Data Sample

The currently defined data sample (as of May 2012) for P202 reprocessing includes:

Quantity                           Value                                       Comment
First run                          239557414 (MET), 2008-08-04 15:43:34 (UTC)
Last run                           354923690 (MET), 2012-03-31 21:54:48 (UTC)
Total runs                         20,229
Total input DIGI events            44,125,679,961
Total RECON events                 44,125,679,961
Total CAL events                   44,125,679,961
Total GCR events                   44,125,679,961
Total MERIT events                 44,125,679,961                              all "events"
Total FILTEREDMERIT events         6,291,424,926                               selected photon event classes
Total ELECTRONMERIT events         90,904,582                                  all electron events

Generation of FITS files is a second step in the reprocessing and has only been run on the first year of data. Stay tuned...

Quantity                           Value                                       Comment
Total EXTENDEDFT1/LS1 events       6,291,424,926                               selected photon event classes
Total LS1 (FSSC selection) events  1,325,204,821                               event classes (bits) 0,2,3,4 (transient, source, clean, ultraclean)
Total FT1 (FSSC selection) events  189,323,074                                 event classes (bits) 2,3,4 (source, clean, ultraclean)
Total disk space used              762.4 TB
Total effective disk footprint     43.7 TB                                     after removal of old RECON and CAL files

NOTE: One run, 242429468, of type TrigTest was declared 'good for science' and has been included.
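
Run numbers are mission elapsed times (MET), so the first and last runs can be translated to calendar dates. The sketch below assumes the Fermi MET reference of 2001-01-01 00:00:00 UTC, which is not stated on this page, and it ignores leap seconds, so results may differ from the UTC values quoted above by a few seconds. The function name met_to_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MET counts seconds from 2001-01-01 00:00:00 UTC.
MET_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def met_to_utc(met_seconds):
    """Convert MET seconds to a UTC datetime, ignoring leap seconds."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


for met in (239557414, 354923690):
    print(met, met_to_utc(met).strftime("%Y-%m-%d %H:%M:%S"))
```

The same conversion gives a quick sanity check on whether a run falls inside the block currently being reprocessed.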

Bookkeeping

  1. (This page): Define ingredients of reprocessing (processing code/configuration changes)
  2. Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P202
    1. List of all reprocessings
    2. List of all data runs reprocessed
    3. Pointers to all input data files (-> dataCatalog)
    4. Pointers to associated task processes (-> Pipeline II status)
  3. Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
    1. Lists of and pointers to all output data files
    2. Meta data associated with each output data product

P202-ROOT

Status chronology

Stream  Run        Comment
25232   383219654  Truncated run (~9 min), recovered, rolled back
26263   338868584  Mysteriously appeared in the most recent genRunFile cycle; had to be appended to the end of runList

What happened? Warren says this run is perfectly normal. Could the "Intents" have changed? This single orphan run was tacked onto the end of block7 (run 389089696) and will be known as "block 8" (one new run and one updated run).

Configuration

Task Location         /nfs/farm/g/glast/u38/Reprocess-tasks/P202-ROOT
Task Status           http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
GlastRelease          17-35-24-gr17 and 17-35-24-rp04 (SCons RHEL4-32 build)
Run Selection         based on a modified "standard" selection, see https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets
                      (((sIntent=="nomSciOps" || sIntent=="nomSO_noSk_noCno_optGccc_allEna" || sIntent=="nomSciOps_diagEna" || (sIntent=="nomSciOps_Emin5MeV"&&RunMin>242070455) || nRun==242429468 ) && (RunQuality != "Bad" || is_null ( RunQuality ) ) ) || sIntent=="nadirOps" )
s/c data              "standard" Public Release 2 https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets
Input Run List        ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/runList.txt
photonFilter          CTBParticleType==1 && ((FT1EventClass & 0x00003EFF)!=0)
                      pass7.6_Extended_cuts_L1 in evtClassDefs
electronFilter        CTBParticleType==1
Code Variants         redhat4-i686-32bit-gcc34 (Optimized)
jobOpts               ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/doRecon.txt
Output Data Products  RECON, GCR, CAL, MERIT, FILTEREDMERIT, ELECTRONMERIT
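
The Run Selection expression above is dense, so it may help to see its logic spelled out. The sketch below transcribes it into a Python predicate; the function and argument names are hypothetical, a null RunQuality is represented by None, and RunMin is assumed to be a plain number as used in the comparison.

```python
NOMINAL_INTENTS = (
    "nomSciOps",
    "nomSO_noSk_noCno_optGccc_allEna",
    "nomSciOps_diagEna",
)


def select_run(s_intent, run_min, n_run, run_quality=None):
    """Return True if a run passes the P202 run selection (transcribed sketch)."""
    science_like = (
        s_intent in NOMINAL_INTENTS
        # The Emin5MeV intent is accepted only for RunMin > 242070455.
        or (s_intent == "nomSciOps_Emin5MeV" and run_min > 242070455)
        # Single run admitted explicitly by number.
        or n_run == 242429468
    )
    # A run is rejected on quality only if it is explicitly flagged "Bad".
    quality_ok = run_quality is None or run_quality != "Bad"
    # nadirOps runs are accepted regardless of the quality flag.
    return (science_like and quality_ok) or s_intent == "nadirOps"


print(select_run("nomSciOps", 239557414, 239557414, "Good"))
print(select_run("nomSciOps", 239557414, 239557414, "Bad"))
print(select_run("nadirOps", 300000000, 300000000, "Bad"))
```

Note that in this reading the quality cut does not apply to nadirOps runs, because that clause sits outside the parenthesised group in the original expression.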

Timing and Scaling

Load balancing

trickleStream parameters (see above).


P202-FITS

This task generates all desired FITS data products.

Status chronology

Name           Files  Events          Size
CAL            26263  57,227,320,767  166.0 TB
DIGIGAP        24200  0               19.0 kB
ELECTRONFT1    26263  0               10.9 GB
ELECTRONMERIT  26263  116,799,950     263.6 GB
EXTENDEDFT1    26263  8,284,002,713   725.9 GB
EXTENDEDLS1    26263  8,284,002,713   1.3 TB
FILTEREDMERIT  26263  8,284,037,323   7.0 TB
FT1            26263  268,810,274     24.2 GB
GCR            26263  57,227,320,767  1.2 TB
LS1            26263  1,782,493,106   289.6 GB
MERIT          26263  57,227,320,767  45.8 TB
RECON          26263  57,227,320,767  763.1 TB

Discrepancy between FILTEREDMERIT and EXTENDED{LS1,FT1}. This turns out to be an issue with tstart/tstop for run 383219654.
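
A discrepancy of this kind can be caught with a simple comparison of event counts. The sketch below uses the counts from the 26263-file table above and assumes, based on that table and the note, that RECON, CAL, GCR and MERIT should agree with one another, as should FILTEREDMERIT, EXTENDEDFT1 and EXTENDEDLS1. The function name and grouping are hypothetical and are not the DataCatalog tooling.

```python
EVENT_COUNTS = {
    "RECON": 57_227_320_767,
    "CAL": 57_227_320_767,
    "GCR": 57_227_320_767,
    "MERIT": 57_227_320_767,
    "FILTEREDMERIT": 8_284_037_323,
    "EXTENDEDFT1": 8_284_002_713,
    "EXTENDEDLS1": 8_284_002_713,
}

# Assumption: products within a group are expected to hold the same events.
EXPECTED_EQUAL = [
    ("RECON", "CAL", "GCR", "MERIT"),
    ("FILTEREDMERIT", "EXTENDEDFT1", "EXTENDEDLS1"),
]


def find_mismatches(counts, groups):
    """Return (group, spread) for every group whose counts disagree."""
    mismatches = []
    for group in groups:
        values = [counts[name] for name in group]
        if len(set(values)) > 1:
            mismatches.append((group, max(values) - min(values)))
    return mismatches


for group, spread in find_mismatches(EVENT_COUNTS, EXPECTED_EQUAL):
    print(f"{', '.join(group)} differ by {spread:,} events")
```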

Name           Files  Events          Size      Created (UTC)
CAL            27873  60,682,674,790  175.8 TB  25-Jan-2012 00:53:31
ELECTRONFT1    27873  0               11.5 GB   02-Mar-2012 00:06:07
ELECTRONMERIT  27873  123,494,286     278.5 GB  25-Jan-2012 00:53:32
EXTENDEDFT1    27873  8,811,129,094   772.1 GB  02-Mar-2012 00:06:09
EXTENDEDLS1    27873  8,811,129,094   1.4 TB    02-Mar-2012 00:06:09
FILTEREDMERIT  27873  8,811,129,090   7.5 TB    25-Jan-2012 00:53:29
FT1            27873  289,969,364     26.1 GB   02-Mar-2012 00:06:06
GCR            27873  60,682,674,790  1.3 TB    25-Jan-2012 00:53:31
LS1            27873  1,903,568,484   309.3 GB  02-Mar-2012 00:06:08
MERIT          27873  60,682,674,790  48.6 TB   25-Jan-2012 00:53:30
RECON          27873  60,682,674,790  808.5 TB  25-Jan-2012 00:53:33

Configuration

Task Location         /nfs/farm/g/glast/u38/Reprocess-tasks/P202-FITS
Task Status           http://glast-ground.slac.stanford.edu/Pipeline-II/task.jsp?task=107152539
Input Data            MERIT (direct from P202-ROOT)
spacecraft data       same as P202-ROOT
Input Run List        ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-FITS/config/runList.txt
evtClassDefs          00-19-05 (March 2013, changed pass_ver to P7REP)
eventClassMap         EvtClassDefs_P7V6.xml
ScienceTools          09-32-03 (6/7/2013) (but ST may report themselves as 09-32-02 due to RM snafu)
Code Variants         redhat5-x86_64-64bit-gcc41, redhat6-x86_64-64bit-gcc44 (Optimized)
Diffuse Model         based on contents of /afs/slac.stanford.edu/g/glast/ground/GLAST_EXT/diffuseModels/v4r0
                      (see https://confluence.slac.stanford.edu/display/SCIGRPS/Quick+Start+with+Pass+7)
Diffuse Response      'source' using P7REP_SOURCE_V15 IRF
                      'clean' using P7REP_CLEAN_V15 IRF
IRFs                  P7REP_*_V15, contained within ScienceTools release
Output Data Products  FT1, LS1, EXTENDEDFT1, EXTENDEDLS1, ELECTRONFT1

Generation of output data products:

Data Product  destination  data content [1]  event selection [1]                  makeFT1  gtselect  gtdiffrsp  gtmktime
EXTENDEDFT1   SLAC         FT1variables      ((FT1EventClass & 0x00003EFF)!=0)    yes      no        yes        yes
                                             pass7.6_Extended_cuts_L1
FT1           FSSC+SLAC    FT1variables      'source' and above                   no       yes       inherited  yes
                                             EVENT_CLASS bits 2,3,4
                                             evclass=2 filtered from EXTENDEDFT1
EXTENDEDLS1   SLAC         LS1variables      ((FT1EventClass & 0x00003EFF)!=0)    yes      no        yes        yes
                                             pass7.6_Extended_cuts_L1
LS1           FSSC+SLAC    LS1variables      'transient' and above                no       yes       inherited  yes
                                             EVENT_CLASS bits 0,2,3,4
                                             evclass=0 filtered from EXTENDEDLS1
ELECTRONFT1   SLAC         FT1variables      CTBParticleType==1                   yes      no        no         yes
                                             pass7.6_Electrons_cuts_L1

[1] /afs/slac/g/glast/ground/releases/volume04/evtClassDefs/00-19-04/data

Note that diffuse response is calculated for 'source' and 'clean' event classes only.
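
The event selections above are expressed as bit masks and bit lists. The sketch below shows how the mask 0x00003EFF decomposes into bit positions and how the bit lists quoted for FT1 and LS1 translate into masks. The helper names are hypothetical, and it is an assumption of this sketch that the same bit numbering applies to the FT1EventClass word and to the EVENT_CLASS bits.

```python
EXTENDED_MASK = 0x00003EFF  # mask used for EXTENDEDFT1 / EXTENDEDLS1

FT1_BITS = (2, 3, 4)     # source, clean, ultraclean
LS1_BITS = (0, 2, 3, 4)  # transient, source, clean, ultraclean


def set_bits(mask):
    """List the bit positions that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def mask_from_bits(bits):
    """Build an integer mask with the given bit positions set."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def passes_any(event_class_word, mask):
    """Mirror the '(word & mask) != 0' test used in the selections."""
    return (event_class_word & mask) != 0


print(set_bits(EXTENDED_MASK))
print(hex(mask_from_bits(FT1_BITS)), hex(mask_from_bits(LS1_BITS)))
```

In this decomposition bit 8 is the only bit below 14 that the extended mask leaves out.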

Note on 'Code Variant': The SLAC batch farm contains a mixture of architectures, both hardware (Intel/AMD 64-bit) and software (RHEL5-64, gcc v4.1, etc.). At this time, GlastRelease builds only on RHEL4-32, while ScienceTools builds for RHEL5-32 and RHEL5-64.

Timing


P202-LEO-ROOT

Status chronology

Name           Type   Files  Events       Size      Created (UTC)
CAL            Group  262    608,752,392  1.7 TB    10-Aug-2012 10:17:29
ELECTRONMERIT  Group  262    1,077,986    2.3 GB    10-Aug-2012 10:17:30
FILTEREDMERIT  Group  262    142,672,239  120.4 GB  10-Aug-2012 10:17:27
GCR            Group  262    608,752,392  13.6 GB   10-Aug-2012 10:17:28
MERIT          Group  262    608,752,392  499.9 GB  10-Aug-2012 10:17:27
RECON          Group  262    608,752,392  8.2 TB    10-Aug-2012 10:17:30

Configuration

Identical to P202-ROOT except for the list of runs to be processed and one further change: to reprocess the four extra (out-of-order) L&EO runs, the event list sort is disabled.

Timing


P202 Update Checklist

A checklist for updating a new block of reprocessed data.

 

Before
  [ ] determine first and last runs to reprocess
  [ ] update genRunFile.csh and generate new list
  [ ] run checkRunList.py with new and old run lists
  [ ] run tkdiff with new and old run lists
  [ ] verify calibration constants are valid for new block
  [ ] check if new generation FT2 was introduced mid-block
  [ ] update trickleStream.py with new run count

During
  [ ] monitor NFS and xroot performance
  [ ] periodically clean up xroot scratch space
  [ ] periodically clean up old RECON/CAL files (via list to Wilko)

After
  [ ] run log scanner for silent root/xroot failures
  [ ] check dataCatalog statistics for consistency
  [ ] run xroot scratch cleanup procedure
  [ ] provide Wilko with list of old L1 RECON/CAL files to be removed from xroot disk
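
The run-list comparison in the "Before" steps can be illustrated with a small sketch. This is not the actual checkRunList.py, whose behaviour is not described on this page; the function below is a hypothetical stand-in. It reports runs added or removed between two lists and flags added runs that are earlier than the last run of the old list, as happened with run 338868584.

```python
def compare_run_lists(old_runs, new_runs):
    """Summarise the differences between an old and a new run list."""
    old_set, new_set = set(old_runs), set(new_runs)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    # Added runs earlier than the last old run would land mid-list.
    last_old = max(old_set) if old_set else None
    out_of_order = [
        run for run in added if last_old is not None and run < last_old
    ]
    return {"added": added, "removed": removed, "out_of_order": out_of_order}


# Illustrative input: run numbers are taken from this page, the lists are not.
old_list = [239557414, 389089696]
new_list = [239557414, 338868584, 389089696]
print(compare_run_lists(old_list, new_list))
```

Flagging out-of-order additions separately matters because such runs have to be appended to the end of the run list rather than inserted in numerical order.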