status: Running
last update: 27 February 2012

This page records the configuration and execution of the P202 reprocessing project: a full reprocessing from DIGIs using Pass7 analysis code, including Pass7 classification trees and new IRFs. The task reads DIGI files and emits RECON, MERIT, GCR and CAL ROOT files, plus the standard array of FITS files. It is a CPU-intensive and storage-intensive enterprise, requiring months of elapsed time and of order 1 PB of storage. When the task begins, Fermi will have accumulated about 20,000 science runs (3.5 years of data).

To avoid occupying a new 1 PB of disk space, the plan is to remove old RECON files once they have been reprocessed. This is a shell game that requires some buffer space: an old RECON file is removed only after the new RECON file has been created and (to some extent) validated. The old RECON files will be retained on tape in the HPSS system and will remain available via xroot, with some delay while these large files are staged in.

The name "P202" combines "P", from the word "processing", with the initial file version used for the output data products (202), e.g., r0123456789_v202_merit.root.

"New generation" tasks (using SCons, OO and common python scripts, etc.)
  • P202-ROOT - This task reads DIGI and produces reprocessed RECON + CAL + GCR + MERIT + FILTEREDMERIT (photons) + ELECTRONMERIT
  • P202-FITS - This task reads MERIT and produces FT1 (photons) + EXTENDEDFT1 + LS1 (merit-like FITS file for photons) + EXTENDEDLS1 + ELECTRONFITS file

Datafile names, versions and locations

Data file version numbers for this reprocessing will begin with v202.

XROOT location and file naming

Location template:

/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>

Locations for P202:

/glast/Data/Flight/Reprocess/P202/recon
/glast/Data/Flight/Reprocess/P202/cal
/glast/Data/Flight/Reprocess/P202/gcr
/glast/Data/Flight/Reprocess/P202/merit
/glast/Data/Flight/Reprocess/P202/filteredmerit
/glast/Data/Flight/Reprocess/P202/electronmerit
/glast/Data/Flight/Reprocess/P202/ft1
/glast/Data/Flight/Reprocess/P202/extendedft1
/glast/Data/Flight/Reprocess/P202/electronft1
/glast/Data/Flight/Reprocess/P202/ls1
/glast/Data/Flight/Reprocess/P202/extendedls1

File naming:

| Data Type | aka | Send to FSSC | Naming template |
|---|---|---|---|
| RECON | | No | r<run#>_<version>_<dataType>.root |
| CAL | | No | r<run#>_<version>_<dataType>.root |
| GCR | | No | r<run#>_<version>_<dataType>.root |
| MERIT | | No | r<run#>_<version>_<dataType>.root |
| FILTEREDMERIT | | No | r<run#>_<version>_<dataType>.root |
| ELECTRONMERIT | | No | r<run#>_<version>_<dataType>.root |
| ELECTRONFT1 | | No | gll_el_p<procVer>_r<run#>_<version>.fit |
| EXTENDEDFT1 | | No | gll_xp_p<procVer>_r<run#>_<version>.fit |
| FT1 | LS-002 | Yes | gll_ph_p<procVer>_r<run#>_<version>.fit |
| EXTENDEDLS1 | | No | gll_xe_p<procVer>_r<run#>_<version>.fit |
| LS1 | LS-001 | Yes | gll_ev_p<procVer>_r<run#>_<version>.fit |

Note: 'procVer' is a field in the file name (with a matching "PROC_VER" keyword in the primary header) that was added to the FFD on 5/12/2010. Ref: http://fermi.gsfc.nasa.gov/ssc/dev/current_documents/Science_DP_FFD_RevA.pdf

Examples:

/glast/Data/Flight/Reprocess/P202/recon/r0239557414_v202_recon.root
/glast/Data/Flight/Reprocess/P202/cal/r0239557414_v202_cal.root
/glast/Data/Flight/Reprocess/P202/gcr/r0239557414_v202_gcr.root
/glast/Data/Flight/Reprocess/P202/merit/r0239557414_v202_merit.root
/glast/Data/Flight/Reprocess/P202/filteredmerit/r0239557414_v202_filteredmerit.root
/glast/Data/Flight/Reprocess/P202/electronmerit/r0239557414_v202_electronmerit.root
/glast/Data/Flight/Reprocess/P202/extendedft1/gll_xp_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/ft1/gll_ph_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/electronft1/gll_el_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/extendedls1/gll_xe_p202_r0239559565_v202.fit
/glast/Data/Flight/Reprocess/P202/ls1/gll_ev_p202_r0239559565_v202.fit

DataCatalog location and naming

Logical directory and group template:

Data/Flight/Reprocess/<reprocessName>:<dataType>

Note that the <dataType> field (following the colon) is a DataCatalog 'group' name, and file names are of the form r<run#>.

Naming examples:

Data/Flight/Reprocess/P202:RECON r0239557414
Data/Flight/Reprocess/P202:CAL r0239557414
Data/Flight/Reprocess/P202:GCR r0239557414
Data/Flight/Reprocess/P202:MERIT r0239557414
Data/Flight/Reprocess/P202:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P202:EXTENDEDFT1 r0239557414
Data/Flight/Reprocess/P202:FT1 r0239557414
Data/Flight/Reprocess/P202:ELECTRONFT1 r0239557414
Data/Flight/Reprocess/P202:EXTENDEDLS1 r0239557414
Data/Flight/Reprocess/P202:LS1 r0239557414
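
The XROOT and DataCatalog naming conventions above can be sketched in a few lines of Python. This is only an illustration, not part of the P202 task code: the function names `xroot_path` and `catalog_entry` are hypothetical, and the 10-digit zero-padded run number is an assumption inferred from the examples on this page.

```python
# Illustrative sketch of the P202 naming templates (not the task's own code).
BASE = "/glast/Data/Flight/Reprocess"

# Data types written as ROOT files: r<run#>_<version>_<dataType>.root
ROOT_TYPES = {"recon", "cal", "gcr", "merit", "filteredmerit", "electronmerit"}

# Data types written as FITS files: gll_<prefix>_p<procVer>_r<run#>_<version>.fit
FITS_PREFIX = {
    "ft1": "ph",
    "extendedft1": "xp",
    "electronft1": "el",
    "ls1": "ev",
    "extendedls1": "xe",
}


def xroot_path(data_type, run, version=202, proc_ver=202, reprocess="P202"):
    """Build the XROOT path for one output file (hypothetical helper)."""
    dt = data_type.lower()
    run_id = f"r{run:010d}"  # ASSUMPTION: run number zero-padded to 10 digits
    if dt in ROOT_TYPES:
        name = f"{run_id}_v{version}_{dt}.root"
    elif dt in FITS_PREFIX:
        name = f"gll_{FITS_PREFIX[dt]}_p{proc_ver}_{run_id}_v{version}.fit"
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return f"{BASE}/{reprocess}/{dt}/{name}"


def catalog_entry(data_type, run, reprocess="P202"):
    """Return (logical folder:group, file name) for the DataCatalog (hypothetical helper)."""
    return f"Data/Flight/Reprocess/{reprocess}:{data_type.upper()}", f"r{run:010d}"


print(xroot_path("merit", 239557414))
print(xroot_path("FT1", 239559565))
print(catalog_entry("merit", 239557414))
```

Keeping the FITS prefixes in a single lookup table makes the five FITS products differ only by one entry, which mirrors the naming table above.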

Data Sample

The currently defined data sample for P202 reprocessing includes:

| Item | Value | Notes |
|---|---|---|
| First run | 239557414 (MET), 2008-08-04 15:43:34 (UTC) | |
| Last run | 348951073 (MET), 2012-01-22 18:51:13 (UTC) | |
| Total runs | 19,172 | |
| Total input DIGI events | 41,856,513,685 | |
| Total RECON events | | |
| Total CAL events | | |
| Total GCR events | | |
| Total MERIT events | | all "events" |
| Total EXTENDEDFT1/LS1 events | | all photon event classes |
| Total LS1 (FSSC selection) events | | event classes (bits) 0,2,3,4 (transient, source, clean, ultraclean) |
| Total FT1 (FSSC selection) events | | event classes (bits) 2,3,4 (source, clean, ultraclean) |
| Total disk space used | N/A | |

NOTE: One run, 242429468, of type TrigTest, was declared 'good for science' only long after this task had started, so it has been intentionally omitted.

[to be continued...]
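
The run numbers in the data sample are mission elapsed times (MET), counted in seconds from 2001-01-01 00:00:00 UTC. The sketch below shows a naive conversion that ignores leap seconds; it reproduces the UTC times quoted for the first and last runs. The function name `met_to_utc` is only illustrative.

```python
from datetime import datetime, timedelta

# Fermi MET reference epoch: 2001-01-01 00:00:00 UTC.
MET_EPOCH = datetime(2001, 1, 1)


def met_to_utc(met_seconds):
    """Naive MET -> UTC conversion (leap seconds are NOT applied)."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


for run in (239557414, 348951073):
    print(run, met_to_utc(run).strftime("%Y-%m-%d %H:%M:%S"))
```

A leap-second-aware conversion would differ from these values by a second or two, so this form is suitable for bookkeeping rather than precise timing.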

Bookkeeping

  1. (This page): Define ingredients of reprocessing (processing code/configuration changes)
  2. Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P202
    1. List of all reprocessings
    2. List of all data runs reprocessed
    3. Pointers to all input data files (-> dataCatalog)
    4. Pointers to associated task processes (-> Pipeline II status)
  3. Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
    1. Lists of and pointers to all output data files
    2. Meta data associated with each output data product

P202-ROOT

Status chronology

  • 2/13/2012 - begin trials with final calibration and alignments from Leon; 5 runs reprocessed
  • 2/14/2012 - trials continue with blocks of 15, 20, 25 and 50 runs reprocessed (each run generates ~20 batch jobs)
  • 2/16/2012 - begin trickleStream production. Initial config:
    ===============================================================================
      TRICKLE PARMS
    ===============================================================================
    task =  P202-ROOT
    maxRuns =  19172
    firstStep =  setupRun
    steps =  [['/processRun processClump', 1500, 20], ['mergeClumps', 70, 1]]
    maxStreamsPerCycle =  20
    timePerCycle =  300
    ===============================================================================
    
  • 2/21/2012 - One clump reprocessed with pointer to new mySQL DB (stream 710.0)
  • 2/22/2012 - 776 runs complete. Pausing task.
  • 3/15/2012 - resume task. New goal is 1-year of data (~5600 runs)
  • 3/31/2012 - 1-year complete (5600 runs). DataCatalog summary:

| Name | Files | Events | Size |
|---|---|---|---|
| FILTEREDMERIT | 5600 | 1,572,783,868 | 1.3 TB |
| MERIT | 5600 | 11,928,911,465 | 9.6 TB |
| CAL | 5600 | 11,928,911,465 | 36.1 TB |
| GCR | 5600 | 11,928,911,465 | 260.5 GB |
| RECON | 5600 | 11,928,911,465 | 161.4 TB |
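
As a rough cross-check of the "order 1 PB" storage estimate, the 1-year DataCatalog summary can be scaled up to the full data sample. This is a back-of-the-envelope sketch under two assumptions that are not stated on this page: output size scales linearly with event count, and 260.5 GB is treated as 0.2605 TB (decimal units). The full-sample event count is the total input DIGI event count from the Data Sample section.

```python
# DataCatalog sizes after 5600 runs, in TB (GCR converted from 260.5 GB).
SIZES_TB = {
    "FILTEREDMERIT": 1.3,
    "MERIT": 9.6,
    "CAL": 36.1,
    "GCR": 0.2605,
    "RECON": 161.4,
}

EVENTS_DONE = 11_928_911_465   # events in the 5600 reprocessed runs
EVENTS_TOTAL = 41_856_513_685  # total input DIGI events in the full sample

# ASSUMPTION: size grows linearly with the number of events.
scale = EVENTS_TOTAL / EVENTS_DONE

total_done = sum(SIZES_TB.values())
total_projected = total_done * scale
recon_projected = SIZES_TB["RECON"] * scale

print(f"scale factor:          {scale:.3f}")
print(f"ROOT output so far:    {total_done:.1f} TB")
print(f"projected ROOT output: {total_projected:.0f} TB")
print(f"projected RECON only:  {recon_projected:.0f} TB")
```

Under these assumptions the ROOT products alone come to roughly 0.7 PB, dominated by RECON, which is in line with the estimate given at the top of this page.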

Configuration

Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P202-ROOT

Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp

GlastRelease: 17-35-24-gr17 (SCons RHEL4-32 build)

Run Selection: based on a modified "standard" selection, see https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets
(((sIntent=="nomSciOps" || sIntent=="nomSO_noSk_noCno_optGccc_allEna" || sIntent=="nomSciOps_diagEna" || (sIntent=="nomSciOps_Emin5MeV"&&RunMin>242070455) || nRun==242429468 ) && (RunQuality != "Bad" || is_null ( RunQuality ) ) ) || sIntent=="nadirOps" )

s/c data: "standard" Public Release 2 https://confluence.slac.stanford.edu/display/SCIGRPS/Official+LAT+Datasets

Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/runList.txt

photonFilter: CTBParticleType==1 && ((FT1EventClass & 0x00003EFF)!=0)
pass7.6_Extended_cuts_L1 in evtClassDefs

electronFilter: CTBParticleType==1

Code Variants: redhat4-i686-32bit-gcc34 (Optimized)

jobOpts: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-ROOT/config/doRecon.txt

Output Data Products: RECON, GCR, CAL, MERIT, FILTEREDMERIT, ELECTRONMERIT
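
The nesting of the Run Selection expression in the configuration above is easy to misread, so here is a Python restatement of the same boolean logic. It is a sketch for reading the expression, not the query actually run against the DataCatalog; the function name `select_run` and the use of `None` to stand for a null RunQuality are assumptions.

```python
# Run intents accepted unconditionally (subject to the run-quality check).
SCIENCE_INTENTS = {
    "nomSciOps",
    "nomSO_noSk_noCno_optGccc_allEna",
    "nomSciOps_diagEna",
}


def select_run(s_intent, n_run, run_min, run_quality=None):
    """Mirror of the Run Selection expression (hypothetical helper).

    run_quality=None stands for is_null(RunQuality).
    """
    science = (
        s_intent in SCIENCE_INTENTS
        or (s_intent == "nomSciOps_Emin5MeV" and run_min > 242070455)
        or n_run == 242429468
    )
    quality_ok = run_quality is None or run_quality != "Bad"
    # nadirOps runs are accepted regardless of RunQuality.
    return (science and quality_ok) or s_intent == "nadirOps"


print(select_run("nomSciOps", 239557414, 239557414, "Good"))  # True
print(select_run("nomSciOps", 239557414, 239557414, "Bad"))   # False
print(select_run("nadirOps", 300000000, 300000000, "Bad"))    # True
```

Note that, as written, the expression accepts nadirOps runs even when they are flagged "Bad", because that term sits outside the RunQuality test.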

Timing and Scaling

  • processClump - with 1300 jobs completed, the average time to run varies by processor type from 220 min (hequ) to 370 min (boer).
  • mergeClumps - with 42 jobs completed, the average time to run varies by processor type from 5-30 minutes.
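
The processClump timings above allow a rough estimate of the total CPU and elapsed time for the full task. The sketch below takes the "~20 batch jobs" per run from the status chronology, and it assumes (this is not confirmed on this page) that the value 1500 in the trickleStream `steps` setting acts as a limit on concurrently running processClump jobs and that every job is a processClump job.

```python
RUNS = 19_172            # total runs in the P202 data sample
JOBS_PER_RUN = 20        # "~20 batch jobs" per run (approximate)
MINUTES_FAST = 220       # average processClump time on the fastest hosts (hequ)
MINUTES_SLOW = 370       # average processClump time on the slowest hosts (boer)
CONCURRENT_JOBS = 1500   # ASSUMPTION: concurrency limit taken from the trickle parms


def estimate(minutes_per_job):
    """Return (total CPU hours, elapsed days) for the whole task."""
    jobs = RUNS * JOBS_PER_RUN
    cpu_hours = jobs * minutes_per_job / 60
    elapsed_days = cpu_hours / CONCURRENT_JOBS / 24
    return cpu_hours, elapsed_days


for label, minutes in (("fast hosts", MINUTES_FAST), ("slow hosts", MINUTES_SLOW)):
    cpu_hours, days = estimate(minutes)
    print(f"{label}: {cpu_hours:,.0f} CPU hours, about {days:.0f} days elapsed")
```

Under these assumptions the task needs roughly 1.4 to 2.4 million CPU hours, or about 39 to 66 days at full occupancy of the batch slots, which is consistent with the "months of elapsed time" expected for this project.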

Load balancing

trickleStream parameters:


P202-FITS

This task generates all desired FITS data products.

Status chronology

  • 3/2/2012 - Define block 1 as the 776 runs in P202-ROOT block 1. Configure trickleStream and begin (14:08)
  • 3/31/2012 - Define block 2 as 5600 runs. Reconfig trickleStream and begin (18:05)
  • 4/01/2012 - Block 2 complete (4824 jobs in about six hours).

Configuration

Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P202-FITS

Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/task.jsp?task=75031156

Input Data: MERIT (direct from P202-ROOT)

spacecraft data: same as P202-ROOT

Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P202-FITS/config/runList.txt

evtClassDefs: 00-19-04 (6/23/2011)

eventClassMap: EvtClassDefs_P7V6.xml

ScienceTools: 09-27-01 (2/15/2012)

Code Variants: redhat5-i686-32bit-gcc41 (Optimized)

Diffuse Model: based on contents of /afs/slac.stanford.edu/g/glast/ground/GLAST_EXT/diffuseModels/v2r0 (see https://confluence.slac.stanford.edu/display/SCIGRPS/Quick+Start+with+Pass+7)

Diffuse Response: 'source' using P7SOURCE_V6 IRF
'clean' using P7CLEAN_V6 IRF

IRFs: P6V7, contained within ScienceTools release

Output Data Products: FT1, LS1, EXTENDEDFT1, EXTENDEDLS1, ELECTRONFT1

Generation of output data products:

| Data Product | Destination | Data content [1] | Event selection [1] | makeFT1 | gtselect | gtdiffrsp | gtmktime |
|---|---|---|---|---|---|---|---|
| EXTENDEDFT1 | SLAC | FT1variables | ((FT1EventClass & 0x00003EFF)!=0); pass7.6_Extended_cuts_L1 | yes | no | yes | yes |
| FT1 | FSSC+SLAC | FT1variables | 'source' and above; EVENT_CLASS bits 2,3,4; evclass=2 filtered from EXTENDEDFT1 | no | yes | (inherited) | yes |
| EXTENDEDLS1 | SLAC | LS1variables | ((FT1EventClass & 0x00003EFF)!=0); pass7.6_Extended_cuts_L1 | yes | no | yes | yes |
| LS1 | FSSC+SLAC | LS1variables | 'transient' and above; EVENT_CLASS bits 0,2,3,4; evclass=0 filtered from EXTENDEDLS1 | no | yes | (inherited) | yes |
| ELECTRONFT1 | SLAC | FT1variables | CTBParticleType==1; pass7.6_Electrons_cuts_L1 | yes | no | no | yes |

[1] /afs/slac/g/glast/ground/releases/volume04/evtClassDefs/00-19-04/data

Note that diffuse response is calculated for 'source' and 'clean' event classes only.
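
The event selections listed for the FITS products are expressed as bits of the event class word. The sketch below shows how the listed bit numbers translate into masks and which bits the extended-selection mask 0x00003EFF actually covers. It illustrates the bit arithmetic only; the helper names are hypothetical, and the "any listed bit set" test is an assumption about how the selections are meant to be read, not a description of what gtselect does internally.

```python
def bits_to_mask(bits):
    """Combine bit positions into an integer mask."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def set_bits(mask):
    """List the bit positions that are set in a mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def passes(event_class, mask):
    """True if the event class word shares at least one bit with the mask."""
    return (event_class & mask) != 0


FT1_MASK = bits_to_mask([2, 3, 4])     # source, clean, ultraclean
LS1_MASK = bits_to_mask([0, 2, 3, 4])  # transient, source, clean, ultraclean
EXTENDED_MASK = 0x00003EFF             # mask used for EXTENDEDFT1 / EXTENDEDLS1

print(hex(FT1_MASK), hex(LS1_MASK))
print(set_bits(EXTENDED_MASK))

transient_only = 1 << 0
print(passes(transient_only, LS1_MASK), passes(transient_only, FT1_MASK))
```

Listing the set bits shows that 0x00003EFF covers bits 0 to 7 and 9 to 13, leaving bit 8 out of the extended selection.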

Note on 'Code Variant': The SLAC batch farm contains a mixture of architectures, both hardware (Intel/AMD 64-bit) and software (RHEL5-64, gcc v4.1, etc.). At this time, GlastRelease builds only on RHEL4-32, while ScienceTools builds for RHEL5-32 and RHEL5-64.

Timing
