P120 Reprocessing

status: In Progress
last update: 9 August 2010

This page is a record of the configuration for the P120 reprocessing project, event reclassification using Pass 7.3. This project involves reprocessing with Pass7 classification trees and (ultimately) new IRFs. The name "P120" derives from the word "processing" and the initial file version to be used for the output data products, e.g., r0123456789_v120_merit.root.

  • P120-MERIT - this task reads DIGI+RECON+MERIT and produces reprocessed MERIT + FILTEREDMERIT (photons) + ELECTRONMERIT
  • P120-FITS - this task will (eventually) read FILTEREDMERIT and produce FT1 (photons) + LS1 (merit-like FITS file for photons) + electron FITS file + LS3 (live-time cube)

Datafile names, versions and locations

Data file version numbers for this reprocessing will begin with v120.

XROOT location and file naming

Location template:

/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>

Locations for P120:

/glast/Data/Flight/Reprocess/P120/merit
/glast/Data/Flight/Reprocess/P120/filteredmerit
/glast/Data/Flight/Reprocess/P120/electronmerit
/glast/Data/Flight/Reprocess/P120/ft1
/glast/Data/Flight/Reprocess/P120/electronft1
/glast/Data/Flight/Reprocess/P120/ls1
/glast/Data/Flight/Reprocess/P120/ls3

File naming:

| Data Type | aka | Send to FSSC | Naming template |
|---|---|---|---|
| MERIT | | No | r<run#>_<version>_<dataType>.root |
| FILTEREDMERIT | | No | r<run#>_<version>_<dataType>.root |
| ELECTRONMERIT | | No | r<run#>_<version>_<dataType>.root |
| ELECTRONFT1 | | No | r<run#>_<version>_<dataType>.fit |
| FT1 | LS-002 | Yes | gll_ph_r<run#>_<version>.fit |
| LS1 | LS-001 | Yes | gll_ev_r<run#>_<version>.fit |
| LS3 | LS-003 | Maybe | gll_lt_r<run#>_<version>.fit |

Example:

/glast/Data/Flight/Reprocess/P120/merit/r0239557414_v120_merit.root
/glast/Data/Flight/Reprocess/P120/filteredmerit/r0239557414_v120_filteredmerit.root
/glast/Data/Flight/Reprocess/P120/electronmerit/r0239557414_v120_electronmerit.root
/glast/Data/Flight/Reprocess/P120/ft1/gll_ph_pYYY_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/electronft1/r0239557414_v120_electronft1.fit
/glast/Data/Flight/Reprocess/P120/ls1/gll_ev_pYYY_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/ls3/gll_lt_pYYY_r0239559565_v120.fit

where 'pYYY' is the new PROC_VER (processing version) in the FITS files.
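
The location and naming rules above can be sketched as a small path builder. This is only an illustration: the helper name `xroot_path` and its parameters are hypothetical, the `pYYY` default is the placeholder used in the examples (not a real PROC_VER), and the FITS names follow the examples, which carry the extra `pYYY` field that the naming templates omit.

```python
XROOT_BASE = "/glast/Data/Flight/Reprocess"

# Data types written as ROOT files, per the naming table.
ROOT_TYPES = {"merit", "filteredmerit", "electronmerit"}

# Data types written as gll_<tag>_... FITS files, per the naming table.
FITS_TAGS = {"ft1": "ph", "ls1": "ev", "ls3": "lt"}


def xroot_path(run, data_type, version="v120", reprocess="P120", proc_ver="pYYY"):
    """Build an xroot path for one run (hypothetical helper, not project code)."""
    data_type = data_type.lower()
    run_id = f"r{run:010d}"  # run numbers appear zero-padded to 10 digits
    if data_type in ROOT_TYPES:
        name = f"{run_id}_{version}_{data_type}.root"
    elif data_type == "electronft1":
        name = f"{run_id}_{version}_{data_type}.fit"
    elif data_type in FITS_TAGS:
        # Follows the examples, which include the PROC_VER field.
        name = f"gll_{FITS_TAGS[data_type]}_{proc_ver}_{run_id}_{version}.fit"
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return f"{XROOT_BASE}/{reprocess}/{data_type}/{name}"


print(xroot_path(239557414, "merit"))
print(xroot_path(239559565, "FT1"))
```

The two calls reproduce the first and fourth example paths listed above.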

DataCatalog location and naming

Logical directory and group template:

Data/Flight/Reprocess/<reprocessName>:<dataType>

Note that the <dataType> field (following the colon) is a DataCatalog 'group' name, and file names are of the form r<run#>.

Naming examples:

Data/Flight/Reprocess/P120:MERIT r0239557414
Data/Flight/Reprocess/P120:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P120:FT1 r0239557414
Data/Flight/Reprocess/P120:LS1 r0239557414
Data/Flight/Reprocess/P120:LS3 r0239557414
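
As a rough illustration of the folder:group convention, the snippet below formats and parses entries like the ones above. The function names are hypothetical and this does not use the real DataCatalog client; it only handles the strings.

```python
def datacatalog_entry(run, data_type, reprocess="P120"):
    """Format '<folder>:<group> r<run#>' (hypothetical helper)."""
    folder = f"Data/Flight/Reprocess/{reprocess}"
    return f"{folder}:{data_type.upper()} r{run:010d}"


def parse_datacatalog_entry(entry):
    """Split an entry back into (folder, group, run number)."""
    location, file_name = entry.split(" ")
    folder, group = location.split(":")
    return folder, group, int(file_name[1:])  # drop the leading 'r'


entry = datacatalog_entry(239557414, "merit")
print(entry)
print(parse_datacatalog_entry(entry))
```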

Data Sample

The currently defined data sample for P120 reprocessing includes:

First run: 239557414 (MET), 2008-08-04 15:43:34 (UTC)
Last run: 302313321 (MET), 2010-07-31 23:55:21 (UTC)
Total runs: 10916
Total MERIT events: 23,758,285,498
Total FT1 events: n/a
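
For reference, the MET run numbers can be converted to the UTC timestamps quoted here with a simple offset from the mission epoch. The sketch assumes MET counts seconds from 2001-01-01 00:00:00 UTC and ignores leap seconds, which is the convention that reproduces the times listed for the first and last runs; the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed mission epoch for MET (seconds since 2001-01-01 00:00:00 UTC).
MET_EPOCH = datetime(2001, 1, 1)


def met_to_utc(met_seconds):
    """Naive MET -> UTC conversion; leap seconds are deliberately ignored."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


print(met_to_utc(239557414))  # 2008-08-04 15:43:34
print(met_to_utc(302313321))  # 2010-07-31 23:55:21
```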

Bookkeeping

  1. (This page): Define ingredients of reprocessing (processing code/configuration changes)
  2. Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P120
    1. List of all reprocessings
    2. List of all data runs reprocessed
    3. Pointers to all input data files (-> dataCatalog)
    4. Pointers to associated task processes (-> Pipeline II status)
  3. Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
    1. Lists of and pointers to all output data files
    2. Meta data associated with each output data product

P120-MERIT

Status chronology

  • 8/8/2010 - block 2 reprocessing complete. Many xroot server problems. (5 days to process 2084 runs)
  • 8/3/2010 - begin block 2 reprocessing (through 255132033 MET), bringing the total runs reprocessed to 2721, about 5-1/2 months of data.
  • 7/28/2010 - block 1 re-reprocessing complete
  • 7/27/2010 - New GlastRelease (v17r35p10) containing new evtUtils, "to make the FT1EventClass bits compatible with the ScienceTools". Cleanup, including removing all files created last week during the first attempt.
  • 7/21/2010 - block 1 reprocessing complete
  • 7/20/2010 - agree upon 'pilot block' of runs (239557417 - 243220241), 637 runs. Begin...
  • 7/19/2010 - submit first test run. success. await feedback

Configuration

Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P120-MERIT
Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
GlastRelease: v17r35p8, v17r35p10 (new release as of 7/27/2010)
Input Data Selection: "standard" from https://confluence.slac.stanford.edu/display/SCIGRPS/LAT+Dataset+Definitions along with && (RunQuality != "Bad" || is_null(RunQuality))
Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-MERIT/config/runFile.txt
photonFilter: CTBParticleType==0 && CTBClassLevel>0
electronFilter: CTBParticleType==1
jobOpts: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-MERIT/config/reClassify.txt
Output Data Products: MERIT, FILTEREDMERIT, ELECTRONMERIT
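
To make the two filter cuts concrete, here is a minimal sketch that applies them to a list of events. The real task applies the cuts to MERIT ROOT tuples; the plain dictionaries and the function names below are stand-ins chosen for illustration only.

```python
def is_photon(event):
    """photonFilter: CTBParticleType==0 && CTBClassLevel>0"""
    return event["CTBParticleType"] == 0 and event["CTBClassLevel"] > 0


def is_electron(event):
    """electronFilter: CTBParticleType==1"""
    return event["CTBParticleType"] == 1


def split_merit(events):
    """Return (photons, electrons); events passing neither cut are dropped."""
    photons = [e for e in events if is_photon(e)]
    electrons = [e for e in events if is_electron(e)]
    return photons, electrons


# Made-up events, purely to exercise the cuts.
sample = [
    {"CTBParticleType": 0, "CTBClassLevel": 1},  # photon
    {"CTBParticleType": 0, "CTBClassLevel": 0},  # fails the class-level cut
    {"CTBParticleType": 1, "CTBClassLevel": 0},  # electron
    {"CTBParticleType": 2, "CTBClassLevel": 3},  # neither
]
photons, electrons = split_merit(sample)
print(len(photons), len(electrons))
```

In this picture the photon list corresponds to FILTEREDMERIT and the electron list to ELECTRONMERIT.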

Timing and Scaling

  • (block 1 results) The processClump step is taking ~40 hequ-minutes (or ~65 fell-minutes). With >500 simultaneous jobs running, there is little noticeable strain on xroot. There are five servers in the yellow-orange load range and they are claiming ~110-130 MB/s I/O rate.
  • The mergeClumps step is taking ~5 hequ-minutes
  • It was observed that submitting 70 runs at once put a strain on /u30, home of GlastRelease. Some 93 of ~540 jobs failed with I/O error, but succeeded upon rollback.

Load balancing

Introduce a new trickleStreams.py script to (partially) assess pipeline activity and submit only the number of jobs considered appropriate based on available data.

maxProcessClumps = 600     ## prevent overload of xroot
maxMergeClumps = 20        ## prevent overload of xroot (inactive)
maxStreamsPerCycle = 20    ## prevent overload of /u30 on startup
timePerCycle = 900         ## 15 minutes:  allow time for dust to settle

With these parameters, it took ~5 hours to reach a point where fewer than 20 jobs per cycle were regularly submitted. It took another 4.5 hours for the task to complete. On average, one run generated 7.5 processClump batch jobs.
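
The throttling idea can be sketched as a per-cycle decision function. This is not the actual trickleStreams.py; it is a guess at the kind of logic involved, using the parameter values above and the observed average of 7.5 processClump jobs per run. The function name and its arguments are hypothetical.

```python
MAX_PROCESS_CLUMPS = 600     # cap on concurrent processClump jobs (xroot load)
MAX_STREAMS_PER_CYCLE = 20   # cap on new streams per cycle (/u30 load)
TIME_PER_CYCLE = 900         # seconds between cycles
CLUMPS_PER_RUN = 7.5         # observed average processClump jobs per run


def streams_to_submit(active_process_clumps, pending_runs):
    """Number of new run streams to submit this cycle (illustrative only)."""
    headroom = max(0, MAX_PROCESS_CLUMPS - active_process_clumps)
    # How many more runs fit before the processClump cap is expected to be hit.
    allowed_by_xroot = int(headroom // CLUMPS_PER_RUN)
    return min(MAX_STREAMS_PER_CYCLE, allowed_by_xroot, pending_runs)


print(streams_to_submit(0, 2084))    # idle pipeline: limited by the per-cycle cap
print(streams_to_submit(570, 2084))  # near the cap: limited by xroot headroom
print(streams_to_submit(600, 2084))  # at the cap: nothing is submitted
```

A driver would call such a function once every TIME_PER_CYCLE seconds, which matches the behavior described above: full cycles of 20 streams at first, then fewer as the processClump limit becomes the binding constraint.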

P120-FT1

Status chronology

Configuration

WARNING: NOT YET ACCURATE!

Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P120-FT1
Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
Input Data Selection: MERIT (from P120-MERIT), FT2 (from P105-FT2 and Level1)
Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-FT1/config/runFile.txt
Reprocessing Mode: reFT1
evtClassDefs: 00-16-00 (NEED TO UPDATE)
meritFilter: FT1EventClass!=0
eventClassifier: Pass7_Classifier.py (OBSOLETE)
eventClassMap: EvtClassDefs_P7V3.xml (in evtUtils)
ScienceTools: 09-17-00 (NEED TO UPDATE) (SCons build)
Code Variant: redhat4-i686-32bit-gcc34 or redhat5-i686-32bit-gcc41
Diffuse Model: /afs/slac.stanford.edu/g/glast/ground/releases/analysisFiles/diffuse/v2/source_model_v02.xml (https://confluence.slac.stanford.edu/display/SCIGRPS/Diffuse+Model+for+Analysis+of+LAT+Data)
Diffuse Response IRFs: P7trans_v3mc, P7source_v3mc (TEMPORARY)
IRFs: implemented as 'custom irf', files in /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/IRFS/Pass7.3
Output Data Products: FT1, LS1, LS3, ELECTRONFT1

Processing chain for FITS data products

| Data Product | makeFT1 | gtdiffrsp | gtmktime | gtltcube |
|---|---|---|---|---|
| FT1 | true | true for evclsmin==0,1 | true | false (NEEDS TO CHANGE) |
| LS1 | true | false | true | false |
| LS3 | false | false | false | true |
| ELECTRONFT1 | true | false | true | false |
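
One way to read the processing-chain table is as a per-product switchboard of steps. The sketch below encodes the table as given (the WARNING above still applies, and the FT1 row is flagged as needing change) and lists the steps that would run for each product. The names `CHAIN` and `steps_for` are hypothetical, and the conditional gtdiffrsp entry for FT1 is simplified to "on".

```python
STEPS = ("makeFT1", "gtdiffrsp", "gtmktime", "gtltcube")

# One flag per step, in STEPS order, copied from the table.
CHAIN = {
    "FT1":         (True,  True,  True,  False),  # gtdiffrsp only for evclsmin==0,1
    "LS1":         (True,  False, True,  False),
    "LS3":         (False, False, False, True),
    "ELECTRONFT1": (True,  False, True,  False),
}


def steps_for(product):
    """Return the enabled steps for a data product, in execution order."""
    return [step for step, enabled in zip(STEPS, CHAIN[product]) if enabled]


for product in CHAIN:
    print(product, "->", ", ".join(steps_for(product)))
```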

Note on 'Code Variant': The SLAC batch farm contains a mixture of architectures, both hardware (Intel/AMD 64-bit) and software (RedHat Enterprise Linux 4 and 5, gcc 3.4, 4.1, etc.).

Note on diffuse response calculation: (OBSOLETE) gtdiffrsp is called three times in succession: first with IRF P7_v2_diff and evclsmin==8, then with IRF P7_v2_extrad and evclsmin==9, and finally with IRF P7_v2_datac and evclsmin==10. The resulting FT1 file has six columns of diffuse response, two columns (galactic and extragalactic response) for each of the three IRFs. This makes the FT1 file non-standard by FSSC standards, as they expect only five diffuse response columns. (OBSOLETE)

Timing

n/a
