P120 Reprocessing

status: In Progress
last update: 31 August 2010

This page is a record of the configuration for the P120 reprocessing project, event reclassification using Pass 7.3. This project involves reprocessing with Pass7 classification trees and (ultimately) new IRFs. The name "P120" derives from the word "processing" and the initial file version to be used for the output data products, e.g., r0123456789_v120_merit.root.

  • P120-MERIT - this task reads DIGI+RECON+MERIT and produces reprocessed MERIT + FILTEREDMERIT (photons) + ELECTRONMERIT
  • P120-FT1 - this task will (eventually) read FILTEREDMERIT and produce FT1 (photons) + LS1 (merit-like FITS file for photons) + electron FITS file + LS3 (live-time cube)
  • P120-LEO-MERIT - this task reads DIGI+RECON+MERIT and produces reprocessed MERIT + FILTEREDMERIT (photons) + ELECTRONMERIT for 200 runs of earth limb (L&EO) data

Datafile names, versions and locations

Data file version numbers for this reprocessing will begin with v120.

XROOT location and file naming

Location template:

/glast/Data/Flight/Reprocess/<reprocessName>/<dataType>

Locations for P120:

/glast/Data/Flight/Reprocess/P120/merit
/glast/Data/Flight/Reprocess/P120/filteredmerit
/glast/Data/Flight/Reprocess/P120/electronmerit
/glast/Data/Flight/Reprocess/P120/ft1
/glast/Data/Flight/Reprocess/P120/electronft1
/glast/Data/Flight/Reprocess/P120/ls1
/glast/Data/Flight/Reprocess/P120/ls3

File naming:

| Data Type | aka | Send to FSSC | Naming template |
| --- | --- | --- | --- |
| MERIT | | No | r<run#>_<version>_<dataType>.root |
| FILTEREDMERIT | | No | r<run#>_<version>_<dataType>.root |
| ELECTRONMERIT | | No | r<run#>_<version>_<dataType>.root |
| ELECTRONFT1 | | No | gll_el_p<procVer>_r<run#>_<version>.fit |
| FT1 | LS-002 | Yes | gll_ph_p<procVer>_r<run#>_<version>.fit |
| LS1 | LS-001 | Yes | gll_ev_p<procVer>_r<run#>_<version>.fit |
| LS3 | LS-003 | Maybe | gll_lt_p<procVer>_r<run#>_<version>.fit |

Note: 'procVer' is a field in the file name (mirrored by the "PROC_VER" keyword in the primary header) that was added to the FFD on 5/12/2010. Ref: http://fermi.gsfc.nasa.gov/ssc/dev/current_documents/Science_DP_ICD_RevA.pdf

Example:

/glast/Data/Flight/Reprocess/P120/merit/r0239557414_v120_merit.root
/glast/Data/Flight/Reprocess/P120/filteredmerit/r0239557414_v120_filteredmerit.root
/glast/Data/Flight/Reprocess/P120/electronmerit/r0239557414_v120_electronmerit.root
/glast/Data/Flight/Reprocess/P120/ft1/gll_ph_p120_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/electronft1/gll_el_p120_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/ls1/gll_ev_p120_r0239559565_v120.fit
/glast/Data/Flight/Reprocess/P120/ls3/gll_lt_p120_r0239559565_v120.fit
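
The location and naming rules above can be sketched as a small path builder. This is only an illustration: the function name `xroot_path`, the two lookup tables, and the zero-padding widths are assumptions inferred from the example paths on this page, not part of any P120 task code.

```python
XROOT_BASE = "/glast/Data/Flight/Reprocess"

# ROOT products follow r<run#>_<version>_<dataType>.root
ROOT_TYPES = {"merit", "filteredmerit", "electronmerit"}

# FITS products: two-letter code that follows "gll_" in the example paths
FITS_CODES = {"ft1": "ph", "electronft1": "el", "ls1": "ev", "ls3": "lt"}


def xroot_path(reprocess_name, data_type, run, version, proc_ver=None):
    """Build the full xroot path of one output file (hypothetical helper)."""
    data_type = data_type.lower()
    run_field = f"r{run:010d}"     # run number zero-padded to 10 digits (assumed)
    ver_field = f"v{version:03d}"  # e.g. v120
    directory = f"{XROOT_BASE}/{reprocess_name}/{data_type}"
    if data_type in ROOT_TYPES:
        name = f"{run_field}_{ver_field}_{data_type}.root"
    elif data_type in FITS_CODES:
        if proc_ver is None:
            raise ValueError("FITS products need a procVer")
        code = FITS_CODES[data_type]
        name = f"gll_{code}_p{proc_ver}_{run_field}_{ver_field}.fit"
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return f"{directory}/{name}"


print(xroot_path("P120", "merit", 239557414, 120))
print(xroot_path("P120", "ft1", 239559565, 120, proc_ver=120))
```

The two calls reproduce the first and fourth example paths listed above; the underscore placement in the FITS names follows those examples.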

DataCatalog location and naming

Logical directory and group template:

Data/Flight/Reprocess/<reprocessName>:<dataType>

Note that the <dataType> field (following the colon) is a DataCatalog 'group' name, and file names are of the form r<run#>.

Naming examples:

Data/Flight/Reprocess/P120:MERIT r0239557414
Data/Flight/Reprocess/P120:FILTEREDMERIT r0239557414
Data/Flight/Reprocess/P120:FT1 r0239557414
Data/Flight/Reprocess/P120:ELECTRONFT1 r0239557414
Data/Flight/Reprocess/P120:LS1 r0239557414
Data/Flight/Reprocess/P120:LS3 r0239557414
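
As a minimal sketch of the folder:group convention, the helpers below compose and split entries written in the form used by the examples above. The function names are hypothetical and nothing here calls the real DataCatalog; it is plain string handling only.

```python
def catalog_entry(reprocess_name, data_type, run):
    """Compose '<logical folder>:<group> <file name>' (illustrative only)."""
    folder = f"Data/Flight/Reprocess/{reprocess_name}"
    group = data_type.upper()   # the group name is the data type
    name = f"r{run:010d}"       # file names are of the form r<run#>
    return f"{folder}:{group} {name}"


def parse_catalog_entry(entry):
    """Split an entry back into (folder, group, file name)."""
    location, name = entry.split(" ", 1)
    folder, group = location.split(":", 1)
    return folder, group, name


entry = catalog_entry("P120", "merit", 239557414)
print(entry)
print(parse_catalog_entry(entry))
```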

Data Sample

The currently defined data sample for P120 reprocessing includes:

  • First run: 239557414 (MET), 2008-08-04 15:43:34 (UTC)
  • Last run: 302313321 (MET), 2010-07-31 23:55:21 (UTC)
  • Total runs: 10916
  • Total MERIT events: 23,758,285,498
  • Total FT1 events: n/a
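
The MET/UTC pairs quoted for the first and last run can be checked with a short conversion. The sketch assumes that mission elapsed time counts seconds from 2001-01-01 00:00:00 UTC and that leap seconds are ignored; under that assumption it reproduces both UTC values given above. The name `met_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed MET reference epoch; leap seconds are deliberately ignored here.
MET_EPOCH = datetime(2001, 1, 1, 0, 0, 0)


def met_to_utc(met_seconds):
    """Convert mission elapsed time (seconds) to a naive UTC datetime."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


print(met_to_utc(239557414))  # 2008-08-04 15:43:34
print(met_to_utc(302313321))  # 2010-07-31 23:55:21
```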

Bookkeeping

  1. (This page): Define ingredients of reprocessing (processing code/configuration changes)
  2. Processing History database: http://glast-ground.slac.stanford.edu/HistoryProcessing/HProcessingRuns.jsp?processingname=P120
    1. List of all reprocessings
    2. List of all data runs reprocessed
    3. Pointers to all input data files (-> dataCatalog)
    4. Pointers to associated task processes (-> Pipeline II status)
  3. Data Catalog database: http://glast-ground.slac.stanford.edu/DataCatalog/folder.jsp
    1. Lists of and pointers to all output data files
    2. Meta data associated with each output data product

P120-MERIT

Status chronology

  • 8/29/2010 - Discovered three merge steps that silently failed (xroot file access). TASK complete.
  • 8/28/2010 - processing formally complete (10916 runs), but some discrepancy in # of events
  • 8/26/2010 - serious xroot problems. See initial distribution of files across xroot servers. From this report (courtesy Wilko) it is easy to see where problems are likely to arise - when the number of servers involved is small, e.g. two or three.
  • 8/19/2010 - production continues at a crawl due to xroot server difficulties
  • 8/16/2010 - resume full production, but at a slow trickle (max 350 simultaneous processClump jobs)
  • 8/8/2010 - block 2 reprocessing complete. Many xroot server problems. (5 days to process 2084 runs)
  • 8/3/2010 - begin block 2 reprocessing (through 255132033 MET), bringing the total runs reprocessed to 2721, about 5-1/2 months of data.
  • 7/28/2010 - block 1 re-reprocessing complete
  • 7/27/2010 - New GlastRelease (v17r35p10) containing new evtUtils, "to make the FT1EventClass bits compatible with the ScienceTools". Cleanup, including removing all files created last week during the first attempt.
  • 7/21/2010 - block 1 reprocessing complete
  • 7/20/2010 - agree upon 'pilot block' of runs (239557417 - 243220241), 637 runs. Begin...
  • 7/19/2010 - submit first test run. success. await feedback

Configuration

  • Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P120-MERIT
  • Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
  • GlastRelease: v17r35p8, superseded by v17r35p10 (see the 7/27/2010 entry above)
  • Input Data Selection: "standard" from https://confluence.slac.stanford.edu/display/SCIGRPS/LAT+Dataset+Definitions along with the additional clause && (RunQuality != "Bad" || is_null(RunQuality))
  • s/c data: FT2 from P105 (runs 239557414 - 271844560), then from current Level 1 production
  • Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-MERIT/config/runFile.txt
  • photonFilter: CTBParticleType==0 && CTBClassLevel>0
  • electronFilter: CTBParticleType==1
  • jobOpts: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-MERIT/config/reClassify.txt
  • Output Data Products: MERIT, FILTEREDMERIT, ELECTRONMERIT
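
To make the two filter expressions concrete, here is a minimal sketch that applies them to events held as plain Python dictionaries. The real task applies these cuts to MERIT ROOT tuples; the dictionary representation, the function names, and the assumption that the MERIT output keeps every event are illustrative only.

```python
def is_photon(event):
    # photonFilter: CTBParticleType==0 && CTBClassLevel>0
    return event["CTBParticleType"] == 0 and event["CTBClassLevel"] > 0


def is_electron(event):
    # electronFilter: CTBParticleType==1
    return event["CTBParticleType"] == 1


def split_events(events):
    """Route events to the three output products (hypothetical helper)."""
    events = list(events)
    return {
        "MERIT": events,  # assumed: all reprocessed events are kept
        "FILTEREDMERIT": [e for e in events if is_photon(e)],
        "ELECTRONMERIT": [e for e in events if is_electron(e)],
    }


sample = [
    {"CTBParticleType": 0, "CTBClassLevel": 1},  # passes the photon filter
    {"CTBParticleType": 0, "CTBClassLevel": 0},  # fails CTBClassLevel>0
    {"CTBParticleType": 1, "CTBClassLevel": 0},  # passes the electron filter
]
products = split_events(sample)
print({name: len(evts) for name, evts in products.items()})
```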

Timing and Scaling

  • (beyond block 2 results) Due to xroot problems (overstressing a small number of machines) the processing throughput dropped to 25-30 runs/hour (190-225 jobs/hour)
    • Wilko begins redistributing files around the xroot system in order to balance the load. This is only partially done by task completion.
    • Logs of job submission can be found here
  • (block 1 results) The processClump step is taking ~40 hequ-minutes (or ~65 fell-minutes). With >500 simultaneous jobs running, there is little noticeable strain on xroot. There are five servers in the yellow-orange load range and they are claiming ~110-130 MB/s I/O rate.
  • The mergeClumps step is taking ~5 hequ-minutes
  • It was observed that submitting 70 runs at once put a strain on /u30, home of GlastRelease. Some 93 of ~540 jobs failed with I/O error, but succeeded upon rollback.
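
The runs/hour and jobs/hour figures quoted above are related through the average number of processClump jobs per run (7.5, as reported on this page for block 1). A minimal sketch of that arithmetic follows; the function names are hypothetical, and a constant rate is assumed, which the chronology shows was not always the case.

```python
JOBS_PER_RUN = 7.5  # average processClump jobs per run, from the block 1 results


def jobs_per_hour(runs_per_hour, jobs_per_run=JOBS_PER_RUN):
    """Convert a run throughput into a batch-job throughput."""
    return runs_per_hour * jobs_per_run


def hours_to_process(n_runs, runs_per_hour):
    """Rough wall-clock estimate at a constant throughput."""
    return n_runs / float(runs_per_hour)


for rate in (25, 30):
    print(rate, "runs/hour ->", jobs_per_hour(rate), "jobs/hour")
```

At 25 and 30 runs/hour this gives 187.5 and 225 jobs/hour, in line with the 190-225 jobs/hour quoted above.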

Load balancing

A new trickleStreams.py script was introduced to (partially) assess pipeline activity and submit only the number of jobs considered appropriate based on available data.
Parameters used for block 1:

maxProcessClumps = 600     ## prevent overload of xroot
maxMergeClumps = 20        ## prevent overload of xroot (inactive)
maxStreamsPerCycle = 20    ## prevent overload of /u30 on startup
timePerCycle = 900         ## 15 minutes:  allow time for dust to settle

With these parameters, it took ~ 5 hours to reach a point where fewer than 20 jobs per cycle were regularly submitted. Another 4.5 hours for the task to complete. On average, one run generated 7.5 processClump batch jobs.

For subsequent data (beyond block 2), xroot displayed such stress that the maxProcessClumps limit was reduced to 250 or 300.
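
The throttling idea behind these parameters can be sketched as follows. This is not the actual trickleStreams.py, whose logic is not shown on this page; the function `streams_to_submit`, its arguments, and the use of the 7.5 jobs-per-run average to convert job headroom into streams are all assumptions made for illustration.

```python
import math

maxProcessClumps = 600    # cap on simultaneous processClump jobs
maxStreamsPerCycle = 20   # cap on new streams (runs) started per cycle
timePerCycle = 900        # seconds between cycles
JOBS_PER_RUN = 7.5        # average processClump jobs per run (block 1)


def streams_to_submit(active_process_clumps, runs_waiting,
                      max_process_clumps=maxProcessClumps,
                      max_streams_per_cycle=maxStreamsPerCycle):
    """Decide how many new streams to start in this cycle (hypothetical)."""
    headroom_jobs = max(0, max_process_clumps - active_process_clumps)
    # Convert free job slots into whole streams, using the average job count.
    headroom_streams = int(math.floor(headroom_jobs / JOBS_PER_RUN))
    return min(max_streams_per_cycle, headroom_streams, runs_waiting)


print(streams_to_submit(0, 637))      # idle farm: limited by the per-cycle cap
print(streams_to_submit(540, 637))    # near the job cap: limited by headroom
print(streams_to_submit(100, 637, max_process_clumps=250))  # reduced cap
```

A driver loop would call this once every timePerCycle seconds and submit that many streams; lowering max_process_clumps to 250 or 300 models the reduced limit used for later data.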

P120-FT1

This task will be run twice: Pass 1 will perform event classification for source and transient events and allow analysis to produce diffuse class IRFs; Pass 2 will be identical to Pass 1 but will include diffuse classification. The latest word from C&A is that diffuse response will only be calculated for 'source' class events.

Status chronology

  • 8/31/2010 - Pass 1 of this task is complete (through 31 July 2010)
  • 8/30/2010 - Problem with makeFT1 stressing /u38. Jim makes update to fitsGenApps => ST 09-18-03, put into production at stream 1400.
  • 8/29/2010 - Begin Pass 1 of task...

Configuration

  • Task Location: /nfs/farm/g/glast/u38/Reprocess-tasks/P120-FT1
  • Task Status: http://glast-ground.slac.stanford.edu/Pipeline-II/index.jsp
  • Input Data Selection: MERIT (from P120-MERIT)
  • Input Run List: ftp://ftp-glast.slac.stanford.edu/glast.u38/Reprocess-tasks/P120-FT1/config/runFile.txt
  • Reprocessing Mode: reFT1
  • evtClassDefs: 00-17-00
  • meritFilter: FT1EventClass!=0
  • eventClassifier: Pass7_Classifier.py (OBSOLETE)
  • eventClassMap: EvtClassDefs_P7V3.xml (in evtUtils)
  • s/c data: FT2 from P105 (runs 239557414 - 271844560), then from current Level 1 production
  • ScienceTools: 09-18-01 through stream 1399, then 09-18-03 (SCons build)
  • Code Variants: redhat4-i686-32bit-gcc34 and redhat5-i686-32bit-gcc41 (Optimized)
  • Diffuse Model: /afs/slac.stanford.edu/g/glast/ground/releases/analysisFiles/diffuse/v2/source_model_v02.xml (https://confluence.slac.stanford.edu/display/SCIGRPS/Diffuse+Model+for+Analysis+of+LAT+Data)
  • Diffuse Response IRFs: P7trans_v3mc, P7source_v3mc (TEMPORARY)
  • IRFs: implemented as 'custom irf', files in /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/IRFS/Pass7.3
  • Output Data Products: FT1, LS1, ELECTRONFT1

Processing chain for FITS data products

| Data Product | makeFT1 | gtdiffrsp | gtmktime | gtltcube |
| --- | --- | --- | --- | --- |
| FT1 | true | true for evclsmin==1 | true | false |
| LS1 | true | false | true | false |
| ELECTRONFT1 | true | false | true | false |

Note that diffuse response is calculated only for 'source' class events (and not transient); diffuse class events are not yet classified.
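
The processing chain for the FITS products can also be written down as a small lookup table, which makes it easy to list the steps applied to each product. The structure and the name `steps_for` are illustrative assumptions; the true/false values are copied from the table above, with the evclsmin==1 condition on gtdiffrsp recorded only as a comment.

```python
CHAIN_ORDER = ["makeFT1", "gtdiffrsp", "gtmktime", "gtltcube"]

PROCESSING_CHAIN = {
    # gtdiffrsp runs for FT1 with evclsmin==1 (see the table above)
    "FT1":         {"makeFT1": True, "gtdiffrsp": True,  "gtmktime": True, "gtltcube": False},
    "LS1":         {"makeFT1": True, "gtdiffrsp": False, "gtmktime": True, "gtltcube": False},
    "ELECTRONFT1": {"makeFT1": True, "gtdiffrsp": False, "gtmktime": True, "gtltcube": False},
}


def steps_for(product):
    """Return the enabled steps for a product, in chain order."""
    flags = PROCESSING_CHAIN[product]
    return [step for step in CHAIN_ORDER if flags[step]]


for product in PROCESSING_CHAIN:
    print(product, "->", " -> ".join(steps_for(product)))
```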

Note on 'Code Variant': The SLAC batch farm contains a mixture of architectures, both hardware (Intel/AMD 64-bit) and software (RedHat Enterprise Linux 4 and 5, gcc 3.4, 4.1, etc.).

Note on diffuse response calculation: (OBSOLETE) gtdiffrsp is called three times in succession. The first time with IRF P7_v2_diff and evclsmin==8, followed by IRF P7_v2_extrad and evclsmin==9, and finally IRF P7_v2_datac and evclsmin==10. The resulting FT1 file has six columns of diffuse response, two columns (galactic and extragalactic response) for each of the three IRFs. This creates a non-standard FT1 file by FSSC standards as they expect only five diffuse response columns. (OBSOLETE)

Timing

  • 8/31/2010 - With P120-MERIT files nicely distributed across xroot servers, there were no xroot limitations to the processing. After the update to makeFT1, there was no longer an issue with overloading /u38 ($PWD). The next bottleneck was the pipeline processing itself. This task consists of three batch jobs and four scriptlets; it was observed that the pipeline allowed hundreds of jobs to dwell in the READY state for extended periods of time, thus making it impossible to keep LSF saturated. Nevertheless, the maximum number of simultaneous jobs approached 2000. The task essentially completed in 8 hours, although some lingerers kept 'running' for another nine hours (mostly in SSUSP). A profile of job processing rate appears in this plot:

P120-LEO-MERIT

Status chronology

  • 8/16/2010 - Task complete (199 runs)
  • 8/13/2010 - Create task

Configuration

Identical to the P120-MERIT task, except that it uses FT2 files from the P110 reprocessing.
