The data handling for a full Pass7 (GR v17) reprocessing is discussed here.

The basic idea is to fully reprocess all good LAT science data (from 4 Aug 2008 onward) through reconstruction, with the latest calibrations and reflecting the code changes made since GR v15. Whether this is done at SLAC, at Lyon, or with some combination of the two, along with the associated technical issues, is the topic of discussion on this page.

A first meeting to discuss this topic was held at SLAC on 8 Sep 2011. Those in attendance: Richard Dubois, Len Moss, John Bartelt, Andy Hanushevsky, Wilko Kroger, Maria Elena Monzani, Tony Johnson, Brian Van Klaveren, and two on the phone from Europe: Stephan Zimmer and Fred Piron.

Please add your discussions!


Possible options at CC-IN2P3

This is a summary of the discussions I (Fred) had with Rachid Lemrani (CC-IN2P3) from Sep 13 to 17.

  • reminder of the amount of data: we are talking about hundreds of TB, to be moved on a time scale of typically several weeks. This includes ~80 TB of digi files (3 years of data) and ~700 TB of recon files, or maybe 50% of that if the other half of the reprocessing is run at SLAC.
  • IRODS is available at CC-IN2P3:
    • see http://cc.in2p3.fr/IRODS
    • already used for data transfer and storage by other experiments (e.g., it is the standard tool for BaBar, Double Chooz, ...).
    • interfaced with HPSS.
    • transfer speed is comparable to bbftp (about 100 Mb/s to/from SLAC, i.e. ~1 TB/day); for comparison, typical BaBar transfers amount to 750 TB in 6 months. It might be possible to boost this by using more machines, but that remains a challenge (RL). Also, the 100 Mb/s figure is for sequential file transfers; simultaneous transfers could be faster (TBC with RL). A back-of-the-envelope rate estimate is sketched after this list.
  • input files (digi):
    • they can be transferred from SLAC with IRODS (the transfer command can be run from either site) prior to the reprocessing campaign.
    • if each digi file is read only once (as in most cases), we could use the IRODS disk as a buffer: each job fetches the file it needs and erases it from the buffer once it has finished (see the per-job sketch after this list).
    • if each digi file is read many times, then put the IRODS digi files on HPSS (done automatically) and read them via xrootd.
    • any job can access data on HPSS via xrootd, or data on IRODS directly. xrootd has a large disk buffer of a few PB, shared by many experiments: a requested file is copied to the buffer disk at the first request (staging) and is then available to other tasks/jobs, so subsequent accesses are faster. Files remain in the buffer for a couple of months, as long as no cleaning occurs and the buffer does not saturate (otherwise they are erased).
  • output files (recon):
    • if transfer on the fly (at the end of each job, as for other output files in the current MC tasks at Lyon) is not possible (because of bandwidth limits, and/or because we want to transfer other files back with higher priority), use buffering on the IRODS disk and erase each file as soon as it has been transferred.
    • the size of the IRODS buffer can be adjusted (to be discussed), but not up to hundreds of TB. Alternatively, storing recon files on HPSS and transferring them back later could be feasible, at a rate of ~1 TB/day (though the details still need to be investigated).
  • other smaller / non-ROOT files: IRODS can handle them.
  • note that gridEngine uses the same workers as BQS (only the batch system changed), so 20-30 GB is the maximum scratch disk we can get; dividing runs into clumps therefore seems unavoidable (see the clump estimate after this list).
  • how many cores are available? Could we get more cores for a couple of months?
    • the relevant number is actually the number of CPU-hours. Each user group sends a request to CC-IN2P3 at the end of year n for year n+1, broken down per trimester (this request was coordinated by Claudia in the past for Fermi; from now on it will be handled by Isabelle Moreau, our new czar at CENBG Bordeaux). This helps CC-IN2P3 anticipate the needs of all experiments: a scheduler checks the average over each trimester in order to distribute and guarantee the computing power to all groups.
    • Fermi exceeded its quota in 2011 (per trimester and for the whole year) because we had not anticipated the big MC production; Claudia had to update our request on the fly...
    • for any other specific request, e.g. increasing the instantaneous number of cores for several months, we have to make a precise estimate of the CPU resources needed and open a ticket; between now and the end of 2011, any new request for CPU resources should probably also be submitted to Dominique.
    • in this respect, the doubling of the number of cores available to Fermi in early 2011 (from 600 to 1200 cores) apparently represents more than the doubling of computing power that was agreed in the MoU, and we should clarify with Dominique what is planned for 2012.
  • Rachid would be available for the next EVO meeting on reprocessing.
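The back-of-the-envelope rate estimate referenced in the IRODS item above, as a short Python sketch. It only uses the numbers quoted in the list (100 Mb/s, 80 TB of digi, ~700 TB of recon); the "half done at SLAC" case is just the 50% scenario mentioned there.

# Rough transfer-time estimates at the quoted ~100 Mb/s SLAC<->Lyon rate.
RATE_MBIT_S = 100.0                                # quoted sequential single-stream rate
rate_tb_per_day = RATE_MBIT_S / 8 * 86400 / 1e6    # Mb/s -> MB/s -> MB/day -> TB/day (~1.08)

for label, volume_tb in [("digi, 3 years", 80.0),
                         ("recon, full", 700.0),
                         ("recon, half done at SLAC", 350.0)]:
    days = volume_tb / rate_tb_per_day
    print(f"{label:<26} {volume_tb:6.0f} TB  ->  ~{days:4.0f} days on one stream")

At a single ~1 TB/day stream the full recon sample would take well over a year to move, which is why parallel transfers, HPSS staging of recon files, and/or keeping half of the reprocessing at SLAC come up in the list above.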
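A minimal sketch, in Python, of the per-job "IRODS disk as buffer" flow for the read-once case described above: stage one digi file onto worker scratch, run recon, park the recon file on the IRODS buffer, then clean up. The iRODS paths, the runRecon.sh step and the scratch location are placeholders rather than the real task configuration; iget/iput/irm are the standard iRODS client commands.

import os
import subprocess

IRODS_DIGI  = "/fermi/digi/r0123456789_digi.root"   # placeholder iRODS path of the input
IRODS_RECON = "/fermi/recon/"                        # placeholder iRODS collection for outputs
SCRATCH     = "/scratch/job"                         # fits within the 20-30 GB worker scratch

def run(cmd):
    subprocess.run(cmd, check=True)                  # fail loudly so the pipeline can retry

digi_local  = os.path.join(SCRATCH, os.path.basename(IRODS_DIGI))
recon_local = digi_local.replace("_digi", "_recon")

run(["iget", IRODS_DIGI, digi_local])                # stage the input from the IRODS buffer
run(["runRecon.sh", digi_local, recon_local])        # placeholder for the actual recon step
run(["iput", recon_local, IRODS_RECON])              # park the output on the IRODS buffer
run(["irm", IRODS_DIGI])                             # input read only once: free the buffer space
for f in (digi_local, recon_local):                  # leave the worker scratch clean
    os.remove(f)

In the read-many variant, the iget/irm pair would be replaced by reading the digi file through xrootd (with HPSS staging behind it), while the output handling would stay the same.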
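Finally, the clump estimate referenced in the gridEngine/scratch item: a trivial Python check of how many clumps a run must be split into so that a digi chunk plus its recon output fit within the worker scratch. The per-run digi size is a pure placeholder; the recon-to-digi ratio is taken from the 3-year totals above (700 TB / 80 TB).

import math

DIGI_GB_PER_RUN     = 25.0          # placeholder, not a measured value
RECON_TO_DIGI_RATIO = 700.0 / 80.0  # from the 3-year totals above (~8.75)
SCRATCH_GB          = 20.0          # conservative end of the 20-30 GB quoted above

gb_per_run = DIGI_GB_PER_RUN * (1.0 + RECON_TO_DIGI_RATIO)   # digi chunk + its recon output
clumps = math.ceil(gb_per_run / SCRATCH_GB)                  # clumps needed so each fits on scratch
print(f"~{gb_per_run:.0f} GB per run -> split into at least {clumps} clumps")

Whatever the real per-run sizes turn out to be, the recon-to-digi ratio alone makes it clear that whole runs will not fit on scratch, hence the clumping.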