-------- Original Message --------
Subject: Lyon reprocessing
Date: Fri, 09 Sep 2011 15:20:14 -0700
From: Tom Glanzman <glanzman@stanford.edu>
To: Dubois, Richard <richard@slac.stanford.edu>, Tony Johnson <tony_johnson@slac.stanford.edu>, Brian Van Klaveren <bvan@slac.stanford.edu>, Len Moss <ljm@slac.stanford.edu>, John Bartelt <bartelt@SLAC.Stanford.EDU>, Andy Hanushevsky <abh@slac.stanford.edu>, WILKO@SLAC.STANFORD.EDU, Frederic Piron <piron@in2p3.fr>, Stephan Zimmer <zimmer@fysik.su.se>, Maria Elena Monzani <monzani@slac.stanford.edu>

Folks,

I did not take written notes at yesterday's meeting, except for the homework assignments. The list may be incomplete.

  • Fred will check with CCIN2P3 about how much I/O buffer storage might be available for us (hopefully many TB!)
  • Tony/Brian will assess the difficulty and time required to change the pipeline to allow multi-site tasks (different job steps at different sites)
  • Wilko will try to saturate the SLAC-Lyon pipe and report on throughput statistics, and how reliable those figures have been over time.
  • Tom will re-evaluate Task I/O characteristics.
  • John (?) will inquire about WAN upgrades
  • Richard will allocate $$ to solve any technical problems we come up with (sorry Richard, you left early and could not defend yourself)

For my part, I've assembled some data from the Fermi dataCatalog and from a test 'full reprocessing' task in which 22 runs have so far been reprocessed. The results are in this spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AqMtb_sevIuSdElFdXpKQXVTczhTejN6SnpDUGtmMVE&hl=en_US#gid=0

The blue section is taken directly from the dataCatalog as of yesterday, sorted by average file size. An ongoing negotiation over which ROOT file types must be regenerated currently specifies: RECON, CAL, MERIT, FILTEREDMERIT, ELECTRONMERIT, GCR. The yellow section summarizes the bulk I/O requirements for a full reprocessing of >17,000 runs (and still growing). According to what Wilko reported yesterday, the SLAC-Lyon network can sustain only 100 MB/s. Do we know whether one can depend on that figure 24x7 for months on end?

Only the first of the two reprocessing tasks is analysed here: the one that reads DIGI files and emits other ROOT data products. The second task, which reads MERIT files and emits various FITS files, is much less I/O intensive. Further, the model assumes that all files make the trip across the Atlantic only once (no extra trips for visiting scratch space).

The upshot is that if we were to efficiently utilize a 2000-core allocation at CCIN2P3, that would likely overburden the WAN by a factor of 5-6.

The transfer of the data alone would take 3 months (assuming perfect efficiency).

Richard reminded me that for the reprocessing, the most time-critical aspect is getting the FT1 files home to SLAC. Thus, if all other data products were given a lower transfer priority, using the Lyon farm would still help speed things along. To do that, a vast reservoir of storage at Lyon (>700 TB) would be needed.
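
As a rough cross-check of the figures above, here is a minimal back-of-envelope sketch in Python. It is not taken from the spreadsheet: the function names are made up for illustration, decimal units (1 TB = 10^6 MB) are assumed, and the 700 TB input is only a stand-in for the total volume, borrowed from the ">700 TB" storage figure.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 10^6 MB


def transfer_days(volume_tb, rate_mb_per_s, efficiency=1.0):
    """Days needed to move volume_tb over a link sustaining rate_mb_per_s.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 corresponds to the 'perfect efficiency' case).
    """
    if rate_mb_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be > 0 and efficiency in (0, 1]")
    seconds = volume_tb * MB_PER_TB / (rate_mb_per_s * efficiency)
    return seconds / SECONDS_PER_DAY


def volume_moved_tb(rate_mb_per_s, days, efficiency=1.0):
    """Volume in TB that the link can carry in the given number of days."""
    return rate_mb_per_s * efficiency * days * SECONDS_PER_DAY / MB_PER_TB


if __name__ == "__main__":
    link_mb_per_s = 100      # sustained SLAC-Lyon rate quoted in this message
    assumed_volume_tb = 700  # illustrative stand-in, not a spreadsheet value

    for eff in (1.0, 0.75, 0.5):
        days = transfer_days(assumed_volume_tb, link_mb_per_s, eff)
        print(f"efficiency {eff:.2f}: {days:6.1f} days")

    print(f"90 days at full rate: {volume_moved_tb(link_mb_per_s, 90):.1f} TB")
```

Under these assumptions, 700 TB at 100 MB/s works out to about 81 days, and 90 days at the full rate carries about 778 TB, so the three-month and >700 TB figures are at least mutually consistent. Any sustained efficiency below 100% stretches the transfer proportionally.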

- Tom