...

  • 2/13/2012 - begin trials with final calibration and alignments from Leon; 5 runs reprocessed
  • 2/14/2012 - trials continue with blocks of 15, 20, 25 and 50 runs reprocessed (each run generates ~20 batch jobs)
  • 2/16/2012 - begin trickleStream production. Initial config:
    ===============================================================================
      TRICKLE PARMS
    ===============================================================================
    task =  P202-ROOT
    maxRuns =  19172
    firstStep =  setupRun
    steps =  [['/processRun processClump', 1500, 20], ['mergeClumps', 70, 1]]
    maxStreamsPerCycle =  20
    timePerCycle =  300
    ===============================================================================
    
  • 2/21/2012 - One clump reprocessed with pointer to new mySQL DB (stream 710.0)
  • 2/22/2012 - 776 runs complete. Pausing task.
  • 3/15/2012 - resume task. New goal is 1 year of data (~5600 runs)
  • 3/31/2012 - 1-year complete (5600 runs). There have been a few nasty problems which need to be fixed before continuing.

    | S/W component | bug fix | status |
    |---|---|---|
    | New ROOT version | 5-min 'transaction timeout' triggered by xroot data server reboot | done 4/3/2012 |
    | New GlastRelease | 1) include new ROOT version (above); 2) exit with non-zero RC on ROOT write error | done 4/5/2012, GR 17-35-24-rp04 |
    | New GPL_TOOLS (question) | check size/checksum of file written to xroot with known size/checksum | pending |
    | Tuned xroot on new Dell servers | silent file truncation when volume fills up (JIRA) | done 4/4/2012 (100 MB min space limit -> 100 GB; file system space check cadence changed from 10 min to 2 min) |
    | New xroot client tools | complain when xroot data server fails on write | done 4/3/2012, v3.1.1 |
    | New TSkim | 1) new ROOT version (above); 2) complain on ROOT write errors | done 4/5/2012, v08-02-01 |
    | New xroot redirector | required step toward enabling HPSS staging | done 4/3/2012, v3.1.1 |

    Note also that the FILTEREDMERIT files contain 42 more events than the EXTENDEDFT1 files; they should be identical.
  • 4/5/2012 - resume task. New goal is entire science dataset.
  • 4/10/2012 - Unknown 'glitch' may have caused a few hundred jobs to crash and take sulky46 along with them.
  • 4/11/2012 - 10:40pm lightning strikes SLAC power lines. Site-wide power outage. Stream 7795 was the last stream submitted prior to the outage.
  • 4/12/2012 - due to possible overload of sulky46/u18 from writing many core files, one change has been introduced in processClumps.py: prepend "ulimit -c 0;" to the gleam command to disable all core file generation. This starts at approximately run 7605.
  • 5/9/2012 - major pipeline issue...shut down pipeline and allow to drain (due to tomorrow's major outage)
  • 5/10/2012 - 13:40 outage over.
    • Update GR from 17-35-24-rp04 to 17-35-24-rp07 in which the only change is replacing the 5-minute xroot time-out with 8 hours. This change effective with stream 14314 and previously failed pieces of four other runs: 14247.6, 14273.23, 14274.8, 14231.9.
    • Leon advises that as of today, calibrations are valid only through ~15 Dec 2011 (run 345574915), which is somewhere around stream 18,400. He asks Sasha to produce more up-to-date calibrations.
  • 5/18/2012 - all calibrations now valid through 6 May 2012. No need to pause P202 task.
  • 5/28/2012 - 15:30 Complete (through 31 March 2012)
    • Data Catalog summary:  

      | Name          | Type  | Files | Events         | Size     | Created (UTC)        | Links |
      |---------------|-------|-------|----------------|----------|----------------------|-------|
      | CAL           | Group | 20229 | 44,125,599,595 | 128.7 TB | 25-Jan-2012 00:53:31 | Files |
      | ELECTRONFT1   | Group | 5600  | 0              | 2.5 GB   | 02-Mar-2012 00:06:07 | Files |
      | ELECTRONMERIT | Group | 20229 | 90,904,582     | 205.7 GB | 25-Jan-2012 00:53:32 | Files |
      | EXTENDEDFT1   | Group | 5600  | 1,572,783,826  | 143.7 GB | 02-Mar-2012 00:06:09 | Files |
      | EXTENDEDLS1   | Group | 5600  | 1,572,783,826  | 255.0 GB | 02-Mar-2012 00:06:09 | Files |
      | FILTEREDMERIT | Group | 20229 | 6,291,396,710  | 5.3 TB   | 25-Jan-2012 00:53:29 | Files |
      | FT1           | Group | 5600  | 24,261,962     | 2.4 GB   | 02-Mar-2012 00:06:06 | Files |
      | GCR           | Group | 20229 | 44,123,014,456 | 942.7 GB | 25-Jan-2012 00:53:31 | Files |
      | LS1           | Group | 5600  | 271,923,333    | 44.2 GB  | 02-Mar-2012 00:06:08 | Files |
      | MERIT         | Group | 20229 | 44,125,679,961 | 35.4 TB  | 25-Jan-2012 00:53:30 | Files |
      | RECON         | Group | 20229 | 44,123,612,977 | 590.0 TB | 25-Jan-2012 00:53:33 | Files |


      There are discrepancies to track down. They turn out to come from three problematic runs/streams:
      • 272707024/5723 - I/O problem, corrupt files, entire stream rolled back
      • 279108810/6847 - transient xroot access problem, re-registered in dataCat
      • 284813327/7848 - transient xroot access problem, re-registered in dataCat
    • Final trickleStream configuration:
      ===============================================================================
        TRICKLE PARMS
      ===============================================================================
      task =  P202-ROOT
      maxRuns =  20229
      firstStep =  setupRun
      steps =  [['/processRun processClump', 2000, 21], ['mergeClumps', 200, 1]]
      maxStreamsPerCycle =  20
      timePerCycle =  300
      ------DEBUG----------------
      maxCycles =  0
      chatter =  False
      dryRun =  False
      ===============================================================================
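The TRICKLE PARMS dump above is regular enough to read back into a script. The following is only a sketch, not the trickleStream code itself: `parse_trickle_parms` is a hypothetical helper, and the throughput estimate assumes that `timePerCycle` is in seconds, that `maxStreamsPerCycle` caps the streams submitted per cycle, and that each run needs one stream. None of those meanings is stated on this page.

```python
import ast
import math

# Final configuration, copied verbatim from the dump above.
TRICKLE_PARMS = """\
===============================================================================
  TRICKLE PARMS
===============================================================================
task =  P202-ROOT
maxRuns =  20229
firstStep =  setupRun
steps =  [['/processRun processClump', 2000, 21], ['mergeClumps', 200, 1]]
maxStreamsPerCycle =  20
timePerCycle =  300
------DEBUG----------------
maxCycles =  0
chatter =  False
dryRun =  False
===============================================================================
"""


def parse_trickle_parms(text):
    """Parse 'key =  value' lines into a dict, skipping banner lines."""
    parms = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key.isidentifier():
            continue  # banner, heading, or separator line
        try:
            parms[key] = ast.literal_eval(value)  # numbers, lists, booleans
        except (ValueError, SyntaxError):
            parms[key] = value  # bare words such as P202-ROOT stay strings
    return parms


parms = parse_trickle_parms(TRICKLE_PARMS)

# ASSUMPTION (not from the source): one stream per run, at most
# maxStreamsPerCycle new streams per cycle, timePerCycle given in seconds.
cycles = math.ceil(parms["maxRuns"] / parms["maxStreamsPerCycle"])
min_submit_hours = cycles * parms["timePerCycle"] / 3600

print(parms["steps"])
print(f"{cycles} cycles, at least {min_submit_hours:.1f} h of submission time")
```

Under those assumptions the submission loop alone takes a few days for the full run list, so the months of elapsed time in the log would be dominated by processing, pauses, and outages rather than by the trickle rate.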
      

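The discrepancies noted under the Data Catalog summary are presumably the mismatched event counts between products. A quick cross-check can be sketched as below. The counts are copied from the summary; the grouping is an assumption made here for illustration: CAL, GCR, MERIT and RECON are taken to hold every event of each run, and EXTENDEDFT1 and EXTENDEDLS1 are taken to hold the same selection. `count_deficits` is a hypothetical helper, not part of any pipeline tool.

```python
# Event counts copied from the Data Catalog summary above.
EVENTS = {
    "CAL": 44_125_599_595,
    "GCR": 44_123_014_456,
    "MERIT": 44_125_679_961,
    "RECON": 44_123_612_977,
    "EXTENDEDFT1": 1_572_783_826,
    "EXTENDEDLS1": 1_572_783_826,
}

# ASSUMPTION: products within one group should hold the same number of events.
EXPECTED_EQUAL = [
    ("CAL", "GCR", "MERIT", "RECON"),
    ("EXTENDEDFT1", "EXTENDEDLS1"),
]


def count_deficits(events, group):
    """Return each product's shortfall relative to the largest count in the group."""
    reference = max(events[name] for name in group)
    return {name: reference - events[name] for name in group}


for group in EXPECTED_EQUAL:
    deficits = count_deficits(EVENTS, group)
    status = "MISMATCH" if any(deficits.values()) else "consistent"
    print(status, deficits)
```

Run against the summary, the first group reports a mismatch (MERIT has the most events) and the second group is consistent. The same comparison applied per run, rather than to the totals, would be one way to narrow a mismatch down to individual streams.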
...

| Data Product | destination | data content [1] | event selection [1] | makeFT1 | gtselect | gtdiffrsp | gtmktime |
|---|---|---|---|---|---|---|---|
| EXTENDEDFT1 | SLAC | FT1variables | ((FT1EventClass & 0x00003EFF)!=0); pass7.6_Extended_cuts_L1 | (tick) | (error) | (tick) | (tick) |
| FT1 | FSSC+SLAC | FT1variables | 'source' and above; EVENT_CLASS bits 2,3,4; evclass=2 filtered from EXTENDEDFT1 | (error) | (tick) | (inherited) | (tick) |
| EXTENDEDLS1 | SLAC | LS1variables | ((FT1EventClass & 0x00003EFF)!=0); pass7.6_Extended_cuts_L1 | (tick) | (error) | (tick) | (tick) |
| LS1 | FSSC+SLAC | LS1variables | 'transient' and above; EVENT_CLASS bits 0,2,3,4; evclass=0 filtered from EXTENDEDLS1 | (error) | (tick) | (inherited) | (tick) |
| ELECTRONFT1 | SLAC | FT1variables | CTBParticleType==1; pass7.6_Electrons_cuts_L1 | (tick) | (error) | (error) | (tick) |
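The event selections above are bitmask tests on the event class word. The sketch below spells them out. It assumes that bit 0 is the least significant bit and that "EVENT_CLASS bits 2,3,4" means an event is kept when any of those bits is set; neither convention is stated on this page. `bits_to_mask` and `passes` are hypothetical helper names, and the named cuts (such as pass7.6_Extended_cuts_L1) are not modelled.

```python
# Mask taken from the EXTENDEDFT1 / EXTENDEDLS1 selection:
# ((FT1EventClass & 0x00003EFF) != 0)
EXTENDED_MASK = 0x00003EFF


def bits_to_mask(bits):
    """Build an integer mask with the listed bit positions set (bit 0 = LSB)."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


# ASSUMPTION: an event is selected if ANY of the listed bits is set.
FT1_MASK = bits_to_mask([2, 3, 4])      # 'source' and above
LS1_MASK = bits_to_mask([0, 2, 3, 4])   # 'transient' and above


def passes(event_class, mask):
    """True if the event class word shares at least one set bit with the mask."""
    return (event_class & mask) != 0


for name, mask in [("EXTENDED", EXTENDED_MASK), ("FT1", FT1_MASK), ("LS1", LS1_MASK)]:
    print(f"{name:8s} mask = 0x{mask:08X}")
```

With these definitions, an event whose only set bit is bit 0 passes the LS1 and extended masks but not the FT1 mask, and an event whose only set bit is bit 8 fails all three, since 0x00003EFF leaves bit 8 clear.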

...