Notes on the Big Run and Monte Carlo Data Access

[prepared for short presentation on 19 March 2008]

What is the Big Run?

What types of MC datasets are typically generated?

What MC datasets have been generated?

What triggers the generation of new MC datasets?

What is scheduled for the near (pre-launch) future?

What about Pass6?

How does one stay up-to-date with regard to exactly what is available and planned?

Once I've figured out what data to use, how do I get my hands on it?

What is the Big Run?

A catch-all term for most production Monte Carlo work since December 2007. There are many goals (which change with time!), including:

- produce a very large background dataset (8 full days)
- produce a 5-day background + full sky model dataset for OpsSim2
- produce a 1-year interleaved background + full sky model dataset

What types of MC datasets are typically generated?

Relatively frequent "standard" datasets (with selected, typical configurations)
- allGamma
  - 100% gammas between 18 MeV and 562 GeV
  - Normal trigger, NO OBF (onboard filter)
  - ~10M evts generated
  - generated over the upper hemisphere
- allMuon
  - 100% muons
  - NO trigger, NO OBF
  - a few M evts generated
- sample day background
  - all background sources
  - spans one calendar day (one 0.2 s run every minute => 1440 runs)
  - NO trigger, NO OBF
- background
  - all background sources
  - temporally contiguous runs (large statistics: ~1B evts generated)
  - Normal trigger, normal OBF
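As a quick check of the sample day background numbers: one run per minute over a 24-hour day gives the quoted run count. The total simulated time per sample day then follows from the 0.2 s run length. The 288 s total is derived here from those two figures; it is not quoted from the production records.

```latex
N_{\text{runs}} = 24\,\text{h} \times 60\,\frac{\text{runs}}{\text{h}} = 1440,
\qquad
T_{\text{sim}} = N_{\text{runs}} \times 0.2\,\text{s} = 288\,\text{s}
```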

Less frequent specialized datasets
- background + full sky model
- Interleaved background + full sky model
- GRBgrid (a GRB in every job)
- special trigger and/or OBF
- Unusual orientations of the s/c, incl. pointed observations
- special orbit files

What MC datasets have been generated?

All production MC datasets are created as GLAST Pipeline "tasks" via batch jobs on the SLAC and Lyon compute farms.
The production of essentially all MC datasets is documented on the Confluence page "Service Challenge Monte Carlo Processing Summary".
Note that this page documents the production runs themselves. It does not document the code or the complete configuration that went into them.

What triggers the generation of new MC datasets?

A significant new GlastRelease (e.g., a new generation of classification tree analysis)
A request from the C&A group
A request from another analysis or detector group
All requests must ultimately be channeled through Richard Dubois

What is scheduled for the near (pre-launch) future?

Will we do the 1-year interleaved background + full sky model run? That is still an open question.
See Richard's Big Run Checklist page for all the details.
Richard and Julie are your best contacts for Big Run plans (but both are very busy, so try to find the answer yourself first).
Hopefully, fixes for the rough edges discovered during the OpsSim2 data processing review.

What about Pass6?

As of this writing, there are NO production MC datasets with Pass6 analysis.
As of yesterday morning, we were still hammering out the finishing touches on the definition of Pass6 and on the mechanism for reprocessing some existing data.
Once Pass6 is up and running, expect all future MC datasets to use it.
With some luck, the OpsSim2 dataset (8 full days of background + sky model) may be reprocessed sometime next week. Stay tuned.

How does one stay up-to-date with regard to exactly what is available and planned?

The C&A group is the largest initiator of new MC datasets, so stay in touch!
Watch Richard's Big Run Checklist.
Watch the Service Challenge Monte Carlo Processing Summary page in Confluence for new entries.
If you are already using an MC dataset, pay attention to its creation date. Be suspicious if it is much more than a month old, as something more recent may exist.
To see what has changed between GlastReleases, see https://confluence.slac.stanford.edu/display/SAS/GlastRelease+Updates

Once I've figured out what data to use, how do I get my hands on it?

This is obviously a very hot topic. OpsSim2 generated a lot of feedback, and the situation is... dynamic.
Since November 2007, MC data is no longer stored on NFS. It is stored in xroot, a high-performance but specialized disk-based storage system.
More recently, Level 1 processed data has also moved to xroot. (I do not know the full story on half-pipe or ASP.)
A quick data access tutorial is here: https://confluence.slac.stanford.edu/download/attachments/4784526/DataAccess-Glanzman-20080304.ppt?version=1
Some specific questions that have come up in the recent past are answered here: https://confluence.slac.stanford.edu/display/ds/Data+Access+FAQ
Be sure to consult the Workbook: http://glast-ground.slac.stanford.edu/workbook/
