With Data Management as part of the strategic plan for Lab-wide computing, a workshop would be a good stepping stone toward bringing the data management community together and identifying common tools and directions. Data management is loosely defined here as moving, storing, cataloguing, and accessing files and data.
In broad strokes:
- get more detail on existing solutions at SLAC: Fermi, LCLS, BABAR...
- interview projects in need of DM solutions, e.g. cosmology computing; maybe someone like Todd Martinez; SSRL and LCLS users, etc.
- bring together the people from the first bullet to discuss common solutions that could be recommended.
- think about evolving the current DM solutions toward a more common base over some timeframe?
Time frame: Feb 2012
Duration: 2 days
Tony Johnson has listed these items (with the odd edit) as potentially of interest:
- Disk and tape storage technologies and file systems (Lustre, xrootd, etc.)
- The aspects of DAQ concerned with pushing those bits out
- Transfer from DAQ to long term storage
- The data access patterns, analysis and reconstruction bandwidth
- Data storage formats (ROOT, FITS, HDF5, databases, ...). Optimization of data storage formats for SSD
- Need for scaling storage to support massive parallelization (multiple simultaneous write streams, massively parallel read streams)
- Data catalogs, e.g. iRODS
- Offsite access to data, e.g. the Fermi download manager or Globus Online
- Skimming tools, e.g. the Fermi skimmer, astro server
- Long-term data archive, "Cloud" storage