Overview

The data catalog is currently used directly by Fermi and EXO, and indirectly by several other groups (CDMS, CTA) as a side effect of their use of the pipeline.

LSST is interested in using the data catalog (for test data at least, and possibly for DESC and data handling if its dependence on Oracle were removed). The download manager is currently used by SSRL, which could potentially use more of the data catalog to make data that is not currently accessible from JCSG directly available from SLAC (although they do not currently seem very interested in this).

The data catalog is also potentially usable as part of a future photon science data portal. 

This page explores the possibilities for future development of the data catalog -- assuming for the moment that we have unlimited resources to achieve this.

Current features

See also outstanding issues.

Data Catalog 2.0

Improved modularity

The data catalog currently has five components:

  1. Web interface
  2. Database back-end ("middleware")
  3. Line mode client
  4. Download manager (web based and line-mode)
  5. File "crawler"

The download manager and crawler are not strongly coupled to the rest of the system, but the first three items are fairly tightly coupled; in particular, they all access the database directly. Ideally the web interface would access the database via an abstraction layer, so that it could be used with any back-end providing similar functionality. If we add RESTful interfaces to the web interface, it would probably make sense for the line-mode client to use those interfaces (this would remove the current restriction that the line-mode client can run only at SLAC).
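As a rough illustration of the abstraction layer idea, the sketch below shows a front-end that depends only on an interface, with an in-memory back-end standing in for a real database. All names here (CatalogBackend, InMemoryBackend, CatalogFrontEnd, and the example paths) are hypothetical and do not come from the existing code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical abstraction layer: the front-end sees only this interface,
// never the database. The method names are illustrative assumptions.
interface CatalogBackend {
    void register(String logicalPath, String physicalLocation);
    Optional<String> locate(String logicalPath);
    List<String> list(String folderPrefix);
}

// Stand-in back-end that keeps everything in memory. An Oracle-backed or
// other implementation would expose the same interface.
final class InMemoryBackend implements CatalogBackend {
    private final TreeMap<String, String> entries = new TreeMap<>();

    @Override
    public void register(String logicalPath, String physicalLocation) {
        entries.put(logicalPath, physicalLocation);
    }

    @Override
    public Optional<String> locate(String logicalPath) {
        return Optional.ofNullable(entries.get(logicalPath));
    }

    @Override
    public List<String> list(String folderPrefix) {
        List<String> result = new ArrayList<>();
        // Keys are sorted, so all matches are contiguous from the prefix onward.
        for (String path : entries.tailMap(folderPrefix, true).keySet()) {
            if (!path.startsWith(folderPrefix)) {
                break;
            }
            result.add(path);
        }
        return result;
    }
}

// Hypothetical front-end component: it can be paired with any back-end.
final class CatalogFrontEnd {
    private final CatalogBackend backend;

    CatalogFrontEnd(CatalogBackend backend) {
        this.backend = backend;
    }

    String describe(String logicalPath) {
        return backend.locate(logicalPath)
                .map(location -> logicalPath + " -> " + location)
                .orElse(logicalPath + " (not registered)");
    }
}
```

With this shape, swapping the database would mean providing another CatalogBackend implementation, leaving the front-end untouched.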

New Features

Database Independence

Currently the data catalog is tightly coupled to Oracle through the use of

Ideally this dependence could be removed through the use of a proper object-relational mapping layer, such as Hibernate. An open question is how hard it would be to migrate existing data if we used Hibernate.

Integration with other tools

Implementation Language

Currently the data catalog is implemented entirely in Java and integrated into the SRS web application framework. When deciding on a new version, it would be worth considering whether this remains the right approach. Additional HTML5 functionality would require adopting a web application framework or library, for which there are many possibilities (GWT, jQuery, ...). Splitting the web front-end from the middleware would at least make it possible to implement different components in different languages. Fully functional RESTful interfaces would make interfacing from any language much easier.
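To give a feel for what a client of such a RESTful interface might look like, here is a minimal sketch that builds (but does not send) a lookup request using the standard java.net.http API from Java 11. The host name, the /api/datasets path and the JSON response format are assumptions made purely for illustration; no such endpoint exists today.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;

// Hypothetical helper for a future RESTful data catalog interface.
final class CatalogRestRequests {
    // Placeholder host, not a real service.
    static final String HOST = "catalog.example.org";

    // Builds a GET request for one dataset, identified by its logical path
    // (which is expected to start with "/").
    static HttpRequest datasetLookup(String logicalPath) {
        try {
            // The multi-argument URI constructor percent-encodes characters
            // that are not legal in a path, such as spaces.
            URI uri = new URI("https", HOST, "/api/datasets" + logicalPath, null);
            return HttpRequest.newBuilder(uri)
                    .header("Accept", "application/json")
                    .GET()
                    .build();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid logical path: " + logicalPath, e);
        }
    }
}
```

A client in any other language would only need an HTTP library to issue the equivalent request, which is the main attraction of this approach.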

Links

Some links to conceivably relevant tools found on the internet: