Data Catalog -- Dataset Attributes

Datasets have two types of metadata:

Typed standard attributes
- attached to every dataset entry in the catalog
- include file format, data type, file timestamp, file size, number of events, run numbers, ...
- fields are data-typed, so comparison is simple

Untyped user-supplied attributes
- attached at the user's request
- arbitrary name and value
- untyped: all values are stored as strings in the database, so comparison is complicated
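
The split between the two kinds of attributes can be pictured as a single record. What follows is a minimal sketch, assuming a Python dataclass. The class name `DatasetEntry`, its field names and the user attribute names `beamEnergy` and `generator` are hypothetical and do not come from the catalog itself. Standard attributes carry real types, while user attributes are a free-form name-to-string map.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class DatasetEntry:
    """Hypothetical catalog record; all field names are illustrative."""
    path: str
    # Typed standard attributes, present on every entry.
    file_format: str
    data_type: str
    timestamp: datetime
    file_size: int      # bytes
    num_events: int
    run_min: int
    run_max: int
    # Untyped user-supplied attributes: arbitrary name -> string value.
    user_attrs: Dict[str, str] = field(default_factory=dict)


entry = DatasetEntry(
    path="/my/logical/folder/mc/run0060.root",
    file_format="root",
    data_type="MC",
    timestamp=datetime(2008, 6, 20),
    file_size=1_048_576,
    num_events=5000,
    run_min=60,
    run_max=65,
    user_attrs={"beamEnergy": "20", "generator": "pythia"},
)

print(entry.run_min > 50)                             # True: typed int comparison
print(type(entry.user_attrs["beamEnergy"]).__name__)  # str: stored as text
```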

Users will want to select datasets based on 'cuts' applied to the attribute values.

A simple query language (perhaps something like a ROOT TCut) will have to be adopted.

A potential problem is the comparison of strings for the user-supplied metadata. For example, a lexicographic string comparison of "20" and "100" will assert that "100" < "20", because the character '1' sorts before '2'.
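
One way around this is to compare user attribute values numerically whenever both sides look like numbers, and to fall back to string order otherwise. The following is a minimal sketch of that policy. The function name `compare_attr` is hypothetical, and the catalog may settle on a different rule, such as letting users declare a type.

```python
def compare_attr(a: str, b: str) -> int:
    """Return -1, 0 or 1.

    Hypothetical policy: if both strings parse as numbers, compare them
    numerically; otherwise fall back to ordinary string comparison.
    """
    try:
        x, y = float(a), float(b)
    except ValueError:
        x, y = a, b
    return (x > y) - (x < y)


print("100" < "20")                 # True: lexicographic, '1' sorts before '2'
print(compare_attr("100", "20"))    # 1: numeric, 100 > 20
print(compare_attr("abc", "abd"))   # -1: non-numeric, plain string order
```

The drawback of guessing types per comparison is that a value such as "1e5" is silently treated as a number. Users who need exact string semantics would then need some explicit way to request them.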

It should also be possible to restrict the query to specified logical folders (and possibly sub-folders) and/or dataset groups.
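
Folder restriction could be done by matching each dataset's logical path against a pattern. The following is a minimal sketch, assuming that a trailing "/*" selects everything below a folder and that a `recursive` flag decides whether sub-folders are included. The function name, the flag and the pattern rule are all hypothetical, and dataset groups are not covered here.

```python
def path_matches(pattern: str, path: str, recursive: bool = True) -> bool:
    """Hypothetical rule: a pattern ending in '/*' selects every dataset path
    under that prefix; recursive=False keeps only direct children.
    Any other pattern must equal the path exactly."""
    if pattern.endswith("/*"):
        prefix = pattern[:-1]            # drop '*', keep the trailing '/'
        if not path.startswith(prefix):
            return False
        rest = path[len(prefix):]        # part of the path below the prefix
        return rest != "" and (recursive or "/" not in rest)
    return path == pattern


print(path_matches("/my/logical/folder/*", "/my/logical/folder/ds1"))      # True
print(path_matches("/my/logical/folder/*", "/my/logical/folder/sub/ds2"))  # True
print(path_matches("/my/logical/folder/*", "/my/logical/folder/sub/ds2",
                   recursive=False))                                       # False
print(path_matches("/my/logical/folder/*", "/my/logical/folderX/ds3"))     # False
```

Keeping the trailing "/" in the prefix matters, because it stops a sibling folder such as "folderX" from matching "folder".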

A query might look like this:

findDatasets("/my/logical/folder/*", "DatasetType=MC && RunMin > 50 && RunMin < 100 && ((tstart > 20080615 && tstart < 20080630) || (tstart > 20080715 && tstart < 20080730))")

This would search all folders and groups whose paths start with "/my/logical/folder/" for Monte Carlo datasets with a starting run between 50 and 100 that were produced between the 15th and 30th of either June or July 2008 (all bounds in the example are exclusive).
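
A query of this shape can be handled by a small recursive-descent parser. This is a minimal sketch only, loosely in the spirit of a TCut-style expression rather than an implementation of ROOT's actual syntax. It assumes the grammar `or_expr := and_expr ('||' and_expr)*`, `and_expr := atom ('&&' atom)*` and `atom := '(' or_expr ')' | NAME OP VALUE`. "=" and "==" both mean equality, values are bare words (no quoting, no spaces), and a dataset that lacks an attribute never passes a cut. The names `Parser`, `coerce` and `find_datasets`, and the path-to-attributes catalog layout, are hypothetical.

```python
import operator
import re

# Token kinds: logical ops, comparison ops, parentheses, bare words.
TOKEN_RE = re.compile(r"\s*(\|\||&&|==|!=|<=|>=|[=<>()]|[\w.]+)")
WORD_RE = re.compile(r"[\w.]+")
OPS = {"=": operator.eq, "==": operator.eq, "!=": operator.ne,
       "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def coerce(a, b):
    """Numbers if both sides parse as numbers, otherwise strings."""
    try:
        return float(a), float(b)
    except (TypeError, ValueError):
        return str(a), str(b)


class Parser:
    """Compiles a query string into a predicate over an attribute dict."""

    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def parse(self):
        pred = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return pred

    def or_expr(self):   # '||' binds more loosely than '&&'
        terms = [self.and_expr()]
        while self.peek() == "||":
            self.take()
            terms.append(self.and_expr())
        return lambda attrs: any(t(attrs) for t in terms)

    def and_expr(self):
        terms = [self.atom()]
        while self.peek() == "&&":
            self.take()
            terms.append(self.atom())
        return lambda attrs: all(t(attrs) for t in terms)

    def atom(self):
        if self.peek() == "(":
            self.take("(")
            pred = self.or_expr()
            self.take(")")
            return pred
        name, op, value = self.take(), self.take(), self.take()
        if op not in OPS or not (WORD_RE.fullmatch(name) and WORD_RE.fullmatch(value)):
            raise SyntaxError(f"malformed comparison: {name} {op} {value}")
        cmp = OPS[op]
        # A dataset lacking the attribute never passes the cut (assumed policy).
        return lambda attrs: name in attrs and cmp(*coerce(attrs[name], value))


def find_datasets(pattern, query, catalog):
    """catalog maps a logical path to its attribute dict (hypothetical layout)."""
    pred = Parser(query).parse()
    prefix = pattern.rstrip("*")   # "/my/logical/folder/*" -> "/my/logical/folder/"
    return [p for p, attrs in catalog.items() if p.startswith(prefix) and pred(attrs)]


catalog = {
    "/my/logical/folder/mc/a":   {"DatasetType": "MC", "RunMin": 60, "tstart": 20080620},
    "/my/logical/folder/mc/b":   {"DatasetType": "MC", "RunMin": 60, "tstart": 20080705},
    "/my/logical/folder/mc/e":   {"DatasetType": "MC", "RunMin": 95, "tstart": 20080716},
    "/my/logical/folder/data/c": {"DatasetType": "DATA", "RunMin": 70, "tstart": 20080620},
    "/other/folder/d":           {"DatasetType": "MC", "RunMin": 80, "tstart": 20080620},
}
query = ("DatasetType=MC && RunMin > 50 && RunMin < 100 && "
         "((tstart > 20080615 && tstart < 20080630) || "
         "(tstart > 20080715 && tstart < 20080730))")
print(find_datasets("/my/logical/folder/*", query, catalog))
# ['/my/logical/folder/mc/a', '/my/logical/folder/mc/e']
```

Compiling the query into a predicate once and then applying it to every candidate keeps parsing out of the per-dataset loop. A real catalog would more likely translate the parsed tree into a database query, but the parsing step would look much the same.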
