Datasets have two types of meta-data:

  • Typed standard attributes
    • attached to every dataset entry in the catalog
    • include file format, data type, file time-stamp, file size, number of events, run numbers, ...
    • fields are data-typed so comparison is simple
  • Untyped user-supplied attributes
    • attached at the user's request
    • arbitrary name and value
    • untyped: all values are stored as strings in the database, which complicates comparison
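
The two kinds of meta-data can be pictured as one record per catalog entry. The sketch below is only an illustration: the class name `DatasetEntry` and all field names are assumptions made for this example, not the catalog's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry; every field name here is an assumption."""
    # Typed standard attributes: present on every entry, directly comparable.
    file_format: str
    data_type: str
    file_timestamp: int
    file_size: int
    n_events: int
    run_min: int
    run_max: int
    # Untyped user-supplied attributes: arbitrary names, values kept as strings.
    user_attrs: Dict[str, str] = field(default_factory=dict)


entry = DatasetEntry("root", "MC", 20080620, 1024, 5000, 60, 75)
entry.user_attrs["threshold"] = "20"  # stored as a string, not as a number
```

The typed fields compare naturally (`entry.run_min < 100`), while anything in `user_attrs` needs interpretation before it can be compared.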

Users will want to select datasets based on 'cuts' applied to the attribute values.

A simple query language, perhaps similar to ROOT's TCut, will have to be adopted.

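A potential problem is the comparison of strings for the user-supplied meta-data. For example, a lexicographic string comparison of "20" and "100" will report that "100" < "20", because the strings are compared character by character and "1" sorts before "2".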
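
The effect is easy to reproduce. The sketch below also shows one possible workaround, assumed here for illustration only: compare numerically when both values parse as numbers, and fall back to string comparison otherwise. The helper names `coerce` and `less_than` are hypothetical.

```python
def coerce(value: str):
    """Return the value as a float if it parses as a number, else the string itself."""
    try:
        return float(value)
    except ValueError:
        return value


def less_than(a: str, b: str) -> bool:
    """Compare numerically when both strings are numeric, else lexicographically."""
    ca, cb = coerce(a), coerce(b)
    if isinstance(ca, float) and isinstance(cb, float):
        return ca < cb
    return a < b


print("100" < "20")            # True: plain string comparison, "1" sorts before "2"
print(less_than("100", "20"))  # False: both values parse as numbers
```

Whether silent numeric coercion is the right policy is a design question; an alternative would be to let users declare a type when they attach an attribute.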

It should also be possible to restrict the query to specified logical folders (and possibly sub-folders) and/or dataset groups.

A query might look like this:

    findDatasets("/my/logical/folder/*", "DatasetType=MC && RunMin > 50 && RunMin < 100 && ((tstart > 20080615 && tstart < 20080630) || (tstart > 20080715 && tstart < 20080730))") 

This would search in all folders and groups starting with "/my/logical/folder/" for Monte Carlo files with a starting run between 50 and 100 that were produced between the 15th and 30th of June or July 2008.
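
To make the intended semantics of the example concrete, the sketch below writes the same cut as a Python predicate and applies it to a small in-memory catalog. It does not parse the query string; the catalog layout, the function names `example_cut` and `find_datasets`, and the sample entries are all assumptions made up for this illustration.

```python
from fnmatch import fnmatchcase


def example_cut(attrs: dict) -> bool:
    """The example cut, hand-translated into Python."""
    in_june = 20080615 < attrs["tstart"] < 20080630
    in_july = 20080715 < attrs["tstart"] < 20080730
    return (attrs["DatasetType"] == "MC"
            and 50 < attrs["RunMin"] < 100
            and (in_june or in_july))


def find_datasets(catalog, folder_pattern, cut):
    """Return names of entries whose folder matches the pattern and that pass the cut."""
    return [d["name"] for d in catalog
            if fnmatchcase(d["folder"], folder_pattern) and cut(d["attrs"])]


# Hypothetical catalog contents, for illustration only.
catalog = [
    {"name": "a", "folder": "/my/logical/folder/x",
     "attrs": {"DatasetType": "MC", "RunMin": 60, "tstart": 20080620}},
    {"name": "b", "folder": "/my/logical/folder/x/y",
     "attrs": {"DatasetType": "MC", "RunMin": 75, "tstart": 20080720}},
    {"name": "c", "folder": "/my/logical/folder/x",
     "attrs": {"DatasetType": "DATA", "RunMin": 60, "tstart": 20080620}},
    {"name": "d", "folder": "/other/folder",
     "attrs": {"DatasetType": "MC", "RunMin": 60, "tstart": 20080620}},
    {"name": "e", "folder": "/my/logical/folder/x",
     "attrs": {"DatasetType": "MC", "RunMin": 60, "tstart": 20080705}},
]

selected = find_datasets(catalog, "/my/logical/folder/*", example_cut)
print(selected)
```

Note that in this sketch the `*` wildcard also matches sub-folders, since `fnmatchcase` lets `*` span `/` characters; whether the real query should descend into sub-folders is one of the open points above.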
 
