Development of a RESTful API for the Datacatalog
Development of the datacatalog's RESTful API started with an implementation for EXO that was limited to retrieving information about datasets. There have been some changes since then, but the core features are the same. As we've moved away from an EXO-specific implementation, we've added several core features that enable general use of the datacatalog by many experiments, and we've adopted a plug-in architecture for experiment-specific additions.
Core Features
Resources
The core resources start with a base REST URI. Here are some examples and definitions of the URI structure.
http://srs.slac.stanford.edu/rest/datacat
http://srs.slac.stanford.edu/rest/datacat/path
http://srs.slac.stanford.edu/rest/datacat/path/LSST/SensorAnalysis
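As a rough illustration of the URI structure above, the following Python sketch joins the base URI, a resource name, and a datacatalog path. The helper name `resource_uri` is hypothetical and only exists for this example; it is not part of the datacatalog.

```python
from urllib.parse import quote

# Base REST URI taken from the examples above.
BASE_URI = "http://srs.slac.stanford.edu/rest/datacat"


def resource_uri(resource, catalog_path=""):
    """Join the base URI, a resource name, and an optional datacatalog path.

    Hypothetical helper for illustration only.
    """
    parts = [BASE_URI, resource.strip("/")]
    if catalog_path:
        # Percent-encode each path segment but keep the "/" separators.
        parts.append(quote(catalog_path.strip("/"), safe="/"))
    return "/".join(parts)


print(resource_uri("path", "/LSST/SensorAnalysis"))
# prints http://srs.slac.stanford.edu/rest/datacat/path/LSST/SensorAnalysis
```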
Core resources with HTTP methods and the operations allowed
REST Resource base URI | Resource path | GET requests | POST requests | PATCH(PUT) requests | DELETE requests |
---|---|---|---|---|---|
/path | Directly access any datacatalog element by full path | Will return a PATH item (folder, group, dataset) | Unsupported. | Unsupported. | Unsupported. |
/datasets | If the resource path is a group or a folder, this resource will | Directly address a dataset and get full information about | [Not Implemented Yet] | [Not Implemented Yet] | [Not Implemented Yet] |
/children | The children of a given container object (Folder or Group) | When supplied with a folder, return all groups and | Not Supported | Not Supported | Not Supported |
/groups | If the resource path is a folder, this resource will only support | Directly access a group and get full information about its | Create a new Group with the given location. | Update the group name, | (Not likely to be supported for a while) |
/folders | Address a Folder only | Get the folder and information about objects in it. | Create a new Folder with the given location. | Move/Rename/modify metadata for | (Not likely to be supported for a while) |
Plugin resources
Additional resources can be provided by a plugin. For more information on plugins and how they work, see below. Plugin resources are available under the application base URI, for example:
Experiment: http://srs.slac.stanford.edu/rest/datacat/exo
Experiment Resource "runs": http://srs.slac.stanford.edu/rest/datacat/exo/runs
Experiment: http://srs.slac.stanford.edu/rest/datacat/lsst
Experiment Resource "sensors": http://srs.slac.stanford.edu/rest/datacat/lsst/sensors
Security and Authentication
Authentication is handled in two ways. First, an end user can authenticate with CAS, either in the browser or through an HTTP library in the language they are using, and use CAS management. This means a user in a browser will see the application as they would any other application we make. It also means we can use the REST resources for web applications.
The second way authentication is handled is via HMAC. A user will be provided with a UUID and a shared secret key, which they will use to sign parts of their request. In their request, they will include an Authorization header containing their UUID and the signed hash.
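As a rough sketch of the HMAC scheme, the Python below builds a newline-terminated string to sign and derives an Authorization header value from it. The field order (method, resource path, Content-MD5, Content-Type, date) follows the string-to-sign layout described later in this section. The hash algorithm (SHA-1), the base64 encoding, and the `uuid:signature` header layout are assumptions made for illustration; this page does not specify them. The function names and the sample UUID and secret are hypothetical as well.

```python
import base64
import hashlib
import hmac


def string_to_sign(method, resource_path, date, content_md5="", content_type=""):
    """Concatenate the signed fields, each terminated by a newline."""
    fields = [method, resource_path, content_md5, content_type, date]
    return "".join(field + "\n" for field in fields)


def authorization_value(user_uuid, secret_key, message):
    """Sign the message with the shared secret.

    ASSUMPTION: HMAC-SHA1, base64 output, and a "uuid:signature" layout.
    """
    digest = hmac.new(
        secret_key.encode("utf-8"), message.encode("utf-8"), hashlib.sha1
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"{user_uuid}:{signature}"


message = string_to_sign(
    "GET", "/path/LSST/SensorAnalysis", "Thu, 29 Aug 2013 20:16:48 GMT"
)
# Hypothetical credentials, for illustration only.
headers = {
    "Date": "Thu, 29 Aug 2013 20:16:48 GMT",
    "Authorization": authorization_value("example-uuid", "example-secret", message),
}
```

The same `Date` value has to be sent as a header and used in the signed string, since the server checks the request against that date.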
The path used for signing should be the path that follows the REST application base URI. All requests must include a date, and the server must receive the request within a fixed window of that date (5 seconds? a minute?). The string to sign is composed as follows:
"GET" + "\n" +                            // (HTTP method)
"/path/LSST/SensorAnalysis" + "\n" +      // (REST resource + path up to query string)
"" + "\n" +                               // (Content-MD5)
"" + "\n" +                               // (Content-Type)
"Thu, 29 Aug 2013 20:16:48 GMT" + "\n"    // (Date)
Searching
Searching can be done in a variety of ways.
In general, searching will be performed starting at a base resource URI. Most often it will be used from the datasets resource, and a user will supply a filter string.
Expressions
A search query/filter is now backed by a domain-specific language for composing expressions that search the datacatalog.
Every kind of datacatalog item (datasets, groups, and folders) can be searched with this language.
The language itself has type recognition, and supports three basic types: Strings, Numbers, and Timestamps.
A single expression is composed of an identifier, an operator, and a value.
[IDENTIFIER] [OPERATOR] [VALUE]
datasetName eq 'myDataset'
datasetName == 'myDataset'
runMin gt 3248
createDate > d'2012-01-01T13:00:01'
To make things easier with URLs, most operators can be written in more than one way. For example, "&&" is the same as "and", "eq" is the same as "==", etc.
Also, inspired by Python, a date is written as a string prefixed with the letter 'd' so the parser knows it is a date. A date string should conform to ISO 8601, but the parser is a little more flexible than ISO 8601 because it accepts strings of varying length: d'2012-03' is extended to mean the first day of March 2012, and d'2013-02-03T15' means 15:00:00 (3 PM) on that day. Time zone information is also allowed. Both single quotes and double quotes may be used to delimit strings and date strings. Numbers are never quoted.
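To make the expansion of short date strings concrete, here is a small Python sketch of how a date literal could be parsed. It is not the datacatalog's parser: the function name `parse_date_literal` is hypothetical, accepting a year-only literal is an assumption, and time zone suffixes are left out to keep the example short.

```python
import re
from datetime import datetime

# d'...' or d"...", with every field after the year optional.
_DATE_LITERAL = re.compile(
    r"""^d(?P<q>['"])
        (?P<year>\d{4})
        (?:-(?P<month>\d{2})
          (?:-(?P<day>\d{2})
            (?:T(?P<hour>\d{2})
              (?::(?P<minute>\d{2})
                (?::(?P<second>\d{2}))?
              )?
            )?
          )?
        )?
        (?P=q)$""",
    re.VERBOSE,
)


def parse_date_literal(text):
    """Expand a possibly truncated date literal to a full datetime.

    Missing fields default to the earliest value (month 1, day 1, 00:00:00).
    """
    match = _DATE_LITERAL.match(text)
    if match is None:
        raise ValueError(f"not a date literal: {text!r}")
    parts = match.groupdict()
    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
    )


print(parse_date_literal("d'2012-03'"))
print(parse_date_literal('d"2013-02-03T15"'))
```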
Expressions can be combined with AND and OR and nested with parentheses, so fairly intricate queries are possible:
( (datasetName EQ 'myDataset' AND runMin IN (1234, 1235, 1239, 1329)) OR (datasetName EQ 'myOldDataset' AND runMin GT 1200) )
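Since a filter like the one above travels in a URL, it needs to be percent-encoded. The Python sketch below shows one way to do that with the standard library. The query parameter name `filter` and the example datasets URI are assumptions for illustration; this page does not define them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_uri(base_uri, expression, param="filter"):
    """Append a percent-encoded search expression to a resource URI.

    ASSUMPTION: the expression is passed in a query parameter named "filter".
    """
    return base_uri + "?" + urlencode({param: expression})


expression = (
    "( (datasetName EQ 'myDataset' AND runMin IN (1234, 1235, 1239, 1329)) "
    "OR (datasetName EQ 'myOldDataset' AND runMin GT 1200) )"
)
# Hypothetical datasets resource URI, for illustration only.
uri = search_uri(
    "http://srs.slac.stanford.edu/rest/datacat/datasets/LSST/SensorAnalysis",
    expression,
)

# Decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(uri).query)["filter"][0]
print(decoded == expression)
```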
Identifiers
It's recommended to limit your metadata key names in the database to the form:
[a-zA-Z][a-zA-Z0-9_\.\-]*
However, the language lexer will recognize identifiers of the form:
[a-zA-Z_][a-zA-Z0-9_\.\-\:]*
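The two forms can be checked with a short Python sketch. It assumes the second character class in each pattern is meant to repeat (a trailing `*`), so that identifiers of any length match. The function names are hypothetical.

```python
import re

# ASSUMPTION: the trailing character class repeats zero or more times.
RECOMMENDED_KEY = re.compile(r"[a-zA-Z][a-zA-Z0-9_.\-]*")
LEXER_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_.\-:]*")


def is_recommended_key(name):
    """True if the name follows the recommended metadata key form."""
    return RECOMMENDED_KEY.fullmatch(name) is not None


def is_lexer_identifier(name):
    """True if the name matches the looser form accepted by the lexer."""
    return LEXER_IDENTIFIER.fullmatch(name) is not None


for name in ("runMin", "exo.runQuality", "_internal", "ns:key", "9lives"):
    print(name, is_recommended_key(name), is_lexer_identifier(name))
```

Names that start with an underscore or contain a colon are accepted by the lexer form but fall outside the recommended form.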
Scope
Each identifier is resolved based on a hierarchy of three scopes. They are checked in order.
Table
The first scope is effectively the set of columns of the object type (dataset, group, or folder) you are searching on. Expressions in this scope will generally be of the form:
name == "datasetName" and runMin > 3000 and site == "SLAC_XROOT"
Plugin
The second scope is the plugin scope. A plugin will usually correspond to an external table that's not in the datacatalog's core set of tables. This will usually be an experiment-specific feature. As such, they will generally be in the form:
[experiment].[identifier]
exo.runQuality in ('GOLDEN', 'GOOD')
exo.runQuality not in ('BAD', 'UNSET')
lsst.sensorId == 3458
The plugins themselves know how to join their data to the data in the core tables of the datacatalog. When a plugin identifier is found, the parser will ask the plugin to join the necessary tables to the current selection.
Metadata
The final scope is metadata. In the datacatalog, there are three tables corresponding to metadata: Numbers, Strings, and Timestamps. The metadata key you search on must exist in one of those tables. The datacatalog will look up which metadata tables are applicable and check those tables for the search.
nRun eq 4000
nRun eq '4000'
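One reading of the two examples above is that the type of the value decides which metadata table is consulted: an unquoted 4000 is a number, while '4000' is a string. The Python sketch below illustrates that reading only. The function name and the mapping from value type to table are assumptions, not the datacatalog's actual lookup.

```python
from datetime import datetime


def metadata_table_for(value):
    """Pick a metadata table name from the type of the search value.

    ASSUMPTION: the value's type alone selects the table.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; it is not a supported type here.
        raise TypeError("booleans are not a supported metadata type")
    if isinstance(value, (int, float)):
        return "Numbers"
    if isinstance(value, str):
        return "Strings"
    if isinstance(value, datetime):
        return "Timestamps"
    raise TypeError(f"unsupported metadata value: {value!r}")


print(metadata_table_for(4000))
print(metadata_table_for("4000"))
```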
Priority
When an identifier is encountered, it is looked up first in the table scope, then in the plugin scope, and finally in the metadata scope. As a result, a metadata key named exo.runQuality would conflict with the exo plugin's runQuality selector, and the plugin would take precedence.
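The lookup order can be sketched in a few lines of Python. The function name and the data structures (sets of column names and metadata keys, and a dictionary of plugin fields keyed by experiment) are hypothetical stand-ins for whatever the datacatalog uses internally.

```python
def resolve_scope(identifier, table_columns, plugin_fields, metadata_keys):
    """Return the first scope that knows the identifier.

    Scopes are checked in order: table, plugin, metadata.
    """
    if identifier in table_columns:
        return "table"
    # Plugin identifiers have the form [experiment].[identifier].
    experiment, dot, field = identifier.partition(".")
    if dot and field in plugin_fields.get(experiment, set()):
        return "plugin"
    if identifier in metadata_keys:
        return "metadata"
    raise LookupError(f"unknown identifier: {identifier}")


# Hypothetical catalog contents, for illustration only.
columns = {"name", "runMin", "site"}
plugins = {"exo": {"runQuality"}, "lsst": {"sensorId"}}
metadata = {"nRun", "exo.runQuality"}

print(resolve_scope("runMin", columns, plugins, metadata))
print(resolve_scope("exo.runQuality", columns, plugins, metadata))
print(resolve_scope("nRun", columns, plugins, metadata))
```

In this sketch the metadata key exo.runQuality is never reached, because the plugin scope claims the identifier first.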