Motivation
Extremely large scientific databases are running into difficulties that may also have existed at smaller scales but were tolerable there because queries and processing were fast. At larger scales it also becomes harder to build compensating systems outside the database. Accordingly, new database engine features are required.
Requirements
The following requirements for a new open source science-oriented database engine were generated at a workshop held at Asilomar, CA, from 2008-Mar-30 to 2008-Apr-01.
Scalability and Performance
- Scalability up to hundreds of petabytes.
- Parallelized single queries on commodity hardware.
- Fault tolerance with intra-query failover.
- The performance, extensibility, and compression of a column store, but also with efficient "SELECT *" (or "SELECT most columns").
Note: Both fault tolerance and a "SELECT *" view may require two copies of the data. Requiring more than two copies is less desirable.
Interfaces
- SQL (with appropriate extensions, if needed).
- An object-relational mapping layer for external applications. Full OO is likely not needed; mapping to hierarchical structures should be adequate.
- Procedural user-defined functions (or stored procedures) operating on row cursors or post-ORM object cursors that have ordering and grouping properties:
  - The user may specify that the function/procedure is distributive and can be executed in parallel.
  - Exceptions raised by the function/procedure must be handled reasonably, and the function/procedure must be debuggable.
  - The functions/procedures must be able to share scans, particularly when interacting with tiered storage (see below).
- Partial results, query progress indicator, query pause/restart/abort.
- Pre-execution query cost estimate, preferably in wall-clock time.
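
As a rough illustration of what "distributive" means for a user-defined function, the Python sketch below computes a mean by running a per-partition function independently on each partition and then merging the partial results. It is not part of the workshop requirements; the names `partial_sum_count`, `combine`, and `parallel_mean` are hypothetical, a plain iterable stands in for a row cursor, and a thread pool stands in for whatever parallel executor an engine would actually provide.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_sum_count(rows):
    """Per-partition step: consume a row cursor (any iterable of numbers) once."""
    total, count = 0.0, 0
    for value in rows:
        total += value
        count += 1
    return total, count


def combine(a, b):
    """Merge step: associative and commutative, so partition order does not matter."""
    return a[0] + b[0], a[1] + b[1]


def parallel_mean(partitions):
    """Run the per-partition step in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else None


# Illustrative data only: three partitions of one numeric column.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(parallel_mean(partitions))
```

Because the merge step does not depend on the order in which partial results arrive, an engine that is told the function is distributive could schedule the per-partition step wherever the data lives.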
Features
- No transactions needed for the largest tables.
- Do need batch appends that are atomic across multiple tables.
- Concurrent single writer, multiple readers; dirty reads are acceptable.
- Metadata tables may need transactions.
- Versioning of tables and code, including the ability to tag or label sets that go together.
- Support for spatial and temporal operations.
- Cheap (near-zero cost) one-to-one or one-to-many joins when both tables are partitioned on the join key.
  - This feature is expected to take care of annotations, user "namespaces" for columns, data sharing, and the ability to cheaply store classification probabilities with attributes tied to each classification.
- Support for arrays as a first-class column type.
- Support for provenance of data elements:
  - Log database operations that create/delete/update data elements.
  - Enable queries over provenance:
    - What operations led to the creation of this element?
    - What operations used this element?
    - What data elements were used as input to this operation?
    - What data elements were created as output from this operation?
  - Import provenance from external systems when loading data.
- Lightweight support for uncertainty of data elements:
  - Ability to associate an error (standard deviation) column with each data column.
  - An "approximately equal" operator for WHERE and JOIN clauses. Such an operator may need to be ternary to handle the "3" in "+/- 3 sigma".
- A resource management system including CPU and disk quotas.
- Support for tiered storage: migration from fast disk to slow disk to tape, and maybe flash memory in the future.
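
To make the "approximately equal" operator in the list above more concrete, here is a minimal Python sketch of a ternary comparison that takes two values, their standard deviations, and a sigma multiplier. The requirements do not say how the two errors should be combined; adding them in quadrature (which assumes independent errors) is an assumption made here purely for illustration, and the names `approx_equal` and `approx_join` are hypothetical.

```python
import math


def approx_equal(a, sigma_a, b, sigma_b, n_sigma=3.0):
    """True if a and b agree to within n_sigma combined standard deviations.

    Assumption (not from the requirements): the errors are independent,
    so they are combined in quadrature.
    """
    combined = math.hypot(sigma_a, sigma_b)
    return abs(a - b) <= n_sigma * combined


def approx_join(left, right, n_sigma=3.0):
    """Naive nested-loop join on rows of the form (id, value, sigma)."""
    return [
        (l_row[0], r_row[0])
        for l_row in left
        for r_row in right
        if approx_equal(l_row[1], l_row[2], r_row[1], r_row[2], n_sigma)
    ]


# Illustrative rows only: (id, value, sigma).
left = [("a1", 10.0, 0.1), ("a2", 20.0, 0.1)]
right = [("b1", 10.2, 0.1), ("b2", 25.0, 0.1)]
print(approx_join(left, right))
```

The third argument, `n_sigma`, is what makes the operator ternary: the same pair of values can match at 3 sigma and fail to match at 1 sigma.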
Less Important
- Column grouping within wide tables: assert for efficiency that certain columns will always be used together.
- Access controls and perhaps namespaces for sets of columns in wide tables.
- Probabilistic uncertainty for nominal values (thresholding is thought to be usually adequate).
- XQuery-like engine for querying hierarchical data.