Added by Kian-Tat Lim, last edited by Kian-Tat Lim on Nov 01, 2007

Labels:

Enter labels to add to this page:
Wait Image 
Looking for a label? Just start typing.

The LSST project, which will need to query upwards of 10 petabytes of astronomical data, is considering an architecture that uses the popular Map/Reduce distributed processing paradigm to control a shared-nothing cluster of individual relational database nodes. These nodes may in turn use an underlying distributed filesystem to provide data migration and replication capabilities.

Goals:

  • Attain scalability and fault tolerance by using shared-nothing and Map/Reduce.
  • Reduce required coding by utilizing the database as a scan and combine execution engine.
  • Maximize familiarity for end users by providing SQL access. A top-level SQL parser would still be necessary to convert the input query into the queries to be executed by each database node.

We are considering using Hadoop as the Map/Reduce implementation with MySQL as the underlying database engine.