Sept. 5: MacroBase: A Search Engine for Fast Data Streams

Date: Sept. 5, 2pm

Speaker: Sahaana Suri (Stanford)

While data volumes generated by sensors, automated processes, and application telemetry continue to rise, the capacity of human attention remains limited. To harness the potential of these large-scale data streams, machines must step in by processing, aggregating, and contextualizing significant behaviors within them. This talk will describe progress towards this goal via MacroBase, a new analytics engine for prioritizing attention in large-scale "fast data" that has begun to deliver results in several production environments. Key to this progress are new methods for constructing cascades of analytic operators for classification, aggregation, and high-dimensional feature selection; when combined, these cascades yield new opportunities for dramatic scalability improvements via end-to-end optimization for streams spanning time-series, video, and structured data. MacroBase is a core component of the Stanford DAWN project (http://dawn.cs.stanford.edu/), a new research initiative designed to enable more usable and efficient machine learning infrastructure.

 

Sept. 12: TBA

Date: Sept. 12, 2pm

Speaker: Yashar Hezaveh

 

Sept. 26: Optimal Segmentation with Pruned Dynamic Programming

Date: Sept. 26, 2pm

Speaker: Jeffrey Scargle (NASA)

Bayesian Blocks (arXiv:1207.5578) is an O(N**2) dynamic programming algorithm that computes exact, globally optimal segmentations of sequential data of arbitrary mode and dimensionality. Multivariate data, generalized block shapes, and higher-dimensional data are easily treated. Incorporating a simple pruning method yields a (still exact) O(N) algorithm, allowing fast analysis of series of ~100M data points. Sample applications include analysis of X- and gamma-ray time series, identification of GC-islands in the human genome, data-adaptive triggers and histograms, and elucidating the Cosmic Web from 3D galaxy redshift data.
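
To illustrate the O(N**2) dynamic programming idea in the abstract, here is a minimal Python sketch of optimal segmentation. It is not the speaker's implementation. As a simplifying assumption it swaps in a least-squares block cost plus a fixed per-block penalty in place of the likelihood-based block fitness used by Bayesian Blocks, and it omits the pruning step entirely. The function name `optimal_segmentation` and the `penalty` parameter are hypothetical, chosen only for this example.

```python
def optimal_segmentation(values, penalty):
    """Return the start indices of the blocks that minimize
    (sum of squared errors around each block mean) + penalty * (number of blocks).

    Assumption: least-squares cost stands in for the Bayesian Blocks fitness.
    """
    n = len(values)

    # Prefix sums of v and v**2 give the cost of any block in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def block_cost(start, end):
        # Squared error of values[start:end] around its own mean.
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)  # best[j]: minimal cost of segmenting values[:j]
    last = [0] * (n + 1)    # last[j]: start index of the final block in that optimum

    # For each prefix, try every possible start of the final block: O(N**2) overall.
    for end in range(1, n + 1):
        best_cost, best_start = None, 0
        for start in range(end):
            cost = best[start] + block_cost(start, end) + penalty
            if best_cost is None or cost < best_cost:
                best_cost, best_start = cost, start
        best[end] = best_cost
        last[end] = best_start

    # Walk back through the stored change points to recover the blocks.
    starts = []
    j = n
    while j > 0:
        starts.append(last[j])
        j = last[j]
    return starts[::-1]


print(optimal_segmentation([0, 0, 0, 0, 5, 5, 5, 5], penalty=1.0))  # [0, 4]
```

Because each prefix reuses the optimal cost of every shorter prefix, the result is the exact global optimum for the chosen cost, not a greedy approximation. A larger `penalty` yields fewer blocks.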

