Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

 

Raw data

Labeled data

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="1282720593a5dbce-fbddef07-45eb4982-867eacd3-576cc5e1e7356646c7cdb172"><ac:plain-text-body><![CDATA[

SDSC

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/SDSC-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/SDSC-pathchirp.xls]]

[[txt

http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp-labeled-events.txt]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="472cef4d93994e68-da6f679f-4d514e60-b110ab71-a95117368d8c088297e515f8"><ac:plain-text-body><![CDATA[

CERN

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/CERN-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/CERN-pathchirp.xls]]

[[txt

http://www.slac.stanford.edu/~kalim/event-detection/published-data/CERN-pathchirp-labeled-events.txt]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="d95f9bc1be811ecb-d96f7128-46c54c88-a8d6bae5-72f9e1e3175ea377cc54406f"><ac:plain-text-body><![CDATA[

FZK

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/FZK-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/FZK-pathchirp.xls]]

[[txt

http://www.slac.stanford.edu/~kalim/event-detection/published-data/FZK-pathchirp-labeled-events.txt]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="c60dc4bfa522472f-77bf31e3-42664743-8d2cb0d7-2912a7397aac28c6e10de27e"><ac:plain-text-body><![CDATA[

DESY

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/DESY-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/DESY-pathchirp.xls]]

[[txt

http://www.slac.stanford.edu/~kalim/event-detection/published-data/DESY-pathchirp-labeled-events.txt]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="097faeb94d226894-f098d4f8-45d24f6c-84aa85ec-271ff40214bcd84fa2601bef"><ac:plain-text-body><![CDATA[

UTORONTO

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp.xls]]

[[txt

http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp-labeled-events.txt]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="ab056001209994c2-59192365-415840c7-a373bb4a-9a8a26a4500bc7fe9b09e292"><ac:plain-text-body><![CDATA[

TRIUMF

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/TRIUMF-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/TRIUMF-pathchirp.xls]]

[[txt

http://www.slac.stanford.edu/~kalim/event-detection/published-data/TRIUMF-pathchirp-labeled-events.txt]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="512573050b0cc112-d9c87d8b-44714823-b6c4b800-c26a72fea6e09d33e61d9a84"><ac:plain-text-body><![CDATA[

ORNL

[[csv

http://www.slac.stanford.edu/~kalim/event-detection/published-data/ORNL-pathchirp.csv]], [[xls

http://www.slac.stanford.edu/~kalim/event-detection/published-data/ORNL-pathchirp.xls]]

[txt]

]]></ac:plain-text-body></ac:structured-macro>

...

Wiki Markup
To perform a fair comparison between \[1\], \[2\], \[3\] and the proposed DTA we devised a labeling algorithm to identify true anomalies in the data sets. This labeled data was then used to determine the accuracy (true-positive rate), corresponding false-positive rate and the detection delay. 

Events

Before we proceed to discuss the labeling algorithm we define anomalous events as "a set of anomalous observations is called an event if the deviant observations persist for a period greater than  or equal to a defined epoch". With reference to the IEPM data, we define the epoch to span at least 3 hours.

Given a set of bandwidth measurements, extraction of the baseline behavior essentially entails removing all anomalous observations (and the corresponding bandwidth values) from the set. The remaining measurements can then be used to characterize the baseline behavior. Anomalous bandwidth values always cause significant fluctuations in the measurements, albeit these fluctuations may be sustained or spurious in nature.

To remove the anomalous bandwidth measurements from the dataset, we apply an n-tap median filter to the dataset. A median filter is a sliding window low-pass filter that stores n previous values of the input and at each step outputs the median of the stored values. Consequently, high frequency spikes are removed from the input data. Note that the value of n is a crude upper bound on the maximum duration for an anomaly. If a bandwidth change sustains itself beyond n observations then it is treated as a change in the underlying baseline behavior. Therefore, care should be exercised in choosing the value of n for a given bandwidth measurement dataset. We define an empirical lower bound on n as

No Format

n <= 2 * d * u

where d is the epoch and u is the average number of IEPM performance measurements made in one hour. In the present dataset, we observed that a maximum value of n=15 is sufficient to remove sustained and spurious bandwidth fluctuations.

The median filtered dataset are treated as normal bandwidth values of an Internet path. The baseline behavior is then characterized by computing the mean of the filtered measurements. Then as per our definition an anomaly is flagged when an anomaly deviates significantly from this baseline behavior and sustains itself for more than 3 hours.

To flag significant deviations, we first compute the mean of the baseline. We then analyze the measurements in sub-sets (windows) of length 3. The mean of the window is computed and a test is performed such that:

No Format

0.5 <= (mean of  window)/(mean of filtered data) <= 1.5

We opt for such thresholds in light of definition of an event. In order to define an observation as anomaly, we conclude that it should be significantly different from the baseline observations. In order to define 'significant difference' we manually observed data sets with and without anomalies. Our empirical observation suggests that nearly 6% of the observations show a difference of greater than or equal to 50% from the mean observation. This result is summarized in Fig. 2 below. Also, such deviant observations tend to maintain their state and feature small variation (irrespective of the duration of the event) thereby endorsing the fact that significant change is primarily observed in the mean observations and not in the variance.

Fig. 2. Percentage of Measurements significantly different from normal observations.

SLAC-UTORONTO

SLAC-DESY

Image Added

Image Added


Wiki Markup
The labeling algorithm and the decision theoretic approach for real-time anomaly detection are discussed at length in the research paper. (_The paper will be posted soon_.) \[F. Hussain, U. Kalim, N. Latif, S. A. Khayam, "A Decision-Theoretic Approach to Detect Anomalies in Internet Paths", submitted to _INFOCOM 2009_\].
\\

Implementations and Parameter Tuning

...