
...

 

|| Site || Raw data || Labeled data ||
| SDSC | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/SDSC-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/SDSC-pathchirp.xls] | [txt|http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp-labeled-events.txt] |
| CERN | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/CERN-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/CERN-pathchirp.xls] | [txt|http://www.slac.stanford.edu/~kalim/event-detection/published-data/CERN-pathchirp-labeled-events.txt] |
| FZK | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/FZK-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/FZK-pathchirp.xls] | [txt|http://www.slac.stanford.edu/~kalim/event-detection/published-data/FZK-pathchirp-labeled-events.txt] |
| DESY | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/DESY-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/DESY-pathchirp.xls] | [txt|http://www.slac.stanford.edu/~kalim/event-detection/published-data/DESY-pathchirp-labeled-events.txt] |
| UTORONTO | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp.xls] | [txt|http://www.slac.stanford.edu/~kalim/event-detection/published-data/UTORONTO-pathchirp-labeled-events.txt] |
| TRIUMF | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/TRIUMF-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/TRIUMF-pathchirp.xls] | [txt|http://www.slac.stanford.edu/~kalim/event-detection/published-data/TRIUMF-pathchirp-labeled-events.txt] |
| ORNL | [csv|http://www.slac.stanford.edu/~kalim/event-detection/published-data/ORNL-pathchirp.csv], [xls|http://www.slac.stanford.edu/~kalim/event-detection/published-data/ORNL-pathchirp.xls] | [txt] |

...

Before discussing the labeling algorithm, we define an anomalous event: "a set of anomalous observations is called an event if the deviant observations persist for a period greater than or equal to a defined epoch". For the IEPM data, we define the epoch to span at least 3 hours.
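The definition above can be sketched as a run-length check over timestamped observations. This is only an illustration of the definition, not the labeling algorithm itself. The function name `find_events`, the `(time_in_hours, is_anomalous)` input layout, the choice to measure persistence as last timestamp minus first timestamp of a run, and the sample series are all assumptions made for the example.

```python
from typing import List, Tuple

EPOCH_HOURS = 3.0  # epoch from the text: at least 3 hours


def find_events(observations: List[Tuple[float, bool]],
                epoch_hours: float = EPOCH_HOURS) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs of consecutive anomalous
    observations that persist for at least `epoch_hours`.

    Persistence is measured as last timestamp minus first timestamp
    of the run (an assumption of this sketch).
    """
    events = []
    run_start = None
    run_end = None
    for t, anomalous in observations:
        if anomalous:
            if run_start is None:
                run_start = t      # a new run of deviant observations begins
            run_end = t
        else:
            # The run (if any) has ended; keep it only if it lasted long enough.
            if run_start is not None and run_end - run_start >= epoch_hours:
                events.append((run_start, run_end))
            run_start = None
    # Handle a run that extends to the end of the series.
    if run_start is not None and run_end - run_start >= epoch_hours:
        events.append((run_start, run_end))
    return events


# Hypothetical series with one observation every 30 minutes.
long_run = [(0.5 * i, 4 <= i <= 11) for i in range(20)]   # deviant from t=2.0 to t=5.5
short_run = [(0.5 * i, 4 <= i <= 6) for i in range(20)]   # deviant from t=2.0 to t=3.0

print(find_events(long_run))
print(find_events(short_run))
```

In this sketch the 3.5-hour run qualifies as an event, while the 1-hour run is treated as a set of isolated anomalous observations.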

...

To remove the anomalous bandwidth measurements from the dataset, we apply an n-tap median filter. A median filter is a sliding-window low-pass filter that stores the n previous values of the input and, at each step, outputs the median of the stored values. Consequently, high-frequency spikes are removed from the input data. Note that the value of n is a crude upper bound on the maximum duration of an anomaly: if a bandwidth change sustains itself beyond n observations, it is treated as a change in the underlying baseline behavior. Therefore, care should be exercised in choosing the value of n for a given bandwidth measurement dataset. We define an empirical upper bound on n as

n <= 2 * d * u          (1)

where d is the epoch (in hours) and u is the average number of IEPM performance measurements made in one hour. In the present dataset, we observed that a maximum value of n=15 is sufficient to remove sustained and spurious bandwidth fluctuations.
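As a minimal sketch of the filter described above, the following keeps the n most recent values in a fixed-length buffer and emits their median at each step. The function name `median_filter`, the warm-up behavior (taking the median of however many values have arrived before the buffer is full), and the sample data are assumptions of this example, not details taken from the IEPM tooling.

```python
from collections import deque
from statistics import median
from typing import Iterable, List


def median_filter(values: Iterable[float], n: int) -> List[float]:
    """Causal n-tap median filter: output[i] is the median of the
    last n inputs up to and including values[i]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque(maxlen=n)   # oldest value is dropped automatically
    filtered = []
    for v in values:
        window.append(v)
        filtered.append(median(window))
    return filtered


# Hypothetical bandwidth series: a short two-sample dip is removed,
# while a sustained drop is followed as a new baseline.
spike = [100, 100, 100, 100, 100, 10, 10, 100, 100, 100]
shift = [100] * 5 + [50] * 10

print(median_filter(spike, 5))
print(median_filter(shift, 5))
```

With n=5, the two-sample dip never forms a majority of the window, so the output stays at the baseline. The sustained drop eventually fills the window and the output settles at the new level, which is the trade-off the choice of n controls.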

...

0.5 <= (mean of window) / (mean of filtered data) <= 1.5          (2)
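Condition (2) can be checked with a few lines of code. The sketch below assumes that (2) describes the range in which a window counts as normal, so a window whose ratio falls outside it is a candidate anomaly. The function name `window_is_normal` and the sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence

LOW, HIGH = 0.5, 1.5  # thresholds from (2)


def window_is_normal(window: Sequence[float],
                     filtered: Sequence[float]) -> bool:
    """Return True if mean(window) / mean(filtered) lies within [LOW, HIGH]."""
    ratio = mean(window) / mean(filtered)
    return LOW <= ratio <= HIGH


# Hypothetical median-filtered baseline with a mean of 100.
filtered = [100.0, 98.0, 102.0, 100.0]

print(window_is_normal([95.0, 105.0, 99.0], filtered))   # ratio close to 1
print(window_is_normal([40.0, 45.0, 42.0], filtered))    # ratio below 0.5
print(window_is_normal([160.0, 170.0], filtered))        # ratio above 1.5
```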

We opt for these thresholds in light of the definition of an event and the following inferences. To label an observation as an anomaly, it should be significantly different from the baseline observations. To define 'significant difference', we manually inspected datasets with and without anomalies. Our empirical observation suggests that nearly 6% of the observations differ by 50% or more from the mean observation. These 6% of observations comprise the anomalous measurements identified manually. When plotting the time series of these measurements, we also observe negligible overlap with normal observations. This result is summarized in Fig. 2 below. It is this conclusion that allows us to define (2). We also observed that such deviant observations tend to maintain their state and feature small variation (irrespective of the duration of the event), which supports the view that significant change is primarily observed in the mean of the observations and not in the variance.

Fig. 2. Percentage of measurements significantly different from normal observations (panels: SLAC-UTORONTO, SLAC-DESY).

The same may be verified from Fig. 3, a histogram of the normal observations in comparison to the anomalous observations. The anomalous observations are significantly different from the baseline measurements.

Fig. 3. Histogram of normal and anomalous observations.

The labeling algorithm and the decision-theoretic approach for real-time anomaly detection are discussed at length in the research paper [F. Hussain, U. Kalim, N. Latif, S. A. Khayam, "A Decision-Theoretic Approach to Detect Anomalies in Internet Paths", submitted to INFOCOM 2009].

...