Overview
In this project we study network anomaly detection algorithms \[1\] \[2\] \[3\] for Internet paths. We also develop a _Decision Theoretic Approach_ (DTA) based on our observations regarding the characteristics of the performance measurement statistics obtained from the [IEPM-BW] project.
To study and compare the algorithms we use the data sets collected by IEPM-BW, spanning approximately 3 years (i.e. 2005 - 2008). The Internet paths observed were the links between Stanford Linear Accelerator Center (SLAC) and the following sites:
- University of Toronto (UTORONTO), Canada.
- Deutsches Elektronen-Synchrotron (DESY), Germany.
- Forschungszentrum Karlsruhe (FZK), Germany.
- European Organization for Nuclear Research (CERN), Geneva, Switzerland.
- San Diego Supercomputing Center (SDSC), USA.
- Switch, Switzerland.
- University of Florida, USA.
- National Laboratory for Particle and Nuclear Physics, Canada.
- Oak Ridge National Laboratory (ORNL), USA.
- Budker Institute of Nuclear Physics, Russia.
- Daresbury Laboratory, United Kingdom.
- California Institute of Technology - CACR, USA.
- Istituto Nazionale di Fisica Nucleare, Italy.
- Czech NREN Operator, Czech Republic.
- Brookhaven National Laboratory, USA.
- Argonne National Laboratory, USA.
- California Institute of Technology - Ultralight, USA.
The topology of the monitoring framework is shown in figure 1.
Fig. 1: Topology of IEPM as of 07/2008.
Fig. 2: Deployment of selected sites.
The number of measurements made from SLAC to each site, by measurement tool:
Site | pathchirp | iperf | thrulay |
---|---|---|---|
cern.ch | 48647 | 24586 | 39510 |
desy.de | 32247 | 4522 | 28689 |
fzk.de | 65536 | 4874 | 42708 |
nslabs.ufl.edu | 41206 | 1549 | 28613 |
switch.edu | 19668 | 4638 | 28744 |
sdsc.edu | 21176 | 4416 | 22456 |
triumf.ca | 26425 | 4669 | 27021 |
utoronto.ca | 40614 | 5003 | 21646 |
ornl.gov | 35339 | 5182 | 18375 |
anl.gov | 17968 | 1 | 27559 |
bnl.org | 23580 | 20708 | 16000 |
cacr.caltech.edu | 61871 | 25525 | 37293 |
dl.ac.uk | 27806 | 6096 | 28058 |
nsk.su | 20117 | 1 | 26845 |
cesnet.cz | 23618 | 3062 | 28426 |
infn.it | 30372 | 4343 | 28573 |
ultralight.caltech | 3739 | 88 | 1534 |
SubTotal | 539929 | 119263 | 452050 |
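The subtotals in the last row can be cross-checked with a few lines of Python. This is only a convenience sketch: the counts are copied from the table above, while the names `COUNTS`, `column_totals` and `sparse_sites`, and the cut-off of 100 measurements used to flag sparsely measured sites, are illustrative assumptions and not part of the study.

```python
# Per-site measurement counts copied from the table above,
# in the order (pathchirp, iperf, thrulay).
COUNTS = {
    "cern.ch": (48647, 24586, 39510),
    "desy.de": (32247, 4522, 28689),
    "fzk.de": (65536, 4874, 42708),
    "nslabs.ufl.edu": (41206, 1549, 28613),
    "switch.edu": (19668, 4638, 28744),
    "sdsc.edu": (21176, 4416, 22456),
    "triumf.ca": (26425, 4669, 27021),
    "utoronto.ca": (40614, 5003, 21646),
    "ornl.gov": (35339, 5182, 18375),
    "anl.gov": (17968, 1, 27559),
    "bnl.org": (23580, 20708, 16000),
    "cacr.caltech.edu": (61871, 25525, 37293),
    "dl.ac.uk": (27806, 6096, 28058),
    "nsk.su": (20117, 1, 26845),
    "cesnet.cz": (23618, 3062, 28426),
    "infn.it": (30372, 4343, 28573),
    "ultralight.caltech": (3739, 88, 1534),
}
TOOLS = ("pathchirp", "iperf", "thrulay")


def column_totals(counts):
    """Sum each tool's column across all sites."""
    return tuple(sum(col) for col in zip(*counts.values()))


def sparse_sites(counts, minimum=100):
    """Sites where at least one tool has fewer than `minimum` measurements.

    The default of 100 is an arbitrary illustrative cut-off.
    """
    return [site for site, row in counts.items() if min(row) < minimum]


totals = column_totals(COUNTS)
for tool, total in zip(TOOLS, totals):
    print(f"{tool}: {total}")
print("sparsely measured:", sparse_sites(COUNTS))
```

Sites with very few measurements for a given tool may deserve separate treatment when comparing detection results across tools.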
Data Sets
The data sets used in the study may be downloaded from the links listed below. These data sets were collected by the IEPM-BW project. The latest performance statistics may be accessed from here.
Table 1: Performance measurement statistics compiled by IEPM, as seen from SLAC.
Site | Raw data | Labeled data |
---|---|---|
SDSC | [csv], [xls] | [txt] |
ORNL | [csv], [xls] | [txt] |
CERN | [csv], [xls] | [txt] |
FZK | [csv], [xls] | [txt] |
DESY | [csv], [xls] | [txt] |
UTORONTO | [csv], [xls] | [txt] |
Labeling Algorithm
The labeling algorithm is as follows:
Implementations and Parameter Tuning
The source code of the implementations and the tuning of parameters are discussed below.
References
...
Data Sets with Events | Data Sets with no Events | |
---|---|---|
IEPM |
All files named "filename_raw_dataset.pathchirp" contain the raw data, i.e. the available bandwidth measurements and their timestamps, which are used by all algorithms.
All files named "filename_event_file.txt" contain the list of identified events.
Technical Report - Labeling and Comparative Analysis
The technical report titled "A performance evaluation of anomaly detection algorithms for Internet Paths" will be available soon.
Input/Tuning parameters
Plateau Algorithm (PL)
History Buffer Length (H) | Trigger Buffer Length (T) | Threshold (th) | Sensitivity (s) |
---|---|---|---|
240 | 6 - 45 | 0.10 - 0.70 | 1.0 - 2.8 |
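To make the roles of H, T, th and s concrete, here is a minimal sketch of a plateau-style detector. It is a simplified variant written for illustration, not the implementation evaluated in this study: the function name `plateau_events`, the warm-up behaviour, and the rule for ageing out the trigger buffer are all assumptions. A sample is treated as suspicious when it falls more than `s` standard deviations below the history mean, and an event is reported when `T` suspicious samples have accumulated and their mean is lower than the history mean by more than the fraction `th`.

```python
from collections import deque
from statistics import mean, pstdev


def plateau_events(samples, H=240, T=6, th=0.10, s=1.0):
    """Return the indices at which a sustained drop begins (simplified sketch)."""
    history = deque(maxlen=H)  # recent measurements regarded as normal
    trigger = []               # (index, value) pairs suspected of being a drop
    events = []
    for i, x in enumerate(samples):
        if len(history) < H:
            # Warm-up: fill the history buffer before judging anything.
            history.append(x)
            continue
        m, sd = mean(history), pstdev(history)
        if x < m - s * sd:
            trigger.append((i, x))
            if len(trigger) == T:
                mt = mean(v for _, v in trigger)
                if m > 0 and (m - mt) / m > th:
                    # Sustained relative drop: report it and restart the
                    # baseline from the new, lower level.
                    events.append(trigger[0][0])
                    history.clear()
                history.extend(v for _, v in trigger)
                trigger.clear()
        else:
            history.append(x)
            if trigger:
                trigger.pop(0)  # a normal sample ages out the oldest suspect
    return events


# Synthetic example: a steady series followed by a drop to half the bandwidth.
steady = [100.0, 101.0, 99.0, 100.0] * 5
series = steady + [50.0] * 10
print(plateau_events(series, H=20, T=3, th=0.2, s=2.0))
```

In this sketch a larger T or th makes the detector more conservative, while a smaller s lets more samples into the trigger buffer.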
Kalman Filters Method (KF)
Sensitivity (K) | Time Window (h) |
---|---|
0.001 - 11.0 | 6 - 20 |
Holt-Winters Method (HW)
α (alpha) | β (beta) | γ (gamma) | σ (sigma) |
---|---|---|---|
0.1 | 0.1 - 0.3 | 0.1 - 0.5 | 2.0 |
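For readers unfamiliar with the method, the sketch below shows additive Holt-Winters (triple exponential smoothing) used as a one-step-ahead forecaster, with alpha, beta and gamma smoothing the level, trend and seasonal components. Several things here are assumptions made for illustration: the seasonal `period` argument (not listed in the table), the use of sigma as a multiplier on an exponentially smoothed absolute forecast error, the initialisation from the first two seasons, and the function name `holt_winters_flags`. The implementation used in the study may differ on each of these points.

```python
def holt_winters_flags(y, period, alpha=0.1, beta=0.1, gamma=0.1, sigma=2.0):
    """Return indices whose forecast error exceeds sigma times the smoothed deviation."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Initialise level, trend and seasonal offsets from the first two seasons.
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    season = [v - level for v in first]

    warmup_errors, dev, flags = [], None, []
    for t in range(period, len(y)):
        i = t % period
        err = y[t] - (level + trend + season[i])  # one-step forecast error

        if t < 2 * period:
            # The second season only calibrates the deviation band.
            warmup_errors.append(abs(err))
        else:
            if dev is None:
                dev = sum(warmup_errors) / len(warmup_errors)
            if abs(err) > sigma * dev:
                flags.append(t)
            dev = gamma * abs(err) + (1 - gamma) * dev

        # Standard additive Holt-Winters updates.
        new_level = alpha * (y[t] - season[i]) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[i] = gamma * (y[t] - new_level) + (1 - gamma) * season[i]
        level = new_level
    return flags


# Synthetic example: a repeating pattern with one large spike at index 20.
demo = [10.0, 12.0, 10.0, 8.0] * 6
demo[20] += 100.0
print(holt_winters_flags(demo, period=4))
```

Note that in this sketch flagged samples still update the model state, so a large anomaly briefly distorts the forecasts that follow it.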
Adaptive Fault Detector (AFD)
Window Size (N) | α (alpha) | β (beta) | No. of Training Data (Hn) |
---|---|---|---|
20 | 0.95 | 0.0015 - 0.1 | 100 |
Decision Theoretic Approach (DTA)
History Buffer Length (N) | α (alpha) | β (beta) | Median filter length (n) |
---|---|---|---|
30 - 90 | 0.01 - 0.34 | 0.99 | 100 |
ROC Results
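Each setting of a detector's tuning parameters yields one point on an ROC curve: the fraction of labeled events that were detected (true positive rate) against the fraction of non-event samples that were wrongly flagged (false positive rate). The sketch below computes such points under a simplifying assumption, namely that detections and labels are compared sample by sample. How detections were matched to labeled events in this study is not specified here, and the names `roc_point`, `roc_curve` and the per-sample `scores` are hypothetical.

```python
def roc_point(detected, labeled):
    """Return (false_positive_rate, true_positive_rate) for per-sample booleans."""
    if len(detected) != len(labeled):
        raise ValueError("detected and labeled must have the same length")
    tp = sum(1 for d, l in zip(detected, labeled) if d and l)
    fp = sum(1 for d, l in zip(detected, labeled) if d and not l)
    positives = sum(1 for l in labeled if l)
    negatives = len(labeled) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return fpr, tpr


def roc_curve(scores, labeled, thresholds):
    """One ROC point per threshold; a sample is detected when score >= threshold."""
    return [
        roc_point([score >= th for score in scores], labeled)
        for th in thresholds
    ]


# Synthetic example: anomaly scores for four samples, two of them labeled events.
scores = [0.1, 0.9, 0.8, 0.2]
labels = [False, True, True, False]
for th, (fpr, tpr) in zip([0.0, 0.5, 1.0], roc_curve(scores, labels, [0.0, 0.5, 1.0])):
    print(f"threshold={th}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Sweeping a sensitivity parameter over the ranges listed in the tuning tables and plotting the resulting points gives one curve per algorithm and data set.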
Datasets with Gaussian Distributions
Figures: ROC curves for CERN, FZK, SDSC, TRIUMF, and UTORONTO.
Datasets with Weibull Distributions
Figures: ROC curves for DESY, NSLABS, and SWITCH.
...