Overview
In this project we study network anomaly detection algorithms for Internet paths. We also develop a Decision Theoretic Approach (DTA) based on our observations of the characteristics of the performance-measurement statistics obtained from the IEPM-BW project.
To study and compare the algorithms, we use the data sets collected by IEPM-BW over approximately 3 years (2005 to 2008). The Internet paths observed run between the Stanford Linear Accelerator Center (SLAC) and the following sites:
- University of Toronto, Canada.
- Deutsches Elektronen-Synchrotron, Germany.
- Forschungszentrum Karlsruhe, Germany.
- European Organization for Nuclear Research, Geneva, Switzerland.
- San Diego Supercomputing Center, USA.
- Switch, Switzerland.
- University of Florida, USA.
- National Laboratory for Particle and Nuclear Physics, Canada.
- Oak Ridge National Laboratory, USA.
- Budker Institute of Nuclear Physics, Russia.
- Daresbury Laboratory, United Kingdom.
- California Institute of Technology - CACR, USA.
- Istituto Nazionale di Fisica Nucleare, Italy.
- Czech NREN Operator, Czech Republic.
- Brookhaven National Laboratory, USA.
- Argonne National Laboratory, USA.
- California Institute of Technology - Ultralight, USA.
The topology of the monitoring framework is shown in Fig. 1.
Fig. 1: Topology of IEPM as of 07/2008
Fig. 2: Deployment of Selected Sites
Number of measurements made from SLAC to each site, by measurement tool:
Site | pathchirp | iperf | thrulay |
---|---|---|---|
cern.ch | 48647 | 24586 | 39510 |
desy.de | 32247 | 4522 | 28689 |
fzk.de | 65536 | 4874 | 42708 |
nslabs.ufl.edu | 41206 | 1549 | 28613 |
switch.edu | 19668 | 4638 | 28744 |
sdsc.edu | 21176 | 4416 | 22456 |
triumf.ca | 26425 | 4669 | 27021 |
utoronto.ca | 40614 | 5003 | 21646 |
ornl.gov | 35339 | 5182 | 18375 |
anl.gov | 17968 | 1 | 27559 |
bnl.org | 23580 | 20708 | 16000 |
cacr.caltech.edu | 61871 | 25525 | 37293 |
dl.ac.uk | 27806 | 6096 | 28058 |
nsk.su | 20117 | 1 | 26845 |
cesnet.cz | 23618 | 3062 | 28426 |
infn.it | 30372 | 4343 | 28573 |
ultralight.caltech | 3739 | 88 | 1534 |
SubTotal | 539929 | 119263 | 452050 |
Data Sets
The data sets used in the study, collected by the IEPM-BW project, may be downloaded from the links listed below.
Table 1: Performance measurement statistics compiled by IEPM, as seen from SLAC.
Project | Data Sets with Events | Data Sets with no Events |
---|---|---|
IEPM | datasets-with-events.rar (3.4 MB), datasets-with-events.zip (3.6 MB) | datasets-with-no-events.rar (3.3 MB), datasets-with-no-events.zip (3.5 MB) |
All files named "filename_raw_dataset.pathchirp" contain the raw data, i.e., the available-bandwidth measurements and their timestamps, which are used by all the algorithms.
All files named "filename_event_file.txt" contain the list of identified events.
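The naming convention above makes it straightforward to match each raw data file with its event file. The Python sketch below shows one way to do this; the function name and the example prefix `cern` are hypothetical, and only the two suffixes come from the description above.

```python
RAW_SUFFIX = "_raw_dataset.pathchirp"
EVENT_SUFFIX = "_event_file.txt"


def pair_dataset_files(names):
    """Group file names by their shared prefix.

    Returns a dict mapping each prefix to a dict with optional
    "raw" and "events" entries. Names matching neither suffix are ignored.
    """
    pairs = {}
    for name in names:
        if name.endswith(RAW_SUFFIX):
            prefix = name[: -len(RAW_SUFFIX)]
            pairs.setdefault(prefix, {})["raw"] = name
        elif name.endswith(EVENT_SUFFIX):
            prefix = name[: -len(EVENT_SUFFIX)]
            pairs.setdefault(prefix, {})["events"] = name
    return pairs


# Hypothetical file names, for illustration only.
example = pair_dataset_files(
    ["cern_raw_dataset.pathchirp", "cern_event_file.txt", "readme.txt"]
)
```

A prefix that has a "raw" entry but no "events" entry would correspond to a data set in which no events were identified, assuming the archive with no events ships without event files.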
Technical Report - Labeling and Comparative Analysis
The technical report titled "A performance evaluation of anomaly detection algorithms for Internet Paths" is available here.
The ranges of input parameters used for the comparative analysis are summarized below. Note that each algorithm requires a different set of parameters.
Input/Tuning parameters
Plateau Algorithm (PL)
History Buffer Length (H) | Trigger Buffer Length (T) | Threshold (th) | Sensitivity (s) |
---|---|---|---|
240 | 6 - 45 | 0.10 - 0.70 | 1.0 - 2.8 |
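One common reading of these four parameters is the following: a history buffer of length H tracks recent behaviour; samples that fall more than s standard deviations below the history mean are collected in a trigger buffer of length T; and an event is raised when a full trigger buffer's mean has dropped by more than the fraction th relative to the history mean. The Python sketch below illustrates that reading in simplified form. It is not the implementation evaluated in the report, and the function name and the buffer-reset policy are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def plateau_events(samples, history_len=240, trigger_len=6,
                   threshold=0.10, sensitivity=1.0):
    """Return the sample indices at which a sustained drop is flagged."""
    history = deque(maxlen=history_len)
    trigger = []
    events = []
    for i, x in enumerate(samples):
        if len(history) < history_len:
            history.append(x)  # still filling the history buffer
            continue
        m_h = mean(history)
        s_h = pstdev(history)
        if x < m_h - sensitivity * s_h:
            # Sample is unusually low: hold it in the trigger buffer.
            trigger.append(x)
            if len(trigger) == trigger_len:
                m_t = mean(trigger)
                if m_h > 0 and (m_h - m_t) / m_h > threshold:
                    events.append(i)
                # Assumed policy: restart the history from the trigger samples.
                history.clear()
                history.extend(trigger)
                trigger = []
        else:
            # Normal sample: extend the history and discard pending triggers.
            history.append(x)
            trigger = []
    return events


# Synthetic series: 20 samples at 100, then a drop to 50.
drop = plateau_events([100.0] * 20 + [50.0] * 10, history_len=20, trigger_len=5)
flat = plateau_events([100.0] * 40, history_len=20, trigger_len=5)
```

In this synthetic example the drop is reported once the trigger buffer fills, so a longer T delays detection but makes isolated low samples less likely to raise an event.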
Kalman Filters Method (KF)
Sensitivity (K) | Time Window (h) |
---|---|
0.001 - 11.0 | 6 - 20 |
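As a rough illustration of how a sensitivity and a time window can drive a Kalman-filter detector, the Python sketch below tracks the series with a scalar random-walk Kalman filter and flags a sample when its innovation (measurement minus prediction) exceeds K times the standard deviation of the last h innovations. This interpretation of K and h, the noise variances `q` and `r`, and the function name are all assumptions; the technical report may define the method differently.

```python
from collections import deque
from statistics import pstdev


def kalman_residual_flags(samples, k=3.0, window=10, q=1e-3, r=1.0):
    """Return indices whose innovation is large relative to recent innovations."""
    x_est = samples[0]  # state estimate, initialised from the first sample
    p = 1.0             # variance of the state estimate
    residuals = deque(maxlen=window)
    flags = []
    for i, z in enumerate(samples[1:], start=1):
        p_pred = p + q            # predict step of a random-walk model
        innovation = z - x_est    # measurement minus predicted state
        if len(residuals) == window:
            scale = pstdev(residuals)
            if scale > 0 and abs(innovation) > k * scale:
                flags.append(i)
        residuals.append(innovation)
        gain = p_pred / (p_pred + r)  # Kalman gain
        x_est = x_est + gain * innovation
        p = (1.0 - gain) * p_pred
    return flags


# Synthetic series: small alternation around 100.5, then a drop to 50.
series = [100.0 if i % 2 == 0 else 101.0 for i in range(30)] + [50.0]
flags = kalman_residual_flags(series, k=3.0, window=10)
```

Smaller values of K make the detector more sensitive, which raises both the detection rate and the false-alarm rate; sweeping K over a range such as the one in the table yields the points of a ROC curve.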
Holt-Winters Method (HW)
? | ? | ? | ? |
---|---|---|---|
0.1 | 0.1 - 0.3 | 0.1 - 0.5 | 2.0 |
Adaptive Fault Detector (AFD)
Window Size (N) | ? | ? | No. of Training Data (Hn) |
---|---|---|---|
20 | 0.95 | 0.0015 - 0.1 | 100 |
Decision Theoretic Approach (DTA)
History Buffer Length (N) | ? | ? | Median filter length (n) |
---|---|---|---|
30 - 90 | 0.01 - 0.34 | 0.99 | 100 |
ROC Results
Datasets with Gaussian Distributions
ROC plots: CERN, FZK, SDSC, UTO.
Datasets with Weibull Distributions
ROC plots: DESY, NSLABS, SWITCH, TRIUMF.
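A ROC curve plots the true-positive rate against the false-positive rate as a detector's tuning parameter is swept. The Python sketch below shows that computation for a generic detector. It assumes per-sample ground-truth labels and a per-sample anomaly score, which is a simplification; the way the report matches detections to labeled events may differ, and the function names are hypothetical.

```python
def roc_point(labels, flags):
    """Return (false-positive rate, true-positive rate) for one detector setting."""
    tp = sum(1 for l, f in zip(labels, flags) if l and f)
    fp = sum(1 for l, f in zip(labels, flags) if not l and f)
    fn = sum(1 for l, f in zip(labels, flags) if l and not f)
    tn = sum(1 for l, f in zip(labels, flags) if not l and not f)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


def roc_curve(labels, scores, thresholds):
    """Sweep a score threshold and collect one ROC point per threshold."""
    points = []
    for thr in thresholds:
        flags = [s >= thr for s in scores]  # flag samples scoring at or above thr
        points.append(roc_point(labels, flags))
    return points


# Toy data: two normal samples (label 0) and two anomalous samples (label 1).
points = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.0, 0.5, 1.0])
```

A detector whose points lie closer to the top-left corner (high true-positive rate at low false-positive rate) performs better over the swept parameter range.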