Introduction

Anomaly detection in computer networks is becoming increasingly important. Several approaches exist for event detection, but the majority are restricted to single-route analysis. Our aim is to apply Principal Component Analysis (PCA) to detect anomalous events on single routes as well as across multiple routes. The scheme is to be applied to several data sets. The most notable is the set of ABwE measurements from SLAC (Stanford Linear Accelerator Center) to different parts of the world. Other data sets include data from other tools such as IPerf, Pathchirp and Ping. A data set from Fermilab is also to be analysed. The tasks performed during the analysis are pre-processing of the data (trimming, normalization and regularization), PCA analysis (application of PCA and event detection) and study of the results.

Process

Principal Component Analysis is used in many applications. Its basic function is dimensionality reduction. The following steps have been applied to use PCA.

1- Data set: The data should be a time series consisting of a number of parameters. We have applied this scheme to data sets ranging from three to eight parameters.

2- Adjust mean: The mean of every parameter in the series is calculated and subtracted from every value of that parameter. The result is a mean-adjusted (zero-mean) data set.

3- Covariance Matrix: The covariance of every parameter with every other parameter is calculated, and the results are arranged in a matrix. This is a square matrix with one row and one column per parameter.

4- Eigenvectors & Eigenvalues: The next step is to calculate the eigenvectors and eigenvalues of the covariance matrix. Each eigenvector represents a principal component.

5- Feature Vectors: The feature vectors are constructed by removing the less important eigenvector(s). This is done by first sorting all eigenvectors by their eigenvalues and then removing the least important one(s), i.e. those with the smallest eigenvalues.

6- Abnormal Subspace Calculation: Multiplying the feature-vector matrix by its transpose gives a square matrix. To obtain the abnormal subspace, this matrix is subtracted from the identity matrix. The resulting matrix is then multiplied by the original data set. The magnitude of every vector in the product is calculated, and the output is passed through an event detector, which raises an alert for any value that deviates from normal by more than a specified threshold.
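
The six steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the project's implementation: the function names, the `n_normal` parameter (how many eigenvectors are kept as feature vectors) and the threshold rule (mean plus `n_sigma` standard deviations) are assumptions made for the example, and the projection is applied to the mean-adjusted data.

```python
import numpy as np

def residual_magnitudes(data, n_normal):
    """data: (n_samples, n_params) time series; n_normal: eigenvectors kept (hypothetical name)."""
    data = np.asarray(data, dtype=float)
    # Step 2: subtract the mean of each parameter (zero-mean data set)
    centered = data - data.mean(axis=0)
    # Step 3: square covariance matrix, one row/column per parameter
    cov = np.cov(centered, rowvar=False)
    # Step 4: eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 5: sort by decreasing eigenvalue and drop the least important eigenvectors
    order = np.argsort(eigvals)[::-1]
    features = eigvecs[:, order[:n_normal]]
    # Step 6: identity minus (feature vectors times their transpose)
    projector = np.eye(data.shape[1]) - features @ features.T
    # Project every sample (assumption: mean-adjusted data) and take its magnitude
    residual = centered @ projector
    return np.linalg.norm(residual, axis=1)

def detect_events(magnitudes, n_sigma=3.0):
    """Assumed threshold rule: flag samples above mean + n_sigma * std."""
    threshold = magnitudes.mean() + n_sigma * magnitudes.std()
    return np.flatnonzero(magnitudes > threshold)

# Synthetic example: three strongly correlated parameters and one injected anomaly
rng = np.random.default_rng(0)
base = rng.normal(size=200)
series = np.column_stack([base, 2 * base, -base]) + 0.01 * rng.normal(size=(200, 3))
series[50] += np.array([10.0, -5.0, 0.0])  # breaks the usual correlation
magnitudes = residual_magnitudes(series, n_normal=1)
events = detect_events(magnitudes)
```

Here `events` holds the indices of the samples whose residual magnitude exceeds the assumed threshold. In practice the threshold and the number of retained eigenvectors would have to be tuned per data set.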

Note: The procedure is slightly different for multi-route analysis, which also involves regularization of the data and an extra step of combining the different data sets into one.
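
One possible reading of the multi-route preparation is sketched below. It assumes that "regularization" means resampling every route onto a common, evenly spaced time grid (done here by linear interpolation) and that "combining" means placing the parameters of all routes side by side as columns. Both choices, and the names `regularize` and `combine_routes`, are assumptions for illustration only.

```python
import numpy as np

def regularize(timestamps, values, grid):
    """Interpolate each parameter column onto the common time grid (assumed method)."""
    values = np.asarray(values, dtype=float)
    columns = [np.interp(grid, timestamps, values[:, j]) for j in range(values.shape[1])]
    return np.column_stack(columns)

def combine_routes(routes, step):
    """routes: list of (timestamps, values) pairs; step: grid spacing in the same time unit."""
    # Use only the time span covered by every route
    start = max(ts[0] for ts, _ in routes)
    stop = min(ts[-1] for ts, _ in routes)
    grid = np.arange(start, stop + step / 2, step)
    # One wide data set: the columns of all routes side by side
    combined = np.hstack([regularize(ts, vals, grid) for ts, vals in routes])
    return grid, combined

# Two hypothetical routes sampled at different instants
ts_a = np.array([0.0, 10.0, 20.0, 30.0])
vals_a = np.column_stack([ts_a, 2 * ts_a])
ts_b = np.array([5.0, 15.0, 25.0, 35.0])
vals_b = np.column_stack([ts_b, ts_b + 1, ts_b + 2])
grid, combined = combine_routes([(ts_a, vals_a), (ts_b, vals_b)], step=5.0)
```

The combined matrix can then be fed to the same PCA steps as a single-route data set, with one column per parameter per route.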

This is a collaborative effort: Stanford Linear Accelerator Center (SLAC) and NUST Institute of IT (NIIT) are carrying out joint research. This work is part of the maggie-ns (Maggie-NIIT-SLAC) project.

Dataset

The process has been applied to the following data sets.

1- Data from 06/21/04 to 09/29/04:

This data is from the ABwE tool and has three parameters. It is further grouped into similar nodes, i.e. nodes whose paths share the largest common portion with each other. This group includes

a) SLAC-DESY

b) SLAC-SWITCH

c) SLAC-CESNET

d) SLAC-FZK

e) SLAC-NIIT

f) SLAC-TRIUMF

2- Data from 01/01/06 to 02/28/06:

This data is from ping. The minimum, maximum and average round-trip times were used. The groups included

a) BNL-DESY, BNL-DL, BNL-FZK, BNL-INFN, BNL-CESNET

b) SLAC-DESY, SLAC-DL, SLAC-FZK, SLAC-INFN, SLAC-CESNET

Terminology

Overlap: The time during which two different events coincide, i.e. whether or not they occurred at the same time.

Full Overlap: The events overlap in time, and the overlapping period is longer than one hour.

Partial Overlap: The events overlap in time, but the overlapping period is short, i.e. from 10 minutes to one hour.

No Overlap: Events are mutually exclusive.
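
The overlap categories above can be expressed as a small helper. This is a sketch only: the function name and the string labels are hypothetical, and because the definitions do not say how an overlap shorter than 10 minutes is treated, the sketch returns "unclassified" for that case as an explicit assumption.

```python
from datetime import datetime, timedelta

def classify_overlap(start_a, end_a, start_b, end_b):
    """Classify how two events, given as start/end datetimes, overlap in time."""
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= timedelta(0):
        return "none"          # events are mutually exclusive
    if overlap > timedelta(hours=1):
        return "full"          # more than one hour in common
    if overlap >= timedelta(minutes=10):
        return "partial"       # from 10 minutes to one hour in common
    return "unclassified"      # assumption: under 10 minutes is not defined above

# Example: two hypothetical events sharing 30 minutes
a_start = datetime(2006, 1, 10, 12, 0)
a_end = datetime(2006, 1, 10, 14, 0)
b_start = datetime(2006, 1, 10, 13, 30)
b_end = datetime(2006, 1, 10, 16, 0)
label = classify_overlap(a_start, a_end, b_start, b_end)
```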

Results are described separately for each route. A description of each route is given below.

A) ABING ANALYSIS RESULTS

Route# A1 (SLAC-DESY, SLAC-SWITCH, SLAC-CESNET, SLAC-FZK, SLAC-NIIT, SLAC-TRIUMF)

B) PING ANALYSIS RESULTS

Route# B1 (SLAC-DESY, SLAC-SWITCH, SLAC-CESNET, SLAC-FZK, SLAC-NIIT, SLAC-TRIUMF)

Route# B2 (SLAC-DESY, SLAC-SWITCH, SLAC-CESNET, SLAC-FZK, SLAC-NIIT, SLAC-TRIUMF)

Implementation details and usage

 
