Background

Event Diagnosis is the next step after Event Detection. The IEPM group has established a large infrastructure for network performance measurement all over the world. This infrastructure provides near-real-time network performance data. The IEPM group has developed analysis techniques and tools that detect drops in performance. The Event Diagnosis project aims to find the cause of a performance drop, so it is a subsystem of IEPM-BW built on top of Event Detection. The current work is being carried out by a team of researchers at SLAC, but given the nature and evident benefits of the project, it is expected to expand quickly.

Implementation Details

The IEPM infrastructure consists of Monitoring Nodes and Monitored Nodes. For Event Diagnosis purposes we add a third type, the Central Node. A short description of each type of node follows.

Monitoring Node: A node that runs network measurement tools, collects and stores data, analyzes the data for Event Detection, and generates alerts based on the analysis results. Currently we have six such nodes.

Monitored Node: A node that is measured by a Monitoring Node. Each Monitoring Node has its own set of Monitored Nodes. Currently we have about 40 monitored nodes all over the world.

Central Node: A node that runs the Event Diagnosis analysis. For the time being there is only one such node (at SLAC).

All of the implementation is in the form of Perl scripts or CGI/Perl scripts. Some scripts reside on the Monitoring Nodes and some on the Central Node. There is currently no script on any Monitored Node.

Code on Monitoring Nodes: Every Monitoring Node has three CGI scripts.

nodeid_host.cgi: If the script is called with option n, it should be given a node id. If it is called with option h, it should be given an IPv4 host alias. In either case it returns the complete record, from which one can deduce the node id or the IPv4 host alias.

alert_rec.cgi: If called without any option, it returns all available alerts present on the Monitoring Node; otherwise it returns only the alerts that fulfill the given criteria.

tracert_analysis.cgi: The script should be given a traceroute destination and a time period. It returns whether or not there was a route change during that period.
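The route-change check described for tracert_analysis.cgi can be illustrated with a minimal Python sketch. This is not the script's actual code (the real script is Perl and is not shown on this page). The function name `route_changed`, the sample format of (timestamp, list of hop addresses) pairs, and the rule that any difference in the hop sequence counts as a route change are all assumptions made for illustration. The addresses and timestamps are made-up sample values.

```python
from datetime import datetime


def route_changed(samples, start, end):
    """Return True if two consecutive traceroute samples taken within
    [start, end] have different hop sequences (assumed definition)."""
    # Keep only samples inside the requested period, ordered by time.
    window = sorted(
        (s for s in samples if start <= s[0] <= end),
        key=lambda s: s[0],
    )
    # Compare each sample's hop list with the one that follows it.
    return any(prev[1] != cur[1] for prev, cur in zip(window, window[1:]))


# Hypothetical traceroute samples to one destination (documentation addresses).
samples = [
    (datetime(2006, 8, 1, 10, 0), ["192.0.2.1", "198.51.100.1", "203.0.113.9"]),
    (datetime(2006, 8, 1, 11, 0), ["192.0.2.1", "198.51.100.1", "203.0.113.9"]),
    (datetime(2006, 8, 1, 12, 0), ["192.0.2.1", "198.51.100.7", "203.0.113.9"]),
]

whole_period = route_changed(samples, datetime(2006, 8, 1, 10, 0), datetime(2006, 8, 1, 12, 0))
first_hour = route_changed(samples, datetime(2006, 8, 1, 10, 0), datetime(2006, 8, 1, 11, 0))
print(whole_period)  # True: the second hop differs in the 12:00 sample
print(first_hour)    # False: both samples in this period share one route
```

A stricter or looser definition of "route change" (for example, ignoring hops that did not respond) would change only the comparison inside `any(...)`.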

Code on Central Node: The Central Node has the following scripts.

datatracker.pl: Provides the basic functionality to get data from all Monitoring Nodes. To use it, import it into a script and call its APIs. It currently provides following APIs

controller.pl: Performs all analysis on data obtained through the APIs of datatracker.pl. It generates output in the form of a web page and also updates the database with the newly analyzed results.

get_asn.pl: A package used to get autonomous system information for a given host name or IP address. This script is based on Yee's package.

analyze-all.pl: A wrapper over controller.pl that analyzes all the alerts available on all the Monitoring Nodes.

op_main.pl: Updates the summary web page with the latest results in the database.
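The way analyze-all.pl wraps the per-alert analysis can be pictured with a small Python sketch of the control flow: for each Monitoring Node, fetch its alerts and analyze each one. This is only an illustration under stated assumptions, not the Perl implementation. The names `analyze_all`, `fetch_alerts` and `analyze_alert`, the host names, and the alert fields are all hypothetical, and the in-memory `FAKE_ALERTS` table stands in for the data that would really come from the Monitoring Nodes.

```python
def analyze_all(nodes, fetch_alerts, analyze_alert):
    """Run analyze_alert on every alert returned by every node."""
    results = []
    for node in nodes:
        for alert in fetch_alerts(node):
            results.append(analyze_alert(node, alert))
    return results


# Hypothetical stand-in data; a real system would query each Monitoring Node.
FAKE_ALERTS = {
    "monitor-a.example.org": [{"target": "host1.example.net"}],
    "monitor-b.example.org": [],
}


def fetch_alerts(node):
    """Stand-in for retrieving a node's alerts (assumed interface)."""
    return FAKE_ALERTS.get(node, [])


def analyze_alert(node, alert):
    """Stand-in for the per-alert analysis; it only tags the alert."""
    return {"node": node, "target": alert["target"], "status": "analyzed"}


results = analyze_all(sorted(FAKE_ALERTS), fetch_alerts, analyze_alert)
print(len(results))  # 1: only monitor-a has an alert in the sample data
```

Passing the fetch and analysis steps in as functions keeps the wrapper independent of how alerts are retrieved or diagnosed, which is one possible design and not necessarily the one used here.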

Results

Results are stored in the database table DIAG_RESULT. Results are also accessible here. This page is a summary of all results. Each analyzed alert also has a separate detailed analysis on its own web page, accessible through this summary page.

Presentations/Talks

Event Diagnosis -- New Heuristics ppt, presented by Adnan Iqbal at the IEPM weekly meeting, 29th August 2006
Steps Towards Automated Event Diagnosis ppt (amended), presented by Yee-Ting Li at the ESnet Meeting, 20th July 2006
Steps Towards Automated Event Diagnosis ppt, presented by Yee-Ting Li at the Internet2 Measurements SIG, 18th July 2006
