IEPM Tasks
Last update: September 4, 2006, Archive
Awaits something, also provides a start of wait date
Done or Dropped items are deleted when they are more than a month old.
- Person(s) responsible
Task being worked on or to be discussed at group meeting
Changes
Action Items
- Terapaths
- Netflow (see http://iepmbe.bnl.gov/netflow/index.html) - Yee
- Talk to John H to find out his needs - Yee
- Try to make it work on non-Firefox browsers (DOM needs fixing) - Yee
- Add spider and pie charts - Yee
- Discuss with Connie how to get permanent exec level plots - Yee, Connie, Les
- PingER
- Make sure Maxim has all the latest monitoring nodes - Jerrod
- Get ping-data.pl working at sfsmds2.vsnl.in - Jerrod
- Transport services evaluation - Yee Ting Li
- Work with Microsoft - Yee, Les
- Get latest privates
- Decide what is needed for stage 2
- MonALISA (no progress 3/12/06, awaits iepm-bw OWAMP integration, keeping servers running) - Connie
- Upload selected data (initially IEPM data from BNL, SLAC, Caltech, CERN) using a single object for efficiency (awaits Iosif's new version of ML/APMon) - Adnan, Iosif
- Figure out how to display IEPM monitoring hosts and their data - Fawad, Aziz
- Project defined and assigned to Akbar and Waqar (3/11/06) - Akbar, Waqar
- IEPM-BW
- Work with DESY to get new monitoring host (contacted Kars 7/20/04; Kars going on 2 weeks vacation, then Jerrod is away; time to re-start 8/26/04; wait for v3; Jerrod sent email reminder 3/25/05; Kars will be here later this month (27th April '06), Jerrod to contact him before he arrives) (awaits V3 of iepm-bw) - Jerrod
- Make FZK an IEPM Monitoring node - Connie
- Get contact for Connie (sent email 8/22/06, now awaits Connie) - Les
- Update metrics used
- ID and add more targets for pathload - Connie, Jerrod
- Get distribution kit for iepm monitoring nodes to install & configure - Jerrod
- Update pre-reqs document - Jerrod
- Build pacman procedure so admin can do own install (now works to make the database; next step is to create the tables, and copy over and configure the crontabs 3/23/06) [dropped 8/21/06] - Jerrod
- After re-think, divide task up between what pacman does well and script the rest [Dropped pacman 8/21/06]
- Develop on Taiwan (start 4/17/06) - Jerrod
- Write script to use ssh to get the configurations of IEPM monitor and remote hosts (in progress 4/26/06, will revisit Sep '06) - Jerrod
- Get architecture of remote nodes and create a web page (will get back to it in Sept '06) - Jerrod
- Do we want to get reverse traceroutes (at least where we have reverse traceroute servers, awaits time) - Connie
- Compare pathchirp and pathload - Connie
- Make up a proposal (see if we need it) - Connie, Adnan
- Bugs
- Fix up TCP receive buffer sizes, add sanity checks (in progress 4/26/06, Connie will talk to Yee to understand 8/22/06) - Connie
- Traceanal - Yee, Asif
- Integrate new topology into web server - Yee
- Identify the most used routes - Asif
- Integrate with pathneck to color links based on speed - Asif
- Rendering of topology much slower on www.slac.stanford.edu - Yee
- Prepare distributable version of traceanal - Yee
- Alert
- Look at multivariate event detection (collect data for SLAC, BNL, Caltech pathchirp, thrulay, ping) - Adnan
- Need to extend pathload to other sites - Connie
- Run plateau on the data for min-RTT, thrulay, pathchirp - Mahesh
- Apply PCA to the same data
- Look at improvements to plateau
- Ability to find step ups - Adnan
- Extend to allow up & down then compare down with original - Adnan
- Allow for small number of samples (e.g. at start) - Mahesh
- Look at other detection algorithms and compare
- Holt-Winters - Les, Mahesh, Felipe
- Go back 7 weeks - Mahesh
- Check unusual results - Mahesh
- Consider other ways to optimize parameters - Mahesh
- Neural networks
- KS
- Look at making points before larger than points after - Akbar
- Prepare table of canonical events and how various algorithms react - Adnan
- Build case studies of email events (how is this coming on?) - Adnan
- Look into host monitoring/isolation
- Look at installing LISA/APMon at monitoring sites so we can eliminate events caused by host congestion
- Ganglia
- Nagios
- Monitor NIC errors
- Look at how to use PerfSONAR - Adnan
- Look at detecting outages for ping - Connie
- Analyze what constitutes a significant outage - Connie
- Understand cause of delayed alerts and see if can improve - Connie
- Diagnose events - Adnan
- Extend database to add trigger start date/time, trigger detection date/time in database - Connie
- Install WANMON as IEPM web server - Yee
- Port CGI-WRAP [Done 8/10/06] - Les, Yee
- Get NFS and AFS access - Yee
- Get approval for externally visible web server
- Get traceroute.pl and pingtable.pl working and in production
- Install NDT server on NETTEST5
- IEPM-BW Web Services - Yee
- Does our web services access work (need to contact Warren, await proposals, and stability of implementations) - Yee
- Set up Wiki
- Presentations/Talks/Visits/Papers/Documentation
- IPv6