
IEPM Tasks

Last update: September 4, 2006 (see also Archive)

Key:
  Awaits something (the date the wait started is also given)
  Done or Drop: deleted when more than a month old
  - Name(s): person(s) responsible
  Task being worked on or to be discussed at the group meeting
  Changes

Action Items

  1. Terapaths Netflow (see http://iepmbw.bnl.gov/netflow/index.html) - Yee
    1. Get security requirements from SLAC/John H - Yee
    2. Make the JavaScript work on non-Firefox browsers (non-DOM code needs fixing) - Yee
    3. Implement grouping for graphs and tables - Yee
    4. Add spider and pie charts - Yee
    5. Discuss with Connie type of useful exec level plots - Yee, Connie, Les
    6. Finalise installation script - Akbar, Yee
      1. Add patching (needs a distribution mechanism for tarball storage) - Yee, Akbar
      2. Test perl prefix installations - Akbar, Yee
    7. Factorise out TopN code in JKFlow.pm - Akbar
    8. Refactor JKFlow code for QoS analysis (restructure the file layout and possibly the QoS analysis) - Akbar
  2. PingER
    1. Make sure Maxim has all the latest monitoring nodes - Jerrod
    2. Get ping-data.pl working at sfsmds2.vsnl.in - Jerrod
    3. Restore data gathering from indix.ncst.ernet.in to the Wisconsin target node.
      1. Get traceroute server (sent mail on 9/5/06) - Jerrod
      2. Update ping_data.pl (sent mail on 9/5/06) - Jerrod
      3. Remove Wisconsin target Pinger node from monitoring (sent mail on 9/5/06) - Jerrod
      4. Binu Abraham [binu@cdacmumbai.in] responded to these mails on 9/6/06 and stated he would perform the above tasks that day (9/6/06).

  3. Transport services evaluation - Yee Ting Li
    1. Work with Microsoft - Yee, Les
      1. Get latest privates
      2. Decide what is needed for stage 2
    2. NDT server at SLAC (see http://nettest5:7123) - Yee
      1. Installed on nettest5 (see NDT Install) - Yee
      2. Talk security into allowing access - Yee.
  4. MonALISA (no progress 3/12/06, awaits iepm-bw OWAMP integration, keeping servers running) - Connie
    1. Upload selected data (initially IEPM data from BNL, SLAC, Caltech, CERN) using a single object for efficiency (awaits Iosif's new version of ML/APMon) - Adnan, Iosif
    2. Figure out how to display IEPM monitoring hosts and their data - Fawad, Aziz
    3. Project defined and assigned to Akbar and Waqar (3/11/06) - Akbar, Waqar
  5. IEPM-BW
    1. Work with DESY to get a new monitoring host (contacted Kars 7/20/04; Kars then away on 2 weeks' vacation, followed by Jerrod; time to restart 8/26/04; Jerrod sent email reminder 3/25/05; Kars will be here later this month (27th April '06), Jerrod to contact him before he arrives) (awaits V3 of iepm-bw) - Jerrod
    2. Make FZK an IEPM Monitoring node - Connie
      1. Get contact for Connie (sent email 8/22/06, now awaits Connie) - Les
    3. Update metrics used
      1. ID and add more targets for pathload - Connie, Jerrod
    4. Get distribution kit for iepm monitoring nodes to install & configure - Jerrod
      1. Update pre-reqs document - Jerrod
      2. Build a pacman procedure so an admin can do their own install (as of 3/23/06 it creates the database; next steps are to create the tables and to copy over and configure the crontabs) [dropped 8/21/06] - Jerrod
      3. After a rethink, divide the task between what pacman does well and scripting the rest [pacman dropped 8/21/06]
        1. Develop on Taiwan (start 4/17/06) - Jerrod
    5. Add LHC Atlas hosts to IEPM-BW (list sent to Jerrod 9/6/06) - Jerrod, Les, Connie
    6. Get the architecture of remote nodes and create a web page (will get back to this in Sept '06) - Jerrod
    7. Write script to use ssh to get the configurations of IEPM monitor and remote hosts (in progress 4/26/06, will revisit Sep '06) - Jerrod
    8. Decide whether we want to get reverse traceroutes (at least where we have reverse traceroute servers; awaits time) - Connie
    9. Compare pathchirp and pathload - Connie
      1. Make up a proposal (see if we need it) - Connie, Adnan
    10. Fixes needed:
      1. Fix up TCP receive buffer sizes and add sanity checks. This needs root on the target hosts, which we do not have (open question).
      2. check-cron script: I do not know if this is still in use, but the code references files in your ~jerrodw/bin, so it is not portable. Please fix it if it is still needed, or get rid of it if it is not.
      3. make-bw-html script: There is code in it for putting up the comments, but I am not sure it is correct. Please check it over: I do not see any guarantee that $com{$node} is ever defined. Also please add comments explaining how it works. Why is there a 'require' to get the comments rather than just returning them in @ans, e.g. @ans=`$iepmSrcDir/fetch-comments`?
      4. Other code issues: I am reading all the code to look for issues that might make mysql hang onto sockets, and for other obvious problems. Many of your scripts have no description of what they do, so I cannot tell whether they are used or no longer needed.
        1. ckavailtables: This does not call $rdsth->finish and will not work on newer versions of mysql. It also references /afs files directly, so if it is part of the ported system it will fail on other nodes.
        2. ckaddednode: This does not call $rdsth->finish and will not work on newer versions of mysql.
        3. fetch-added-node: I do not think this will even compile. Also, please build a long SQL statement over several lines, not on one very long line.
        4. fetch-comments: Does not close $rdsth or $rddbh; this may cause the sockets to stay open.
        5. fetch-stale-scheduled-tasks: Does not close $rdsth or $rddbh; this will cause the sockets to stay open. I fixed this one because I know it is called.
        6. On July 27 I sent email about the requirements for finish and disconnect. I got a response back that this had been taken care of, but it appears it has not.
      5. CGIs need work: the add node CGI is out of date, and the toolspecs CGI puts a lot of entries into the toolspecs table.
  6. Traceanal - Yee, Asif
    1. Integrate new topology into web server - Yee
      1. Identify the most used routes - Asif
      2. Integrate with pathneck to color links based on speed - Asif
      3. Rendering of the topology is much slower on www.slac.stanford.edu - Yee
    2. Modularise traceanal code for extensibility with non-iepmbw data - Asif, Yee
      1. Prepare distributable version of traceanal - Yee
  7. Alerts
    1. Look at multivariate event detection (collect data for SLAC, BNL, Caltech pathchirp, thrulay, ping) - Adnan
    2. Need to extend pathload to other sites - Connie
      1. Run plateau on the data for min-RTT, thrulay, pathchirp - Mahesh
      2. Apply PCA to the same data
    3. Look at improvements to plateau
      1. Ability to find step ups - Adnan
      2. Extend to allow up & down then compare down with original - Adnan
      3. Allow for small number of samples (e.g. at start) - Mahesh
    4. Look at other detection algorithms and compare
      1. Holt-Winters - Les, Mahesh, Felipe
        1. Go back 7 weeks - Mahesh
        2. Check unusual results - Mahesh
        3. Consider other ways to optimize parameters - Mahesh
      2. Neural networks
      3. KS
        1. Look at making the number of points before larger than the number of points after - Akbar
    5. Prepare table of canonical events and how various algorithms react - Adnan
      1. Build case studies of email events (how is this coming along?) - Adnan
    6. Look into host monitoring/isolation
      1. Look at installing LISA/APMon at monitoring sites so we can eliminate events caused by host congestion
      2. Ganglia
      3. Nagios
      4. Monitor NIC errors
    7. Look at how to use PerfSONAR - Adnan
    8. Look at detecting outages for ping - Connie
      1. Analyze what constitutes a significant outage - Connie
    9. Understand the cause of delayed alerts and see whether the delays can be reduced - Connie
    10. Diagnose events - Adnan
    11. Extend the database to record the trigger start date/time and the trigger detection date/time - Connie
  8. Install WANMON as IEPM web server - Yee
    1. Port CGI-WRAP [Done 8/10/06] - Les, Yee
    2. Get NFS and AFS access - Yee
    3. Get approval for an externally visible web server
    4. Get traceroute.pl and pingtable.pl working and in production
  9. IEPM-BW Web Services - Yee
    1. To be implemented via PerfSONAR - Yee, Asif
      1. Measurement Archive Service for IEPM-BW data - Yee
        1. Install SQL-MA info - Yee
        2. Write Ibatis configs for IEPM-BW data - Yee
        3. NMWG requirements? - Yee
    2. Check whether our web services access works (need to contact Warren; awaits proposals and stable implementations) - Yee
  10. Set up Wiki
  11. Presentations/Talks/Visits/Papers/Documentation
  12. IPv6
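The finish/disconnect requirement raised in the IEPM-BW code review notes (ckavailtables, ckaddednode, fetch-comments, fetch-stale-scheduled-tasks) can be illustrated with a minimal Perl DBI sketch. It is not the actual fetch-comments code: the subroutine names and the table and column names (comments, node, comment) are hypothetical. It assumes DBI is installed; the test uses DBD::SQLite as a stand-in for the IEPM-BW MySQL database, and with DBD::mysql only the DSN and credentials would change. It also shows a long SQL statement built over several lines, and a lookup that always returns a defined value for a node.

```perl
#!/usr/bin/perl
# Sketch only: always release DBI statement and database handles.
# Assumption (hypothetical, not the real IEPM-BW schema): a table
# "comments" with columns "node" and "comment". Any DBI driver works;
# IEPM-BW would pass a dbi:mysql DSN.
use strict;
use warnings;
use DBI;

# Return a hash ref mapping node => comment.
sub fetch_comments {
    my ($dsn, $user, $pass) = @_;
    my $rddbh = DBI->connect($dsn, $user, $pass,
        { RaiseError => 1, PrintError => 0, AutoCommit => 1 });
    my $rdsth;
    my %com;

    my $ok = eval {
        # Build a long SQL statement over several lines.
        my $sql = "SELECT node, comment "
                . "FROM comments "
                . "WHERE comment IS NOT NULL "
                . "ORDER BY node";
        $rdsth = $rddbh->prepare($sql);
        $rdsth->execute();
        while (my ($node, $comment) = $rdsth->fetchrow_array()) {
            $com{$node} = $comment;
        }
        1;
    };
    my $err = $@;

    # Release the statement handle and close the connection on every
    # path, success or failure, so no socket is left open.
    $rdsth->finish() if $rdsth;
    $rddbh->disconnect();
    die $err unless $ok;
    return \%com;
}

# Always return a defined string, even for nodes without a comment.
sub comment_for {
    my ($com, $node) = @_;
    return defined $com->{$node} ? $com->{$node} : '';
}
```

Because the handles are released after the eval whether or not the query succeeded, an error part way through cannot leave the database connection open.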