Action Items

  1. Terapaths Netflow (see http://iepmbw.bnl.gov/netflow/index.html) - Yee
    1. Get security requirements from SLAC/John H - Yee
    5. Make JavaScript work on non-Firefox browsers (non-DOM code needs fixing) - Yee
    3. Implement grouping for graphs and tables - Yee
    4. Add spider and pie charts - Yee [done 20060905]
    5. Tidy up legends for pie and spider charts - Yee
    6. Discuss with Connie which types of plots are useful at the executive level - Yee, Connie, Les
    7. Finalise installation script - Akbar, Yee
      1. Add patching - need distribution mechanism for tarball storage - Yee, Akbar
      2. Test perl prefix installations - Akbar, Yee [done 20060905]
      3. Add UI/cgi code to installation - Akbar, Yee
    8. Factorise out TopN code in JKFlow.pm - Akbar
    9. Refactor JKFlow code for QoS analysis (restructure the file layout and possibly the QoS analysis itself) - Akbar
  2. Transport services evaluation - Yee
    1. Work with Microsoft - Yee, Les
      1. Get latest privates from Microsoft
      2. Decide what is needed for stage 2
    2. NDT server at SLAC (see http://nettest5:7123) - Yee
      1. Installed on nettest5 (see NDT Install page) - Yee [DONE 20060831]
      2. Persuade security to allow public access - Yee
  3. PingER
    1. Make sure Maxim has all the latest monitoring nodes (ihep is probably bad, reported 9/16/06) - Jerrod
    2. Get ping-data.pl working at sfsmds2.vsnl.in (sent email to mantosh 9/8/06 with suggestions) [Done 9/19/06] - Jerrod
    3. Get correct server name displayed for sfsmds2.vsnl.net traceroute server (sent reminder 9/8/06) [Done 9/16/06] - Les
    4. Fix missing output from http://www-iepm.slac.stanford.edu/pinger/slaconly/analyze-hourly-out.pl in downsites.cgi (reported with suggestions 9/16/06) - Jerrod
    5. Fix indix.cdacmumbai.in (it replaces indix.ncst.ernet.in; the old raw data still needs fixing, 9/19/06) - Jerrod
    6. Fix up downsites warnings on brunsvigia.tenet.ac.za (request to add www.sajhe.org.za 9/16/06) - Jerrod
    7. Patch helsinki.gatech.edu as monitoring site (requested 9/19/06) - Jerrod
    8. Restore data gathering from indix.ncst.ernet.in to the Wisconsin target node
      1. Get traceroute server (sent mail on 9/5/06) - Jerrod
      2. Update ping_data.pl (sent mail on 9/5/06) - Jerrod
      3. Remove Wisconsin target PingER node from monitoring (sent mail on 9/5/06) - Jerrod
      4. Binu Abraham [binu@cdacmumbai.in] responded to these emails on 9/6/06 and said he would perform the above tasks that day
    9. Update Guthrie with LHC ATLAS nodes (9/7/06) - Jerrod
    10. Insert the 8 latest LHC nodes into /afs/slac/g/www/www-iepm/bw/iepmworld/rss.xml (9/7/06) - Jerrod
  1. Redesign and Implement Guthrie to cover both IEPM and PingER
  2. MonALISA (no progress 3/12/06, awaits iepm-bw OWAMP integration, keeping servers running) - Connie
    1. Upload selected data (initially IEPM data from BNL, SLAC, Caltech, CERN) using a single object for efficiency (awaits Iosif's new version of ML/APMon) - Adnan, Iosif
    2. Figure out how to display IEPM monitoring hosts and their data - Fawad, Aziz
    3. Project defined and assigned to Akbar and Waqar (3/11/06) - Akbar, Waqar
  3. IEPM-BW
    1. Work with DESY to get a new monitoring host (contacted Kars 7/20/04; Kars then on 2 weeks' vacation, followed by Jerrod being away; time to restart 8/26/04; waiting for v3; Jerrod sent email reminder 3/25/05; Kars visiting 4/27/06, Jerrod to contact him before he arrives) (awaits v3 of IEPM-BW) - Jerrod
    2. Make RAL a remote node
      1. Have account but cannot ssh to it (sent email to tasker 9/8/06) - Les
    3. Make FZK an IEPM Monitoring node - Connie
      1. Get contact for Connie (sent email 8/22/06, now awaits Connie) - Les
    4. Update metrics used
      1. ID and add more targets for pathload - Connie
    5. Get a distribution kit for installing and configuring IEPM monitoring nodes - Jerrod
      1. Update pre-reqs document - Jerrod
    6. Add LHC ATLAS hosts to IEPM-BW (list sent to Jerrod 9/6/06; added 9/6/06, needs testing) [Done 9/8/06] - Jerrod, Les, Connie
    7. Add group for US-ATLAS - Connie
    8. Diagnose orphan sockets problem - Connie
    9. Get the architecture of remote nodes and create a web page (will get back to this in Sept '06) - Jerrod
    10. Write script to use ssh to get the configurations of IEPM monitor and remote hosts (in progress 4/26/06, will revisit Sep '06) - Jerrod
    11. Decide whether we want reverse traceroutes (at least where we have reverse traceroute servers; awaits time) - Connie
    12. Compare pathchirp and pathload - Connie
      1. Make up a proposal (see if we need it) - Connie, Adnan
    13. Fix up TCP receive buffer sizes and add sanity checks (needs root on target hosts, which we do not have) - (?)
    14. Bugs:
      1. check-cron script: I do not know if this is still in use, but the code references files in ~jerrodw/bin, which is not portable. Fix it if it is still needed, or remove it if not - Jerrod
      2. make-bw-html script: there is code for displaying the comments, but it is unclear whether it is correct. Please check it over; I do not see any guarantee that $com{$node} is ever defined. Also please add comments explaining how it works (explained to Connie by Jerrod 9/11/06). Why is there a 'require' to get the comments rather than just returning them in @ans, i.e. @ans=`$iepmSrcDir/fetch-comments`?
      3. Other code issues (some of these scripts may no longer be active; if so, should they be deleted to reduce confusion?) - Jerrod
        1. ckavailtables: This does not call $rdsth->finish and will not work with newer versions of MySQL. It also directly references /afs files, so if it is part of the ported system it will fail on other nodes. (Done, Jerrod 9/11/06)
        2. ckaddednode: This does not call $rdsth->finish and will not work with newer versions of MySQL. (Done, Jerrod 9/11/06)
        3. fetch-added-node: Does this compile? Build the long SQL statement over several lines rather than one very long line. (Done, Jerrod 9/11/06)
        4. fetch-comments: Does not close $rdsth or $rddbh, which may leave sockets open. (Done, Jerrod 9/11/06)
        5. fetch-stale-scheduled-tasks: Does not close $rdsth or $rddbh, which will leave sockets open. [Done by Connie 9/4/06]
        6. The new MySQL requires finish and disconnect calls in all active scripts (requested July '06; done July 27, but the scripts listed above were overlooked at that time, Jerrod 9/11/06) - Jerrod
      4. CGIs need work:
        1. The add-node CGI is out of date.
        2. The toolspecs CGI puts a lot of entries into the toolspecs table.
  4. Traceanal - Yee, Asif [Awaiting Asif's arrival]
    1. Integrate new topology into web server - awaits wan-mon approval - Yee
    2. Identify the most used routes - Asif
    3. Integrate with pathneck to color links based on speed - Asif
    4. Topology rendering is much slower on www.slac.stanford.edu; this is likely related to the web server rather than the code, since it runs quickly elsewhere - Yee
    5. Modularise traceanal code for extensibility with non-IEPM-BW data - Asif, Yee
      1. Prepare distributable version of traceanal - Yee
  5. Alerts and Diagnosis - Les, Yee
    1. Look at multivariate event detection (collect data for SLAC, BNL, Caltech pathchirp, thrulay, ping) - Adnan
    2. Extend pathload to other sites - Connie
      1. Run plateau on the data for min-RTT, thrulay, pathchirp - Mahesh
      2. Apply PCA to the same data
    3. Look at improvements to plateau
      1. Ability to find step ups - Adnan
      2. Extend to allow up & down then compare down with original - Adnan
      3. Allow for small number of samples (e.g. at start) - Mahesh
    4. Look at other detection algorithms and compare
      1. Holt-Winters - Les, Mahesh, Felipe
        1. Go back 7 weeks - Mahesh
        2. Check unusual results - Mahesh
        3. Consider other ways to optimize parameters - Mahesh
      2. Neural networks
      3. KS
        1. Look at making points before larger than points after - Akbar
    5. Prepare table of canonical events and how various algorithms react - Adnan
      1. Build case studies of email events (how is this coming on?) - Adnan
    6. Look into host monitoring/isolation
      1. Look at installing LISA/APMon at monitoring sites so we can eliminate events caused by host congestion
      2. Ganglia
      3. Nagios
      4. Monitor NIC errors
    7. Look at how to use PerfSONAR - Adnan
    8. Look at detecting outages for ping - Connie
      1. Analyze what constitutes a significant outage - Connie
    9. Understand the cause of delayed alerts and see if we can reduce the delay - Connie
    10. Diagnose events - Adnan
    11. Extend the database to record the trigger start date/time and trigger detection date/time - Connie
  6. Install WANMON as IEPM web server - Yee
    1. Port CGI-WRAP - Les, Yee [Done 8/10/06]
    2. Get NFS and AFS access - Yee
    3. Get approval for externally visible web server - Yee, Les [DONE approved 20060907]
      1. Submitted request - Yee [done 20060814]
    4. A v20z will be donated to SCS for this purpose - Yee, Les [AWAITing hardware and software installation by SCS]
    5. Get traceroute.pl and pingtable.pl working and in production (will be via John B. as they will be maintaining it) - Yee
  7. IEPM-BW Web Services - Yee
    1. To be implemented via PerfSONAR - Yee, Asif
      1. Measurement Archive Service for IEPM-BW data - Yee
        1. Install SQL-MA info - Yee
        2. Write iBATIS configs for IEPM-BW data - Yee
        3. NMWG requirements? - Yee
    2. Check whether our web services access works (need to contact Warren; awaiting proposals and stable implementations) - Yee
  8. Set up Wiki
  9. Presentations/Talks/Visits/Papers/Documentation
  10. IPv6