
This list contains the action items for the IEPM-BW group and is to be used to determine the tasks and progress of IEPM-BW members. Members are expected to keep their tasks up-to-date and current.

An archive of the task list, as it stood after our group meeting, is also kept for reference.

Key

Item is

Description

underscored

Awaiting something; describe the cause and, at the end of the task description, give the date since which the task has been waiting

struck through

Task is complete or has been dropped; if dropped, provide a reason. Also provide a completion date after the task, or 'dropped' if appropriate

bold

Task is currently being worked on or is actively being discussed

The general format of each task shall be represented as such:

  1. Project 1 - <Task Manager>
    1. Task 1 - <Person(s) responsible>
      1. Minor task 1 - <Person(s) responsible>
      2. Minor task 2 - <Person(s) responsible> - [DROPPED due to lack of interest]
    2. Task 2 - <Person(s) responsible>
  2. Project 2 - <Task Manager>
    1. Task 1 - <Person(s) responsible> [DONE 20060901]
    2. Task 2 - <Person(s) responsible> [AWAITING email contact back from Bob Smith]

Completed items must only be struck through rather than removed. These items will then be removed after each face-to-face meeting when the archive is updated.
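
The status tags in the format above can be read mechanically. The following is a minimal sketch, assuming the tag vocabulary is limited to the forms used on this page (DONE, DROPPED or Drop, AWAITING); `task_status` is a hypothetical helper, not an existing group script.

```python
import re

# Tag vocabulary assumed from the examples on this page; anything else counts as "open".
STATUS_RE = re.compile(r"\[(DONE|DROPPED|DROP|AWAITING)\b([^\]]*)\]", re.IGNORECASE)


def task_status(line):
    """Return (status, detail) for one task line, e.g. ("done", "20060901")."""
    match = STATUS_RE.search(line)
    if match is None:
        return ("open", "")
    status = match.group(1).lower()
    if status == "drop":
        # The page uses both "[Drop ...]" and "[DROPPED ...]"; treat them alike.
        status = "dropped"
    return (status, match.group(2).strip())
```

Matching is case-insensitive because the page mixes forms such as `[DONE ...]`, `[done ...]` and `[AWAITing ...]`.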

Action Items

Terapaths Netflow (see http://iepmbw.bnl.gov/netflow/index.html) - Yee

    1. Get security requirements from SLAC/John H - Yee
    2. Discuss with Connie type of useful exec level plots - Yee, Connie, Les
    3. Presentation/Front-End - Yee
      1. Make work on non-Firefox browsers (non-DOM needs fixing) [Drop 9/29/06] - Yee

      2. Implement grouping for graphs and tables - Yee [DONE 20060921]

      3. Adapt RRDGraph::Group.pm to support non-grouped entries - Yee [DONE 20061019]

      4. Requests for a lot of data fail due to the limit on GET argument length (the table shows because it uses a POST rather than a GET)
        1. Figure out how to conduct a POST through JavaScript to request the graph [Done 9/28/06, now use SVG] - Yee

      5. Add spider and pie charts - Yee [done 20060905]

      6. Reimplement spider and pie charts for SVG::TT - Yee
        1. Implement pie chart class in SVG::TT - Yee
        2. Implement spider class in SVG/Template format - Shahryar, Yee
        3. Tidy up legends for pie and spider charts - Yee
      7. Fix labels on time series plots - Yee [DONE 20061019]

    4. Processing Back-End - Yee, Akbar
      1. Factor out TopN code in JKFlow.pm - Akbar
      2. Refactor JKFlow code for QoS analysis (restructure file structure and possibly QoS analysis) - Akbar
      3. Technical Documentation - Akbar
    5. Testing and Installation - Yee, Akbar
      1. Finalise installation script - Akbar, Yee
        1. Add patching - need distribution mechanism for tarball storage - Yee, Akbar
        2. Test perl prefix installations - Akbar, Yee [done 20060905]

      2. Add UI/cgi code to installation - Akbar, Yee
      3. Try out installation on fresh machine
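
The GET argument length item under Presentation/Front-End comes down to where the parameters travel: in the URL for a GET, in the request body for a POST. The sketch below illustrates the difference in Python rather than in the page's JavaScript. The endpoint URL and the `host` parameter name are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only.
GRAPH_URL = "http://example.org/cgi-bin/graph.cgi"


def build_graph_request(params):
    """Build a POST request whose parameters travel in the body, not the URL."""
    body = urlencode(params, doseq=True).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(GRAPH_URL, data=body, headers=headers)


# 500 hosts would make a very long GET URL; here the URL stays short.
request = build_graph_request({"host": ["h%d" % i for i in range(500)]})
print(request.get_method(), len(request.full_url), len(request.data))
```

The URL length stays fixed however many hosts are requested, which is why the table (already using POST) was unaffected by the limit.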

Transport services evaluation - Yee

    1. Work with Microsoft - Yee, Les
      1. Get latest privates from Microsoft
      2. Decide what is needed for stage 2
      3. Install Vista/Longhorn - Yee
    2. NDT server at SLAC (see http://nettest5:7123) - Yee
      1. Installed on nettest5 NDT Install - Yee [DONE 20060831]

      2. Talk security into allowing public access - Yee

PingER

    1. Make sure Maxim has all the latest monitoring nodes (ihep is probably bad, reported 9/16/06, sent mail/attached excel sheet to Maxim 9/20/06) - Jerrod
    2. Get ping-data.pl working at sfsmds2.vsnl.in (sent email to mantosh 9/8/06 with suggestions) [Done 9/19/06] - Jerrod

    3. Get correct server name displayed for sfsmds2.vsnl.net traceroute server (sent reminder 9/8/06) [Done 9/16/06] - Les

    4. Missing output from http://www-iepm.slac.stanford.edu/pinger/slaconly/analyze-hourly-out.pl in downsites.cgi (reported with suggestions 9/16/06, still occurs using pinger as cronjob account 9/23/06, moved from iepm to pinger, still there 9/29/06) [Done 9/30/06] - Jerrod

    5. Fix indix.cdacmumbai.in (i.e. replace indix.ncst.ernet.in, now need to fix old raw data 9/19/06, fixed 9/20/06) - Jerrod
    6. Fix up downsites warnings on brunsvigia.tenet.ac.za (request to add www.sajhe.org.za 9/16/06, done 9/18/06) - Jerrod
    7. Patch helsinki.gatech.edu as monitoring site (requested 9/19/06, must locate monitoring site reporting this 9/18/06) [Done 9/25/06] - Jerrod

    8. Try and restore itep.ru Beacon site (email sent to Ilya, ICFA & Greg Cole 9/23/06, have contact and have emailed 9/28/06) - Les
    9. Restore data gathering from indix.ncst.ernet.in to Wisconsin target node.
      1. Get traceroute server (sent mail on 9/5/06) - Jerrod
      2. Update ping_data.pl (sent mail on 9/5/06) - Jerrod
      3. Remove Wisconsin target Pinger node from monitoring (sent mail on 9/5/06) - Jerrod
      4. Binu Abraham [binu@cdacmumbai.in] responded to these mails on 9/6/06 and stated he would perform the above tasks that day (9/6/06)

    10. Update guthrie with LHC - ATLAS nodes (9/7/06) - Jerrod
    11. Insert latest 8 LHC nodes into /afs/slac/g/www/www-iepm/bw/iepmworld/rss.xml (9/7/06) - Jerrod
    12. Added pinger.ictp.it as a monitoring node (9/23/06) - Jerrod
      1. Email sent to ICTP 6/5/06 (responded 6/7/06, awaits director return, they agree 7/20/06, sent reminder 9/4/06, reinstalled pinger on the pinger.ictp.it node 9/25/06-Done): Jerrod
    13. Check Brunsvigia pinger node for data collection problems (9/25/06, they are moving sites so down 9/28/06) [Done 10/4/06] - Jerrod

    14. Re-run analyze-all.pl on the last 90 days of pinger data (9/25/06) [Done 9/27/06] - Jerrod

    15. Removed several nodes from brunsvigia pinger monitoring to reduce downsites output (9/24/06) - Jerrod:
      1. www.ci.uem.mz
        www.unam.na
        students.techpta.ac.za
        81.199.21.194
        www.netpress.bi
        www.ml.refer.org
        www.marwan.ma
        www.micti.co.mz
        www.museumsnc.co.za
        www.ugb.sn
        www.natmus.cul.na
        www.uac.bj.refer.org
        www.uonbi.ac.ke
        mail.gnet.tn
        www.refer.mg
        www.ru.ac.za
        www.gncic.org.gh
        www.refer.sn
        www.kie.ac.rw
        www.ird.ne
        www.drfn.org.na
        www.maurifemme.mr
        www.sudan.gov.sd
        www.muchs.ac.tz

Redesign and Implement Guthrie to cover both IEPM and PingER

MonALISA (no progress 3/12/06, awaits iepm-bw OWAMP integration, keeping servers running) - Connie

    1. Upload selected data (initially IEPM data from BNL, SLAC, Caltech, CERN) using a single object for efficiency (awaits Iosif's new version of ML/APMon) - Adnan, Iosif
    2. Figure out how to display IEPM monitoring hosts and their data - Fawad, Aziz
    3. Project defined and assigned to Akbar and Waqar (3/11/06) - Akbar, Waqar

IEPM-BW

    1. Work with DESY to get new monitoring host (contacted Kars 7/20/04, Kars going on 2 weeks vacation then Jerrod is away, time to re-start 8/26/04, wait for v3, Jerrod sent email reminder 3/25/05, Kars will be here later this month (27th April '06), Jerrod contact him before he arrives) (awaits V3 of iepm-bw) - Jerrod
    2. Make RAL a remote node
      1. Have account but cannot ssh to it (sent email to tasker 9/8/06) - Les
    3. Make FZK an IEPM Monitoring node - Connie
      1. Get contact for Connie (sent email 8/22/06, now awaits Connie) - Les
    4. Update metrics used
      1. ID and add more targets for pathload - Connie
    5. Get distribution kit for iepm monitoring nodes to install & configure (Done adding owamp to the installkit as requested 10/3/06) - Jerrod
      1. Update pre-reqs document - Jerrod
    6. Add LHC Atlas hosts to IEPM-BW (list sent to Jerrod 9/6/06, added needs testing 9/6/06) [Done 9/8/06] - Jerrod, Les, Connie

    7. Add group for US-ATLAS [Done 9/20/06] - Connie

    8. Diagnose orphan sockets problem - Connie
    9. Get architecture of remote nodes and create a web page (will get back to in Sept 06) - Jerrod
    10. Write script to use ssh to get the configurations of IEPM monitor and remote hosts (in progress 4/26/06, will revisit Sep '06) - Jerrod
    11. Do we want to get reverse traceroutes (at least where we have reverse traceroute servers, awaits time) - Connie
    12. Compare pathchirp and pathload - Connie
      1. Make up a proposal (see if we need it) - Connie, Adnan
    13. Fix up TCP receive buffer sizes, add sanity checks - needs root on target hosts, which we do not have - (question)
    14. Bugs:
      1. check-cron script: I do not know if this is still in use, but looking at the code, it references files in ~jerrodw/bin and this is not transportable. Need to fix it if it is still needed, or get rid of it if it is not (the code is needed and is not transported to any remote systems at all thus only run inhouse and not necessary to remove the ~jerrodw reference. This has been explained several times but if this entire script needs to be removed, let me know) - Jerrod
      2. make-bw-html script: there is code for putting up the comments, however it is unclear that it is correct. Please check it over. I do not see how there is any guarantee that $com{$node} is ever defined. Also please comment how it works. (Explained to Connie by Jerrod 9/11/06) Why is there a 'require' to get the comments rather than just returning them in @ans: @ans=`$iepmSrcDir/fetch-comments`? Either way works, so this does not hinder the working of the code.
      3. Other code issues (some of these may no longer be active, so should they be deleted to reduce confusion?) - Jerrod
        1. ckavailtables: This does not call $rdsth->finish and will not work on the newer versions of MySQL. Also it directly references /afs files, and if this is part of the ported system, it will fail on other nodes. (Done, Jerrod 9/11/06)
        2. ckaddednode: This does not call $rdsth->finish and will not work on the newer versions of MySQL. [Done, Jerrod 9/11/06]

        3. fetch-added-node: Does this compile? Use several lines to create a long SQL statement, not one very, very long line. (Done, Jerrod 9/11/06)
        4. fetch-comments: Does not close the $rdsth or $rddbh - this may cause the sockets to stay open (Done by Jerrod 9/11/06)
        5. fetch-stale-scheduled-tasks: Does not close the $rdsth or $rddbh - this will cause the sockets to stay open [Done by Connie 9/4/06]

        6. New MySQL needs finish and disconnect in all active scripts (requested July '06, has it been done? Yes, it was done on July 27. Those mentioned above were overlooked at that time, Jerrod 9/11/06) - Jerrod
      4. CGIs need work:
        1. add node is out of date,
        2. toolspecs cgi puts a lot of entries into the toolspecs table.
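
The bug items above about unclosed statement and database handles, and about splitting a long SQL statement over several lines, follow a common pattern. The sketch below shows that pattern in Python with the standard `sqlite3` module, not the Perl DBI and MySQL code the group's scripts actually use. The table and column names are hypothetical.

```python
import sqlite3
from contextlib import closing

# A long statement built from several short lines rather than one very long line.
# Table and column names are hypothetical.
QUERY = (
    "SELECT node, comment "
    "FROM comments "
    "WHERE node = ? "
    "ORDER BY node"
)


def fetch_comments(db_path, node):
    """Return comment rows for one node, closing cursor and connection on exit."""
    with closing(sqlite3.connect(db_path)) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(QUERY, (node,))
            return cur.fetchall()
```

Both handles are released even if the query raises. Explicit finish and disconnect calls in the Perl scripts serve the same purpose.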

Traceanal Modularisation - Yee, Asif

    1. Integrate new topology into web server - awaits wan-mon approval - Yee
    2. Identify the most used routes - Asif
    3. Integrate with pathneck to color links based on speed - Asif
    4. Rendering of topology much slower on www.slac.stanford.edu - likely to be related to the web server, not code, as it runs quickly elsewhere - Yee
    5. Traceroute_analysis - Asif, Yee
      1. Understand and document current version - Asif [DONE 20061019]

      2. Modularisation of code - Asif [DONE 20061110]

      3. Prepare distributable version of traceanal - Asif, Yee
      4. Tidy up presentation - Asif
      5. Profile code for speed improvements - Asif, Yee
      6. Provide example of how to allow integration into Non-IEPM-BW data sets - Asif, Yee

Alerts and Diagnosis - Les, Yee

    1. Look at multivariate event detection (collect data for SLAC, BNL, Caltech pathchirp, thrulay, ping) - Adnan
    2. Need to extend pathload to other sites - Connie
      1. Run plateau on the data for min-RTT, thrulay, pathchirp - Mahesh
      2. Apply PCA to the same data
    3. Look at improvements to plateau
      1. Ability to find step ups - Adnan
      2. Extend to allow up & down then compare down with original - Adnan
      3. Allow for small number of samples (e.g. at start) - Mahesh
    4. Look at other detection algorithms and compare
      1. Holt-Winters - Les, Mahesh, Felipe
        1. Go back 7 weeks - Mahesh
        2. Check unusual results - Mahesh
        3. Consider other ways to optimize parameters - Mahesh
      2. Neural networks
      3. KS
        1. Look at making points before larger than points after - Akbar
    5. Prepare table of canonical events and how various algorithms react - Adnan
      1. Build case studies of email events (how is this coming on?) - Adnan
    6. Look into host monitoring/isolation
      1. Look at installing LISA/APMon at monitoring sites so we can eliminate events caused by host congestion
      2. Ganglia
      3. Nagios
      4. Monitor NIC errors
    7. Look at how to use PerfSONAR - Adnan
    8. Look at detecting outages for ping - Connie
      1. Analyze what constitutes a significant outage - Connie
    9. Understand cause of delayed alerts and see if can improve - Connie
    10. Diagnose events - Adnan
    11. Extend database to add trigger start date/time, trigger detection date/time in database - Connie
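
For the KS item in the detection-algorithm comparison, the sketch below assumes "KS" means the two-sample Kolmogorov-Smirnov test, applied to a window of measurements before a candidate event and a window after it. It computes only the test statistic (the largest gap between the two empirical distribution functions). The function name and the windowing are illustrative assumptions, not the group's implementation.

```python
def ks_statistic(before, after):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    if not before or not after:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(before), sorted(after)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every point equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap
```

A value near 0 means the two windows look alike, and a value near 1 means they barely overlap. The two windows do not need to hold the same number of points.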

Install WANMON as IEPM web server - Yee

    1. Port CGI-WRAP - Les, Yee [Done 8/10/06]

    2. Get approval for externally visible web server - Yee, Les [DONE approved 20060907]

      1. Submitted request - Yee [done 20060814]

    3. A v20z will be donated to SCS for this purpose - Yee, Les [AWAITING hardware and software installation by SCS]

    4. Install machine - Yee [AWAITING installation of Dell machine by SCCS]

    5. Get traceroute.pl and pingtable.pl working and in production (will be via John B. as they will be maintaining it) - Yee

PerfSONAR - Yee

    1. IEPM-BW Web Services - Yee, Asif
      1. Measurement Archive Service for IEPM-BW data - Yee [DONE 20061110]

        1. Install SQL-MA info - Yee [STALLED due to complexity of Java installation and code base compared to Perl]

          1. Write Ibatis configs for IEPM-BW data - Yee
    2. NMWG requirements? - Yee
    3. Does our web services access work (need to contact Warren, await proposals, and stability of implementations) - Yee [Not Applicable anymore due to perfSONAR]

  1. Make asn.pl into downloadable module - Yee [DONE 20061110]

Presentations/Talks/Visits/Papers/Documentation

IPv6
