An archive of the task list, as it stood after our group meeting, is also kept for reference.
Key
| Item is | Description |
|---|---|
| underscored | Awaits something; describe the cause and give the date since which it has been waiting at the end of the task description |
| struck through | Task is complete or has been dropped; if dropped, provide a reason. Also provide a completion date after the task, or 'dropped' if appropriate |
| bold | Task is currently being worked on or is actively being discussed |
Each task should follow this general format:
- Project 1 - <Task Manager>
- Task 1 - <Person(s) responsible>
- Minor task 1 - <Person(s) responsible>
- Minor task 2 - <Person(s) responsible> - [DROPPED due to lack of interest]
- Task 2 - <Person(s) responsible>
- Task 1 - <Person(s) responsible>
- Project 2 - <Task Manager>
- Task 1 - <Person(s) responsible> [DONE 20060901]
- Task 2 - <Person(s) responsible> [AWAITING email contact back from Bob Smith]
Completed items must be struck through rather than removed. They are removed after each face-to-face meeting, when the archive is updated.
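The format above is regular enough to check mechanically. The following is a minimal sketch, not an existing tool: `parse_task`, `Task` and `STATUS_RE` are hypothetical names, and it assumes the owners follow the last " - " separator and that any status tag (`[DONE ...]`, `[DROPPED ...]`, `[AWAITING ...]`) comes at the end of the line.

```python
import re
from datetime import date, datetime
from typing import List, NamedTuple, Optional

# Trailing status tag, e.g. "[DONE 20060901]" or "- [DROPPED due to ...]".
# Matching is case-insensitive because the list mixes "done", "DONE", "AWAITing".
STATUS_RE = re.compile(
    r"\s*(?:-\s*)?\[(DONE|DROPPED|AWAITING)\b\s*(.*?)\]\s*$", re.IGNORECASE
)


class Task(NamedTuple):
    description: str
    owners: List[str]
    status: str              # "OPEN", "DONE", "DROPPED" or "AWAITING"
    detail: str              # free text after the status keyword
    done_on: Optional[date]  # set only when a yyyymmdd date follows DONE


def parse_task(line: str) -> Task:
    """Split one task line into description, owners and status."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]

    status, detail, done_on = "OPEN", "", None
    match = STATUS_RE.search(text)
    if match:
        status = match.group(1).upper()
        detail = match.group(2).strip()
        text = text[: match.start()]
        stamp = re.search(r"\d{8}", detail)
        if status == "DONE" and stamp:
            done_on = datetime.strptime(stamp.group(), "%Y%m%d").date()

    # Owners come after the last " - "; lines without one have no owners.
    description, sep, owner_text = text.rpartition(" - ")
    if not sep:
        description, owner_text = text, ""
    owners = [name.strip() for name in owner_text.split(",") if name.strip()]
    return Task(description.strip(), owners, status, detail, done_on)
```

For example, `parse_task("- Task 1 - Yee, Akbar [DONE 20060901]")` yields the description `Task 1`, the owners `Yee` and `Akbar`, and a completion date of 1 September 2006. Lines with free-form dates such as `[Done by Connie 9/4/06]` are still recognised as done, but `done_on` stays unset.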
Action Items
- Terapaths Netflow (see http://iepmbw.bnl.gov/netflow/index.html) - Yee
- Get security requirements from SLAC/John H - Yee
- Make javascript work on non-Firefox browsers (non-DOM needs fixing) - Yee
- Implement grouping for graphs and tables - Yee
- Add spider and pie charts - Yee [DONE 20060905]
- Tidy up legends for pie and spider charts - Yee
- Discuss with Connie type of useful exec level plots - Yee, Connie, Les
- Finalise installation script - Akbar, Yee
- Add patching - need distribution mechanism for tarball storage - Yee, Akbar
- Test perl prefix installations - Akbar, Yee [DONE 20060905]
- Add UI/cgi code to installation - Akbar, Yee
- Factorise out TopN code in JKFlow.pm - Akbar
- Refactorise JKFlow code for QoS analysis (restructure file structure and possibly QoS analysis) - Akbar
- Transport services evaluation - Yee
- Work with Microsoft - Yee, Les
- Get latest privates from Microsoft
- Decide what is needed for stage 2
- NDT server at SLAC (see http://nettest5:7123) - Yee
- Installed on nettest5 (NDT Install) - Yee [DONE 20060831]
- Talk security into allowing public access - Yee.
- PingER
- Make sure Maxim has all the latest monitoring nodes - Jerrod
- Get ping-data.pl working at sfsmds2.vsnl.in - Jerrod
- Get correct server name displayed for sfsmds2.vsnl.net traceroute server - Les
- Restore data gathering from indix.ncst.ernet.in to Wisconsin target node.
- Get traceroute server (sent mail on 9/5/06) - Jerrod
- Update ping_data.pl (sent mail on 9/5/06) - Jerrod
- Remove Wisconsin target Pinger node from monitoring (sent mail on 9/5/06) - Jerrod
Binu Abraham [binu@cdacmumbai.in] responded to these mails on 9/6/06 and stated that he would perform the above tasks that same day (9/6/06).
- Update guthrie with LHC - ATLAS nodes (9/7/06) - Jerrod
- Insert latest 8 LHC nodes into /afs/slac/g/www/www-iepm/bw/iepmworld/rss.xml (9/7/06) - Jerrod
- Redesign and Implement Guthrie to cover both IEPM and PingER
- MonALISA (no progress 3/12/06, awaits iepm-bw OWAMP integration, keeping servers running) - Connie
- Upload selected data (initially IEPM data from BNL, SLAC, Caltech, CERN) using a single object for efficiency (awaits Iosif's new version of ML/APMon) - Adnan, Iosif
- Figure out how to display IEPM monitoring hosts and their data - Fawad, Aziz
- Project defined and assigned to Akbar and Waqar (3/11/06) - Akbar, Waqar
- IEPM-BW
- Work with DESY to get new monitoring host (contacted Kars 7/20/04, Kars going on 2 weeks vacation then Jerrod is away, time to re-start 8/26/04, wait for v3, Jerrod sent email reminder 3/25/05, Kars will be here later this month (27th April '06), Jerrod contact him before he arrives) (awaits V3 of iepm-bw) - Jerrod
- Make RAL a remote node
- Have account but cannot ssh to it (sent email to tasker 9/8/06) - Les
- Make FZK an IEPM Monitoring node - Connie
- Get contact for Connie (sent email 8/22/06, now awaits Connie) - Les
- Update metrics used
- ID and add more targets for pathload - Connie, Jerrod
- Get distribution kit for iepm monitoring nodes to install & configure - Jerrod
- Update pre-reqs document - Jerrod
- Add LHC Atlas hosts to IEPM-BW (list sent to Jerrod 9/6/06, added needs testing 9/6/06) - Jerrod, Les, Connie
- Get architecture of remote nodes and create a web page (will get back to in Sept 06) - Jerrod
- Write script to use ssh to get the configurations of IEPM monitor and remote hosts (in progress 4/26/06, will revisit Sep '06) - Jerrod
- Do we want to get reverse traceroutes (at least where we have reverse traceroute servers, awaits time) - Connie
- Compare pathchirp and pathload - Connie
- Make up a proposal (see if we need it) - Connie, Adnan
- Fix up TCP receive buffer sizes, add sanity checks - needs root on target hosts, which we do not have -
- Bugs:
- check-cron script: I do not know if this is still in use, but looking at the code, it references files in ~jerrodw/bin and this is not transportable. Need to fix it if it is still needed, or get rid of it if it is not - Jerrod
- make-bw-html script: there is code for putting up the comments, however it is unclear that it is correct. Please check it over. I do not see how there is any guarantee that $com{$node} is ever defined. Also please comment how it works. Why is there a 'require' to get the comments rather than just returning them in @ans: @ans=`$iepmSrcDir/fetch-comments`?
- Other code issues (some of these may no longer be active, so should they be deleted to reduce confusion) - Jerrod
- ckavailtables: This does not call $rdsth->finish and will not work on the newer versions of mysql. It also references /afs files directly, so if it is part of the ported system it will fail on other nodes.
- ckaddednode: This does not call $rdsth->finish and will not work on the newer versions of mysql.
- fetch-added-node: Does this compile? Use several lines to create a long sql statement, not one very long line.
- fetch-comments: Does not close the $rdsth or $rddbh - this may cause the sockets to stay open
- fetch-stale-scheduled-tasks: Does not close the $rdsth or $rddbh - this will cause the sockets to stay open [DONE by Connie 9/4/06]
- New Mysql needs finish and disconnect in all active scripts (requested July '06, has it been done?) - Jerrod
- CGIs need work:
- add node is out of date,
- toolspecs cgi puts a lot of entry entries into the toolspecs table.
- Traceanal - Yee, Asif [AWAITING Asif's arrival]
- Integrate new topology into web server - awaits wan-mon approval - Yee
- Identify the most used routes - Asif
- Integrate with pathneck to color links based on speed - Asif
- Rendering of topology is much slower on www.slac.stanford.edu - likely related to the web server rather than the code, as it runs quickly elsewhere - Yee
- Modularise traceanal code for extensibility with non-iepmbw data - Asif, Yee
- Prepare distributable version of traceanal - Yee
- Alerts and Diagnosis - Les, Yee
- Look at multivariate event detection (collect data for SLAC, BNL, Caltech pathchirp, thrulay, ping) - Adnan
- Need to extend pathload to other sites - Connie
- Run plateau on the data for min-RTT, thrulay, pathchirp - Mahesh
- Apply PCA to the same data
- Look at improvements to plateau
- Ability to find step ups - Adnan
- Extend to allow up & down then compare down with original - Adnan
- Allow for small number of samples (e.g. at start) - Mahesh
- Look at other detection algorithms and compare
- Holt-Winters - Les, Mahesh, Felipe
- Go back 7 weeks - Mahesh
- Check unusual results - Mahesh
- Consider other ways to optimize parameters - Mahesh
- Neural networks
- KS
- Look at making points before larger than points after - Akbar
- Prepare table of canonical events and how various algorithms react - Adnan
- Build case studies of email events (how is this coming on?) - Adnan
- Look into host monitoring/isolation
- Look at installing LISA/APMon at monitoring sites so can eliminate events caused by host congestion
- Ganglia
- Nagios
- Monitor NIC errors
- Look at how to use PerfSONAR - Adnan
- Look at detecting outages for ping - Connie
- Analyze what constitutes a significant outage - Connie
- Understand cause of delayed alerts and see if can improve - Connie
- Diagnose events - Adnan
- Extend database to add trigger start date/time, trigger detection date/time in database - Connie
- Install WANMON as IEPM web server - Yee
- Port CGI-WRAP - Les, Yee [DONE 8/10/06]
- Get NFS and AFS access - Yee
- Get approval for externally visible web server - Yee, Les [DONE approved 20060907]
- Submitted request - Yee [DONE 20060814]
- A v20z will be donated to SCS for this purpose - Yee, Les [AWAITING hardware and software installation by SCS]
- Get traceroute.pl and pingtable.pl working and in production (will be via John B. as they will be maintaining it) - Yee
- IEPM-BW Web Services - Yee
- To be implemented via PerfSONAR - Yee, Asif
- Measurement Archive Service for IEPM-BW data - Yee
- Install SQL-MA info - Yee
- Write Ibatis configs for IEPM-BW data - Yee
- NMWG requirements? - Yee.
- Does our web services access work (need to contact Warren, await proposals, and stability of implementations) - Yee
- Set up Wiki
- Presentations/Talks/Visits/Papers/Documentation
- IPv6