
...

  • Overall, we were pleased with the behavior of the tools
  • The test was very useful for seeing how the tools performed under semi-realistic load
    • Revealed some issues we need to address between now and launch

Oracle crash

SCCS restarted the database in ~30 minutes. The crash was due to an "out of shared pool space" error. The shared pool is used, among other things, to cache recent queries so that they can be reused. Non-optimal queries in the DataQualityMonitoring application seem to have been responsible for this. The queries were fixed immediately.
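
The effect of query caching can be illustrated with a toy model. The sketch below is not Oracle's actual shared pool; it is a small LRU cache keyed by SQL text. It assumes (the notes do not say this) that the non-optimal queries differed only in embedded literal values. The table name `dq_results` and the column `run_id` are hypothetical.

```python
from collections import OrderedDict


class StatementCache:
    """Toy LRU cache keyed by SQL text, standing in for a query cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def parse(self, sql):
        """Record a lookup of `sql`, caching it on a miss."""
        if sql in self.entries:
            self.entries.move_to_end(sql)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[sql] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def run_with_literals(cache, run_ids):
    """Each run id produces a different SQL text, so nothing is reused."""
    for run_id in run_ids:
        cache.parse(f"SELECT * FROM dq_results WHERE run_id = {run_id}")


def run_with_binds(cache, run_ids):
    """The SQL text is identical for every run id; only the bound value changes."""
    for _ in run_ids:
        cache.parse("SELECT * FROM dq_results WHERE run_id = :run_id")


literal_cache = StatementCache(capacity=100)
run_with_literals(literal_cache, range(1000))

bind_cache = StatementCache(capacity=100)
run_with_binds(bind_cache, range(1000))

print(literal_cache.hits, literal_cache.misses)
print(bind_cache.hits, bind_cache.misses)
```

In this model the literal version fills the cache with single-use entries, while the bind-variable version occupies one slot and is reused on every call after the first.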

We learned that we have to e-mail db-admin to report database outages.

  • Learn to use GRIDControl, an insanely complicated tool for monitoring Oracle. We can use it to find out which queries are heaviest on the system and how to optimize them.
  • RAC (Real Application Clusters) system: for load balancing, scalability and failover.

Web Server Crashes

...

  • Does not seem to like Oracle crashing. Will try to make it more robust. Difficult to test without actually taking Oracle down.
  • SubStream rollback did not work as expected.
  • Work is ongoing to speed up the processing pipeline to decrease data turn-around time.
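
One possible way to approach the robustness item above without taking Oracle down is to retry failed calls and to test the retry logic against a stand-in that simulates an outage. This is a minimal sketch, not the actual server code: `DatabaseUnavailable`, `call_with_retry` and `FlakyDatabase` are all hypothetical names introduced for illustration.

```python
import time


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


def call_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying with exponential backoff on outages.

    `sleep` is injectable so tests can record delays instead of waiting.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except DatabaseUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)


class FlakyDatabase:
    """Stand-in that fails a fixed number of times, then recovers."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def query(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise DatabaseUnavailable("simulated outage")
        return "ok"


# Simulate an outage lasting two calls, recording delays instead of sleeping.
db = FlakyDatabase(failures=2)
delays = []
result = call_with_retry(db.query, sleep=delays.append)
print(result, delays, db.calls)
```

Injecting the `sleep` function keeps the test fast, and the simulated outage exercises the failure path that is otherwise hard to reproduce.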

NFS Problem

Half pipe for orbit 2 failed a few times due to NFS problems.

Safari Compatibility

We might have to spend some time making sure that the JavaScript we use is compatible with Safari.

...

In no particular order...

  • Figure out how to read NFS files from Tomcat
  • Stop using SLACDEV database
    • Rationalize use of Dev/prod etc, decide if we need other configuration options
  • Remove duplication between data processing page and other apps
  • Look into Tomcat clustering
  • Cross App application trending, i.e. and Daily Report Application
  • Improve data catalog interface especially for real data
    • L1 data products arranged by groups rather than folders
    • Look into WebDAV/GUI for data catalog
  • Integration between monitoring tools and Ops Log
    • Ability to easily copy plots to Ops Log and comment on them
    • Ability to find all recent comments on a plot
  • Make Ops Log use same login system as everything else
  • Get LAT Data Server tied into L1Proc
  • Get portal working, at least for items like data processing page, GRB summary etc
    • Generate RSS feeds from LogWatcher, Ops Log, JIRA, Confluence etc to display on portal page
  • Ability to monitor all Tomcat servers/applications from one page (and maybe restart them)
  • Figure out why automatic generation of Tomcat configuration on glast-win01,02 did not work
  • .... and much more

What else should we have learned?