How to do infrastructure shifts

Infrastructure shifts exist to solve problems that people on shift have with the pipeline, web applications, Oracle database, and other core GLAST hardware and software (such as the glastlnx machines). So far we have seen few problems with these components, so performing infrastructure shifts has been relatively simple.

The key functions are:

  • Check the infrastructure log for known issues and recent problems.
  • Monitor the OpsProb mailing list for problem reports and make sure they are getting taken care of.
  • Attend the morning shift meeting and take note of any problems and/or report on the status of any outstanding problems.
  • Be available (via phone or page) to respond to any urgent problems. Where possible, first attempt to fix these problems using the list of known problems or the how to fix pages. If a problem cannot be solved immediately, contact the appropriate expert.

Notes:

  • We will continue to refine the role of the infrastructure shifts as we gain more experience. Suggestions are welcome.
  • Since we have had little need to use the how to fix documentation up to now, it is still quite rough. Please feel free to suggest improvements or, even better, if you find something unclear, update the documentation to make it clearer.
  • Some other mailing lists you might consider joining:
    • Shift list – discussion concerning shifts
    • Oracle list – discussion of problems with Oracle, including automated reports from "grid control"
    • Nagios list – Nagios critical alerts.