Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

To stop and start the pipeline automatically (for a planned outage):

  • first of all, login as glast to some interactive machine and go to the pipeline directory. In this example we are working with the prod pipeline (there are also dev/ and test/ subdirectories under ~glast/pipeline-II/):

    Code Block
    $ ssh glast@noricglast@centos7
    $ cd ~glast/pipeline-II/prod/
    


  • create and edit a shutdown_schedule_<date> file:

    Code Block
    $ date > shutdown_schedule_20100909
    $ vim shutdown_schedule_20100909
    


  • the shutdown_schedule_<date> file should look something like this (the start time should be some 10 minutes before the beginning of the outage, the stop time should be a few hours after the expected end of the outage); important: leave a blank line at the end of the file:

    Code Block
    Thu Sep  9 09:50:00 PDT 2010
    Thu Sep  9 13:50:00 PDT 2010
    
    


  • create a symbolic link for shutdown_schedule_<date> as shutdown_schedule:

    Code Block
    ln -s shutdown_schedule_20100909 shutdown_schedule
    


  • the pipeline should shutdown automagically a few minutes after the start time in the shutdown_schedule file (there is a cronjob involved, plus it can take a couple of minutes for it to go down)
  • you'll want it to start up again with an increased reaper delay so it doesn't fail the processes that finished while the pipeline was down. Edit the file run-pipeline.csh, look for a line like

    Code Block
         -Dpipeline.reaper.delay=300 \
    

    and change it to

    Code Block
         -Dpipeline.reaper.delay=3000 \
    

    After the pipeline has restarted and run long enough to process the old jobs, stop it, change run-pipeline.csh back to the way it was, and restart. Note that for it to restart with the new value, a simple "restart" won't suffice. It needs to be completely shutdown and restarted.

  • the pipeline will start again automatically after the end date specified in the shutdown_schedule file. If you want to extend the outage before the end date occurs, you can modify and save the file.
  • to restart the pipeline before the outage is over, modify the end date to be before the current time and save the file, or just remove the symbolic link (leaving the shutdown_schedule_<date> file for future reference):

    Code Block
    rm shutdown_schedule
    


  • the pipeline should go back up automagically in a few minutes
  • to check the status of the pipeline server, one can use the ping function of the pipeline:

    Code Block
    glast@noric09glast@centos7 $ ./pipeline ping
    Pinged server version 1.3.5 running on glastlnx12.slac.stanford.edu since 2010-09-07 16:57:28.012
    


To stop and start the pipeline manually (if all else fails):

  • first of all, login as glast to an interactive machine and go to the pipeline directory:

    Code Block
    $ ssh glast@noricglast@centos7
    $ cd ~glast/pipeline-II/prod/
    


  • issue the pipeline ping command to find out where the pipeline is running (for PROD this is currently glastlnx12, for DEV it is currently glastlnx13, and for TEST it is currently glastlnx07)

    Code Block
    $ ./pipeline ping
    Pinged server version 1.3.5 running on glastlnx12.slac.stanford.edu since 2010-09-07 16:57:28.012
    


  • ssh to the server listed (in the above example, it was glastlnx12.slac.stanford.edu) This is not strictly necessary when using the pipeline shutdown command, but it is necessary to run the start and stop scripts from the correct machine.

    Code Block
    ssh glast@fermilnx-v14 glast@glastlnx12
    


  • move out of the way the monitor script (which will restart the pipeline whenever it doesn't find it running):

    Code Block
    $ mv monitor monitor_something_something
    


  • shutdown the pipeline server with the shutdown command:

    Code Block
    $ ./pipeline shutdown
    


  • It can take up to a minute or more for the pipeline to finish what it is doing and exit. You can issue the pipeline ping command a few times until you see it has exited:

    Code Block
    $ ./pipeline ping
    Pinged server version 1.3.5 running on glastlnx12.slac.stanford.edu since 2010-09-07 16:57:28.012
    $ ./pipeline ping
    org.glast.pipeline.client.PipelineClient$PipelineServerNotRunningException: Pipeline server not running
            at org.glast.pipeline.client.PipelineClient.initialize(PipelineClient.java:62)
            at org.glast.pipeline.client.PipelineClient.<init>(PipelineClient.java:52)
            at org.glast.pipeline.client.command.PipelineCommand.getPipelineClient(PipelineCommand.java:220)
            at org.glast.pipeline.client.command.Ping.execute(Ping.java:26)
            at org.glast.pipeline.client.command.PipelineCommand.parseCommand(PipelineCommand.java:149)
            at org.glast.pipeline.client.command.PipelineCommand.main(PipelineCommand.java:100)
    


  • perform whatever work required the outage
  • restart the pipeline using the start script:

    Code Block
    $ ./start
    


  • move back the monitor script to its original location:

    Code Block
    $ mv monitor_something_something monitor
    


  • if the shutdown command hangs or fails, one can use the stop script, which finds the pid and kills the process:

    Code Block
    $ ./stop
    


  • to check the status of the pipeline server, one can use the ping function of the pipeline:

    Code Block
    glast@noric09glast@centos7 $ ./pipeline ping
    Pinged server version 1.3.5 running on glastlnx12.slac.stanford.edu since 2010-09-07 16:57:28.012