Follow the instructions here: Submitting Batch Jobs

How do I keep programs running if an ssh connection fails?

First see whether you can use the LSF batch nodes for your work. If not, three Unix programs can help: tmux, nohup and screen. None of them will preserve a graphical program or an X11 connection, so run your programs in terminal mode.

tmux

For example, suppose you start tmux on a psana node and then run matlab inside it:


ssh psexport
ssh psana
# suppose we have landed on psanacs040 and that there is a matlab license here
tmux
matlab -nosplash -nodesktop

If you lose the connection to psanacs040, you can go back to that node and reattach:

ssh psexport
ssh psanacs040
tmux attach

You need to remember the node you ran tmux on. If you are running matlab, you can run the matlab license script with the --show-users parameter to see where you are running it:

/reg/common/package/scripts/matlic --show-users
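For programs other than matlab, one way to keep track of the node is to write it down when the session starts. The following is a minimal sketch, not a site tool: the function names and the NODE_LOG file are hypothetical, and it assumes your home directory is visible from every node. It also gives the tmux session a name so that it can be reattached by name.

```shell
#!/bin/bash
# Hypothetical helpers, not site tools. NODE_LOG is an assumed location;
# it only helps if your home directory is visible from every node.
NODE_LOG="${NODE_LOG:-$HOME/.tmux_nodes}"

record_node() {
    # Append "date host session-name" for the current node.
    printf '%s %s %s\n' "$(date +%F)" "$(uname -n)" "$1" >> "$NODE_LOG"
}

last_node_for() {
    # Print the most recently recorded host for the given session name.
    awk -v n="$1" '$3 == n { host = $2 } END { if (host != "") print host }' "$NODE_LOG"
}

start_named_session() {
    record_node "$1"
    # Start a named session; reattach later with: tmux attach -t NAME
    tmux new-session -s "$1"
}
```

On the psana node you would run start_named_session analysis. After a dropped connection, last_node_for analysis (run from psexport) prints the node to ssh back to, where tmux attach -t analysis reattaches.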

nohup

You could run a batch process with nohup (no hangup) as follows:

    nohup myprogram

For example, suppose we want to run a Python script that prints to the screen, and we want to save its output (the syntax below is for the bash shell):

nohup python myscript.py > myoutput 2>&1 &

Here we capture the program's output in myoutput, along with anything it writes to stderr (the 2>&1), and put the job in the background (the trailing &). The job will persist after you log out, so you can look at the output in the file myoutput the next day. As with tmux, you will need to remember the node you launched nohup on.
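Since the node has to be remembered, it can help to save it alongside the output. The sketch below wraps the nohup command above in a function that also records the node name and process ID. The name run_detached and the .info file are illustrative assumptions, not existing site tools.

```shell
#!/bin/bash
# Sketch: launch a command under nohup and note where it runs.
# run_detached and the ".info" file are illustrative names only.
run_detached() {
    local log="$1"; shift
    # stdout and stderr both go to the log; the job runs in the background.
    nohup "$@" > "$log" 2>&1 &
    local pid=$!
    # Save the node and process ID next to the log for later lookup.
    printf 'host=%s pid=%s\n' "$(uname -n)" "$pid" > "$log.info"
    echo "$pid"
}
```

For example, run_detached myoutput python myscript.py starts the script and prints its process ID. Later, cat myoutput.info shows which node to return to, and kill -0 with that process ID (run on that node) reports whether the job is still alive.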

Why did my batch job fail? I'm getting 'command not found'

Before running your script, make sure you can run something at all, for instance:

  bsub -q psnehq pwd

(substitute the appropriate queue for psnehq). If you created a script and are running

  bsub -q psnehq myscript

then it may be that the current directory is not in your path. Instead, run

  bsub -q psnehq ./myscript

Also check that myscript is executable by you, and that it starts with the correct #! line.
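These checks can be run before submitting. The following is a minimal sketch with a hypothetical helper name, check_job_script. It only applies to interpreter scripts, since a compiled program does not need a #! line.

```shell
#!/bin/bash
# Sketch of the checks above; check_job_script is a hypothetical name.
check_job_script() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "not found: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable (try: chmod u+x $script)"
        return 1
    fi
    # The first two bytes of an interpreter script should be '#!'.
    if [ "$(head -c 2 "$script")" != '#!' ]; then
        echo "missing #! line: $script"
        return 1
    fi
    echo "ok"
}
```

A typical use would be check_job_script ./myscript && bsub -q psnehq ./myscript, so the job is only submitted when the checks pass.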

 


Psana

Topics specific to Psana

...

The Psana ddl-based Translator can be used to write ndarrays, strings and a few simple types that C++ modules register. These are organized in the same groups that we use to translate xtc to hdf5, and datasets with event times are written as well. To use this, create a psana config file that turns off the translation of all xtc types but allows translation of ndarrays and strings. An example cfg file is here: psana_translate_noxtc.cfg. You would just change the modules and files parameters for psana and the output_file parameter of Translator.H5Output. Load the modules that put ndarrays into the event store before the Translator. The Translator will pick them up and write them to the hdf5 file.
