...
doRecon controls the reconstruction of the chunks. It reads a list of jobs to run and, for each one, spawns a thread that submits a batch job, waits for it to complete, and returns its status. It then writes a file listing which, if any, of the chunks failed, and exits with an unsuccessful return code if any chunks failed. When the TP first starts, it checks for this list of failed chunks; if the list is present and nonempty, the TP uses it instead of the original list of all chunks written by setupRecon. This way, if some chunks fail, the TP can be rolled back and will only need to redo the failed chunks. The failed chunk list is not registered as a pipeline dataset, since a TP modifying its own input this way would violate logical constraints within GINO.
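The control flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual doRecon.pl: the function names (`load_jobs`, `run_chunk`, `do_recon`), the one-job-per-line file format, and the use of a shell command as the "batch job" are all assumptions made for the example.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def load_jobs(all_chunks_path, failed_path):
    """Return the jobs to run, preferring a nonempty failed-chunk list."""
    # Assumption: both files hold one job per line.
    if os.path.exists(failed_path):
        with open(failed_path) as f:
            failed = [line.strip() for line in f if line.strip()]
        if failed:
            return failed
    with open(all_chunks_path) as f:
        return [line.strip() for line in f if line.strip()]


def run_chunk(job):
    """Hypothetical runner: treat the job as a command, block until it ends."""
    return subprocess.call(job, shell=True)


def do_recon(all_chunks_path, failed_path, runner=run_chunk):
    """Run every job in its own thread; record failures; return an exit code."""
    jobs = load_jobs(all_chunks_path, failed_path)
    # One worker thread per job, each waiting on its own batch job.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        statuses = list(pool.map(runner, jobs))
    failed = [job for job, status in zip(jobs, statuses) if status != 0]
    # Always rewrite the list; an empty file means "nothing left to redo".
    with open(failed_path, "w") as f:
        for job in failed:
            f.write(job + "\n")
    return 1 if failed else 0
```

A caller would pass the return value to `sys.exit`, so that a rolled-back run picks up only the failed chunks on its next attempt. The `runner` argument is injectable so the flow can be exercised without a real batch system.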
In order to avoid problems with unreliable NFS service, each chunk job copies the input digi file to a local disk on the batch host (if it has one; I think they all do now) and writes its output files there as well. It then moves the output files to a staging directory on AFS and deletes the local copy of the input file. When chunks fail, these files are left behind and eventually fill up the local disk. Therefore, there is a script that seeks out and deletes these orphaned files. It must run as user glastdpf. I usually log into a noric as glastdpf and run it by hand every once in a while, but a better solution would probably be to wrap it in a task and run that task every night from my crontab.
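As a rough illustration of what such a cleanup could look like, here is a minimal Python sketch. It is not the existing script: the function names, the scratch directory argument, and the rule that a file counts as orphaned once it has gone unmodified for longer than a given age are all assumptions chosen for the example.

```python
import os
import time


def find_orphans(scratch_root, max_age_seconds, now=None):
    """List files under scratch_root not modified within max_age_seconds."""
    now = time.time() if now is None else now
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(scratch_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file disappeared while we were scanning
            if now - mtime > max_age_seconds:
                orphans.append(path)
    return sorted(orphans)


def delete_orphans(scratch_root, max_age_seconds, dry_run=True, now=None):
    """Delete orphaned files (or only report them when dry_run is True)."""
    orphans = find_orphans(scratch_root, max_age_seconds, now=now)
    if not dry_run:
        for path in orphans:
            try:
                os.remove(path)
            except OSError:
                pass  # already gone or not removable; skip it
    return orphans
```

The age threshold should comfortably exceed the longest expected chunk job, so that files belonging to jobs still running are not removed. Defaulting to `dry_run=True` lets the list be inspected before anything is deleted.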
Merging the chunks of the recon file uses four TPs. The first (mergeRecon) performs the actual merge, from chunk files on the staging disk to a recon file on the staging disk. The second deletes the chunk files from the staging disk. The third copies the merged recon file from the staging disk to its final destination on NFS. The fourth deletes the merged file from the staging area. This may seem unreasonably complicated, but it reduces the amount
| Purpose | Associated Scripts | Input | Output | Comments |
|---|---|---|---|---|
| finish up a run | cleanup.py | | | external side |
| finish up a run | cleanupWrapper.pl | | | pipeline side |
| control job for reconstruction of chunks | doRecon.pl | | | external side |
| control job for reconstruction of chunks | doReconWrapper.pl | | | pipeline side |
| | genRTRLaunchWrapper.pl | | | pipeline side |
| make XML config file for task | genXml.pl | | | |
| merge chunks of recon file | mergeRecon.py | | | external side |
| merge chunks of recon file | mergeReconWrapper.pl | | | pipeline side |
| | recon.py | | | obsolete |
| reconstruct one chunk | recon0ne.csh | | | |
| | reconWrapper.pl | | | obsolete |
| | reprocess-licos.csh | | | delete |
| | reprocess-v3r1p5.csh | | | delete |
| | reprocess-version.csh | | | delete |
| launch recon report | RunRALaunchWrapper.pl | | | pipeline side |
| prepare chunk jobs | setupRecon.py | | | external side |
| prepare chunk jobs | setupReconWrapper.pl | | | pipeline side |
...