...
Seems to be an NFS error. I can't write to that directory as myself or as the glast account. Need to check on the proper account/permissions.
...
You'll have to be glastraw for the last 2 steps.
But then I messed up. I rolled back findChunks 180502012.546962321 and it failed, complaining about overlapping data. The right thing is to roll back the whole doRun stream while defining deliveriesToIgnore=180502013 (how?), which I've now done.
And now there are more errors, which I'll have to investigate later, but probably involve the magic7 file.
This is usually indicated by one or more instances of the LSF summary report, which typically (though not always) appears near the end of the file. The summary looks like this:
------------------------------------------------------------
Sender: LSF System <lsf@hequ0119>
Subject: Job 217648: <findChunks> in cluster <slac> Done

Job <findChunks> was submitted from host <fermilnx-v08> by user <glastraw> in cluster <slac> at Thu Aug 2 11:37:06 2018.
Job was executed on host(s) <hequ0119>, in queue <glastdataq>, as user <glastraw> in cluster <slac> at Thu Aug 2 11:37:08 2018.
</u/gl/glastraw> was used as the home directory.
</nfs/farm/g/glast/u41/L1/logs/PROD/L1Proc/5.6/doRun/findChunks/180xxxxxx/802xxx/015/554xxxxxx/915xxx/869> was used as the working directory.
Started at Thu Aug 2 11:37:08 2018.
Terminated at Thu Aug 2 11:39:29 2018.
Results reported at Thu Aug 2 11:39:29 2018.

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
bash pipeline_wrapper
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time                : 8.33 sec.
    Max Memory              : 66 MB
    Average Memory          : 24.33 MB
    Total Requested Memory  : -
    Delta Memory            : -
    Max Swap                : -
    Max Processes           : 8
    Max Threads             : 13
    Run time                : 141 sec.
    Turnaround time         : 143 sec.

The output (if any) is above this job summary.
In 95% of cases, all that's needed is a simple rollback of the affected process. In the other 5%, some other underlying problem is also preventing the job from completing, and you'll need to search through the log files for the error.
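A quick way to confirm a multiple submission is to count how many LSF summary reports landed in a single log file; more than one means the job reported back more than once. A minimal sketch (the marker string comes from the summary above; the function names are just illustrative):

```python
import re

# Each LSF summary report begins with a "Sender: LSF System <...>" line,
# so counting those lines tells us how many times the job reported back.
LSF_SENDER = re.compile(r"^Sender: LSF System", re.MULTILINE)

def count_lsf_summaries(log_text: str) -> int:
    """Return the number of LSF summary reports embedded in a log."""
    return len(LSF_SENDER.findall(log_text))

def has_multiple_submissions(log_text: str) -> bool:
    """More than one summary report suggests a multiple submission."""
    return count_lsf_summaries(log_text) > 1
```

This only inspects the log text, so it's safe to run against the NFS log tree as any account that can read it.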
This occurs when there are no files for the half-pipe to process in the delivery. There are three causes for this:
The symptom of this is an error of the form:
LSEReader::read: unknown LSE_Keys typeid 551075433 from /nfs/farm/g/glast/u42/ISOC-flight/Downlinks/180619008/0000f5a2-20d8be69-03bd-00a21.evt
In this particular case the LSE_Key was set to the run number instead of its proper value (-1 through 3, from an enum in the code).
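When digging through log files for this, a small extractor that pulls the bogus typeid and the offending .evt file out of each error line saves some eyeballing. A sketch assuming the exact message format shown above (the -1 through 3 range comes from the enum just mentioned):

```python
import re

# Matches the LSEReader error shown above, capturing the typeid and file path.
LSE_KEY_ERR = re.compile(
    r"LSEReader::read: unknown LSE_Keys typeid (-?\d+) from (\S+)"
)

VALID_KEYS = range(-1, 4)  # the enum in the code runs from -1 through 3

def find_bad_lse_keys(log_text: str):
    """Yield (typeid, path) for every unknown-LSE_Keys error in a log."""
    for match in LSE_KEY_ERR.finditer(log_text):
        typeid = int(match.group(1))
        if typeid not in VALID_KEYS:  # e.g. a run number like 551075433
            yield typeid, match.group(2)
```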
To date, I've only ever seen this in the mergeEvt task. This is symptomatic of a problem upstream in the doChunk streams (running the makeEvt task). In every instance, I've found one or more of those makeEvt tasks had a multiple submission. Rolling all the doChunk streams back with the multiple submissions fixes the problem.