Warren sent out a thread on directions for the SVAC pipelines on Dec 17. It is archived as an entry in the pipeline mailing list; note the PDF file attachment in that mail.
We discovered early on with RM (the Release Manager) that one needs to protect its output from the unwary: anyone with group write privilege on the disk could modify builds. Alex then write-protected the output (presumably by turning off the group write bit). We should do something similar to protect the files the pipeline creates.
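Something along the lines of the sketch below could be run over a run's output directory once the pipeline is finished with it. This is only a minimal perl sketch, assuming the protection amounts to clearing the group and other write bits on everything under the directory; where it would hook into the pipeline is still to be decided.

    #!/usr/bin/env perl
    # Sketch: write-protect a run's output directory once the pipeline is done.
    use strict;
    use warnings;
    use File::Find;

    my $run_dir = shift or die "usage: $0 <run-output-dir>\n";

    find({
        no_chdir => 1,
        wanted   => sub {
            my @st = stat($File::Find::name) or return;
            my $mode = $st[2] & 07777;
            # turn off the group and other write bits, as RM presumably does for builds
            chmod($mode & ~0022, $File::Find::name)
                or warn "chmod failed on $File::Find::name: $!\n";
        },
    }, $run_dir);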
This is another item we will need very soon. Navid has made a perl interface to the archive system. But now the issue is how and when to archive.
Since the I&T pipelines are parallel, they have several different named tasks operating on the same run and writing to the same directory. Additionally, not all files are reported to the pipeline, yet we still want them archived. So we have to archive the entire directory. Ideally we would prevail on everyone to identify every file they want archived, but we seem to be on the losing end of that one!
I think this means the archiving has to happen asynchronously with respect to the pipeline. I'd be curious to see a comment on this blog item from Dan with his thoughts on the algorithm for figuring out what to archive and when.
Note that SCS asks that we keep archived files larger than 500 MB to use the tapes efficiently, so we had been thinking of making tar files. Navid keeps track of each tar file's contents in his archiver db, so he can ask the archive system for the right tar file when someone asks for an individual file.
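As a straw man, the per-run bundling might look like the sketch below. The manifest file is a stand-in: in reality the member list would go into Navid's archiver db through his perl interface, and the call that actually ships the tarball to the archive system is omitted.

    #!/usr/bin/env perl
    # Sketch: bundle a run directory into a tar file and record its contents.
    use strict;
    use warnings;
    use File::Basename;

    my $run_dir = shift or die "usage: $0 <run-dir>\n";
    my $run     = basename($run_dir);
    my $tarball = "$run.tar";

    # one tar file per run directory; several runs could be combined if a
    # single run falls below the ~500 MB that SCS asks for
    system("tar", "cf", $tarball, $run_dir) == 0
        or die "tar failed: $?\n";

    # record the contents so an individual file can be traced back to its
    # tarball; in reality this list would be loaded into the archiver db
    open my $manifest, '>', "$tarball.contents"
        or die "cannot write manifest: $!\n";
    print {$manifest} $_ for `tar tf $tarball`;
    close $manifest;

    warn "NOTE: $tarball is under the 500 MB tape target\n"
        if -s $tarball < 500 * 1024 * 1024;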
I'm at a bit of a loss at the moment to divine a general way of knowing when one can archive. It would be nice not to need custom archiving logic per group of tasks.
It can be hard to find the actual code that does the work. In my recent allGamma-GR-v5r0p2 task, I have one task process configured as:
- GleamWrapper.pl
  - a template provided by Dan, with a few lines of recommended code to access things like the run id and the input and output datasets.
- gleam.pl (sketched below)
  - submitted by GleamWrapper.pl
  - does a little setup and executes allGamma.sh, which runs Gleam, using environment variables to customize it.
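For concreteness, here is roughly what that gleam.pl layer amounts to. The environment variable names are illustrative, not the actual ones GleamWrapper.pl passes down:

    #!/usr/bin/env perl
    # Sketch of the gleam.pl layer: a little setup, then run the shell script.
    use strict;
    use warnings;

    # values handed down from GleamWrapper.pl / the pipeline (names illustrative)
    $ENV{RUN_ID}        ||= 'unknown';
    $ENV{OUTPUT_DIR}    ||= '.';
    $ENV{GLAST_RELEASE} ||= 'v5r0p2';   # hardwired today; see the versioning item below

    # allGamma.sh reads these environment variables to customize the Gleam job
    exec('allGamma.sh') or die "could not exec allGamma.sh: $!\n";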
I realize that gleam.pl is not necessary; GleamWrapper.pl could easily have done the work. I had based my task on Warren's recon task, where his gleam.pl builds the shell script, and slavishly kept his structure.
But I realized that we never record the version of the underlying code that is run: nowhere in the database do we record the version of GlastRelease. We do have a spot to record the version of GleamWrapper.pl (the only executable Gino knows about per task process), though the xml configurator does not allow setting this version.
For some executables - and GlastRelease is an important one - we could use the version number to access the code. The Release Manager builds the releases and maintains a database giving access to them.
It would be good both to record the important version number and to allow the code to be found automatically from it, rather than (e.g.) hardwiring a path in my shell script as I am doing now.
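As a straw man, the task could hand the version down and the script could derive the release location from it rather than hardwiring a path. The environment variable name and the release directory layout below are both assumptions; I have not checked how the Release Manager actually lays out its area.

    # Sketch: derive the GlastRelease location from a version handed down by
    # the pipeline, instead of hardwiring it in the shell script.
    use strict;
    use warnings;

    my $version = $ENV{GLAST_RELEASE_VERSION}
        or die "GLAST_RELEASE_VERSION not set by the pipeline\n";

    # hypothetical Release Manager layout: one directory per built release
    my $release_dir = "/afs/slac/g/glast/ground/releases/GlastRelease/$version";
    -d $release_dir or die "no such release: $release_dir\n";

    # record the version alongside the run's output so it ends up somewhere queryable
    print "using GlastRelease $version from $release_dir\n";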
There are 2 more use cases I expect we will need to handle:
- splitting input files
  - eventually we will get ~1.5 GB files from the MOC for each downlink, and will need to split them into many files to feed the estimated ~75 CPUs needed to process the downlink in ~1 hour. It is possible we will receive the 1.5 GB in multiple files - unknown yet, but see the next bullet anyway.
  - by the time we get ~4 towers integrated this coming year, keeping the raw data files small enough that the recon files don't get out of hand will become a problem. They will be taking runs of 10 minutes or less!
  - so we need a mechanism to fan a run's input file out to multiple nodes and then regroup the results at the end, presumably allowing a run to be a logical entity whose datasets can be lists of files. Further complications are handling subsequent steps in the pipeline for that run, and whether external pipelines make use of a dataset.
- MC concatenation of files
  - once an MC run is done, we gather up the myriad ntuple files, run a pruning mechanism on them (for the MC I just ran I weeded out events with no tracks) and then concatenate them into a more manageable set of files (by user request, ~200 MB or less each). A sketch of the concatenation step follows this list.
  - it would be nice if both of these could be done automatically in the pipeline and be recorded; I am doing both manually. A task would have to have something like summary properties, where you could make outputs based on a selection of all the input datasets in the task and record the outcome. And again, the output dataset could be a list of files.
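For reference, here is roughly what I do by hand for the concatenation step, as a perl sketch. It assumes the pruned per-job ntuples match ntuple_*.root and uses ROOT's hadd as the merge step; both are placeholders for whatever the task would really be configured with.

    #!/usr/bin/env perl
    # Sketch: group pruned ntuples into ~200 MB batches and merge each batch.
    use strict;
    use warnings;

    my $limit = 200 * 1024 * 1024;            # ~200 MB per merged output file
    my @files = sort glob("ntuple_*.root");   # pruned per-job ntuples (pattern assumed)

    my @batch;
    my ($size, $n) = (0, 0);
    for my $f (@files, undef) {               # the trailing undef flushes the last batch
        my $fsize = defined $f ? -s $f : 0;
        if ((!defined $f or $size + $fsize > $limit) and @batch) {
            my $out = sprintf("merged_%03d.root", $n++);
            # ROOT's hadd stands in for whatever merge tool the task would use
            system("hadd", $out, @batch) == 0 or die "hadd failed for $out\n";
            @batch = ();
            $size  = 0;
        }
        last unless defined $f;
        push @batch, $f;
        $size += $fsize;
    }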
At the moment Gino is run from cron. When it wakes up, it checks whether an instance of itself is already running and, if so, exits to avoid stepping on itself.
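The general idea is a simple lock, something like the sketch below; I have not looked at how Gino actually implements the check, and the lock file location is made up.

    # Sketch of the "am I already running?" check for a cron-started scheduler.
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    my $lockfile = "/u12/pipeline/gino.lock";    # hypothetical location
    open my $lock, '>>', $lockfile or die "cannot open $lockfile: $!\n";
    unless (flock($lock, LOCK_EX | LOCK_NB)) {
        # another instance already holds the lock; exit quietly rather than step on it
        exit 0;
    }
    # ... do one scheduling pass; the lock is released automatically when we exit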
Gino is also fairly verbose in generating its log file (which filled the glast04 /pipeline/ partition last week; the log has since been moved to u12/pipeline/). The log is rather hard to parse, since Gino spawns processes that write to it asynchronously. And it is huge.
If we want to check aliveness with the resource checker, the best option currently is to check the last touched date on the log file. The Gino process itself cannot respond to queries.
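So a resource-checker probe would boil down to something like this, with the log path and staleness threshold picked out of the air:

    # Sketch: aliveness check based on the last-touched time of Gino's log.
    use strict;
    use warnings;

    my $log       = "/u12/pipeline/gino.log";    # path assumed
    my $threshold = 15 * 60;                     # seconds of silence before we worry

    my @st  = stat($log) or die "cannot stat $log: $!\n";
    my $age = time() - $st[9];                   # seconds since the log was last touched
    if ($age > $threshold) {
        print "WARNING: Gino log untouched for ${age}s\n";
    } else {
        print "Gino appears alive (log touched ${age}s ago)\n";
    }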
Matt has suggested we move towards a java server, initially wrapping the scheduler perl script.
He says the java wrapper can (out of the box, more or less, I think):
* handle log files, automatically breaking them up into nicely sized chunks
* provide network connections for querying, though at that stage of evolution this would only tell you that the wrapper is running.
I imagine there are other features I am forgetting. It would be nice if Matt could elaborate, giving a fuller feature list, a pointer to further reading, and perhaps a simple demonstration example that wraps a perl script issuing a print statement or two.
I notice there is no place in the task table for a description of the task. It would be nice to have one.
Also, it could be useful to allow the user to add options to the bsub command. One that comes to mind immediately is the -R option: one might be willing to trade time waiting for a job to start for the 2-3x gain in CPU speed between the barb and noma batch workers.
Of course both would have to be configurable from the web front end.
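For the bsub options, I imagine something like letting the task configuration carry an extra-options list that gets spliced into the submission command. The extra_bsub_opts field and the resource string below are invented for illustration:

    # Sketch: pass user-supplied bsub options through from the task configuration.
    use strict;
    use warnings;

    my %task = (
        queue           => 'medium',
        script          => 'GleamWrapper.pl',
        extra_bsub_opts => [ '-R', 'noma' ],   # resource string invented for illustration
    );

    my @cmd = ( 'bsub', '-q', $task{queue},
                @{ $task{extra_bsub_opts} || [] },
                $task{script} );
    system(@cmd) == 0 or die "bsub submission failed: $?\n";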
Posted from Warren's pipelinelist entry
http://www-glast.stanford.edu/protected/mail/pipeline/0140.html
A new run status, OldFail or AcknowledgedFailure or something, to which I could manually set runs/processes currently in the Fail state after investigating the failure. This would simplify debugging, as I wouldn't have to wade through piles of old failures to find the (hopefully) few new ones.
The ability to filter the run list on run id (a list of ranges would be good) or date.
The ability to filter the run list based on run status. Ideally I'd be able to OR and NOT them as well.