Tom would like to operate on directories and files created by the pipeline server using "unix tools". He needs this to debug and fix problems whose nature cannot be known in advance, so the method must be flexible enough to be adapted quickly to arbitrary problems as they occur. It must also handle large numbers (>30000) of files and directories.
Below I present a few performance measurements, and then a proposal for a slightly different way of addressing the problem.
Notes:
Experiment 1:
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time grep cob0343 */logFile.txt
0.16user 0.13system 0:00.73elapsed 39%CPU
Note: This does not scale to a large number of directories, because */logFile.txt is expanded by the shell and the expanded command line eventually becomes too long.
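To get a feel for when the expanded line becomes too long, the size of the expansion can be compared with the system limit reported by getconf ARG_MAX. The following is only a rough sketch: glob_fits is a hypothetical helper name, and the real limit also has to cover the environment and some per-argument overhead, so the comparison is optimistic.

```shell
# Hypothetical helper: rough check whether a list of names fits on one
# command line. Call it as: glob_fits */logFile.txt
glob_fits() {
    arg_max=$(getconf ARG_MAX)
    # printf is a shell builtin, so this expansion is not itself subject
    # to the ARG_MAX limit. One byte per name is counted for the separator.
    needed=$(( $(printf '%s\n' "$@" | wc -c) ))
    echo "needed=$needed limit=$arg_max"
    [ "$needed" -lt "$arg_max" ]
}
```

The function returns success when the names would (approximately) fit, so it can be used in a shell conditional before choosing between the plain grep form and a find or xargs based form.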
Experiment 2:
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time find */logFile.txt -exec grep cob0343 \{\} \;
0.44user 1.95system 0:03.41elapsed 69%CPU
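Note: This command scales much better. grep is invoked 1500 times in this case, but that does not seem to introduce a huge overhead. As written, the */logFile.txt argument is still expanded by the shell before find runs, so for very large numbers of directories find should do the matching itself (e.g. find . -name logFile.txt).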
Experiment 3:
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > ls -1 */logFile.txt > /tmp/file.list
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > cat /tmp/file.list | /usr/bin/time xargs -i grep cob0343 \{\}
0.50user 1.80system 0:03.58elapsed 64%CPU
Note: Performance is very similar to using find. As with find, grep is invoked 1500 times.
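The measurements above suggest a combined form: let find match the file names, so the shell never builds a long argument list, and let xargs pass many files to each grep call instead of one. A minimal sketch, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do); search_logs is a hypothetical helper name, and this variant has not been timed on the data above.

```shell
# Hypothetical helper: search every logFile.txt below a directory.
# usage: search_logs <pattern> [dir]
search_logs() {
    pattern=$1
    dir=${2:-.}
    # find does the name matching, so no shell glob expansion is involved.
    # -print0 / -0 keep file names with spaces or newlines intact.
    # xargs packs many files into each grep call; -H always prints the
    # file name in front of each matching line.
    find "$dir" -name logFile.txt -print0 | xargs -0 grep -H -- "$pattern"
}
```

For example, search_logs cob0343 run from the task's output directory would print each matching line prefixed with the path of its log file.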
Proposal: define a new command, "pfind", which returns a list of files or directories. Its output can then be piped to xargs (see experiment 3 above). So to search all log files we could use the command:
pfind --task obssim-ST-v7r6p1 --process --logfile | xargs -i grep cob0343 \{\}
pfind arguments (work in progress)
--task <name> | The task on which to operate. Could allow wildcards or lists for multiple tasks
--process <name> | The process name. Could allow wildcards
--logfile | Produce the list of log files
--workingDir | Produce a list of working directories
--latest | Show only "latest" items
--all | All (not only latest)
--obsolete | All minus latest
--stream <run-range-list> | List of stream ranges
--filter <filter-spec> | Filter the results (e.g. exitcode != 0)
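To make the proposal more concrete, here is a minimal sketch of how a small part of pfind could behave. It is an illustration under stated assumptions, not the proposed implementation: it supports only --task, --logfile and --workingDir; it assumes the directory layout seen in the measurements (<root>/<task>/output/<dir>/logFile.txt); and the PFIND_ROOT variable is a hypothetical way to override the root directory. A real pfind would presumably need the pipeline's own bookkeeping to support --process, --latest, --stream and --filter.

```shell
# Hypothetical sketch of pfind: only --task, --logfile and --workingDir.
# Assumed layout: <root>/<task>/output/<dir>/logFile.txt
pfind() {
    # Assumed default root, taken from the path used in the measurements.
    root=${PFIND_ROOT:-/nfs/farm/g/glast/u26/MC-tasks}
    task=
    mode=
    while [ $# -gt 0 ]; do
        case $1 in
            --task)
                [ $# -ge 2 ] || { echo "pfind: --task needs a name" >&2; return 2; }
                task=$2
                shift 2 ;;
            --logfile)    mode=logfile;    shift ;;
            --workingDir) mode=workingDir; shift ;;
            *) echo "pfind: unsupported argument: $1" >&2; return 2 ;;
        esac
    done
    if [ -z "$task" ] || [ -z "$mode" ]; then
        echo "usage: pfind --task <name> (--logfile | --workingDir)" >&2
        return 2
    fi
    case $mode in
        # One log file per directory directly below output/.
        logfile)    find "$root/$task/output" -mindepth 2 -maxdepth 2 -type f -name logFile.txt ;;
        # The directories directly below output/.
        workingDir) find "$root/$task/output" -mindepth 1 -maxdepth 1 -type d ;;
    esac
}
```

With this sketch, pfind --task obssim-ST-v7r6p1 --logfile | xargs grep cob0343 would search all log files of the task, with grep receiving many files per call. Printing one path per line keeps the output usable with any of the tools measured above.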