Tom would like to be able to operate on directories and files created by the pipeline server using standard unix tools. He needs this capability to debug and fix problems whose nature cannot be known in advance, so the method must be flexible enough to be quickly adapted to arbitrary problems as they occur. It must also handle large numbers (>30000) of files and directories.
Below I present a few performance measurements, and then a proposal for a slightly different way of addressing the problem.
Notes:
Experiment 1:

```
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time grep cob0343 */logFile.txt
0.16user 0.13system 0:00.73elapsed 39%CPU
```
Note: This does not scale to a large number of directories, since */logFile.txt is expanded by the shell, and the expanded command line eventually becomes too long.
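The limit being hit here is the kernel's maximum argument-list size. A minimal sketch of how to inspect it (not part of the original measurements; the path-length figure in the comment is an illustrative assumption):

```shell
# The shell expands */logFile.txt into one argument list before exec'ing
# grep; the kernel rejects the exec with "Argument list too long" (E2BIG)
# once that list exceeds ARG_MAX bytes.
getconf ARG_MAX
# With >30000 matching paths of a few tens of characters each, the
# expanded list can approach or exceed this limit, which is why the
# plain grep command above eventually fails.
```

POSIX guarantees ARG_MAX is at least 4096 bytes; on typical Linux systems it is far larger, but still finite.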
Experiment 2:

```
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time find */logFile.txt -exec grep cob0343 \{\} \;
0.44user 1.95system 0:03.41elapsed 69%CPU
```
Note: This command scales much better. grep is invoked about 1500 times in this case, but that does not appear to introduce significant overhead.
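For illustration, here is a small self-contained reproduction of the find approach. The temporary tree and file contents are made up for the example; the `-exec … {} +` form shown last is a standard find variant (not part of the measurement above) that batches many files into each grep invocation:

```shell
# Build a tiny stand-in for the output/ directory tree.
demo=$(mktemp -d)
for i in 1 2 3; do
    mkdir -p "$demo/run$i"
    echo "processing chunk cob0343 step $i" > "$demo/run$i/logFile.txt"
done

# As measured above: one grep process per file.
find "$demo" -name logFile.txt -exec grep cob0343 {} \;

# Batched variant: find packs as many paths as fit into each grep
# invocation, so the per-process startup overhead largely disappears.
find "$demo" -name logFile.txt -exec grep cob0343 {} +
```

Using `find . -name logFile.txt` (rather than a shell glob as the starting point) also sidesteps the argument-list limit entirely, since find walks the tree itself.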
Experiment 3:

```
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > ls -1 */logFile.txt > /tmp/file.list
[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > cat /tmp/file.list | /usr/bin/time xargs -i grep cob0343 \{\}
0.50user 1.80system 0:03.58elapsed 64%CPU
```
Note: Performance is very similar to using find; as with find, grep is invoked about 1500 times.
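As an aside (not measured above), xargs invoked without `-i` batches many filenames into each grep, so grep runs only a handful of times instead of once per file. A minimal sketch with a made-up file list:

```shell
# Build a small stand-in tree and a file list like /tmp/file.list above.
demo=$(mktemp -d)
for i in 1 2 3; do
    mkdir -p "$demo/run$i"
    echo "node $i saw cob0343" > "$demo/run$i/logFile.txt"
done
ls -1 "$demo"/*/logFile.txt > "$demo/file.list"

# With -i: one grep per file, as in the measurement above.
cat "$demo/file.list" | xargs -i grep cob0343 {}

# Without -i: xargs packs as many filenames as fit into each grep.
# /dev/null forces the "file:match" output prefix even if xargs ends
# up making a single grep call.
cat "$demo/file.list" | xargs grep cob0343 /dev/null
```

Note that `-i` is the older GNU xargs spelling; `-I {}` is the portable form.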
Proposal: define a new command, "pfind", which returns a list of files or directories; its output can then be piped to xargs (see experiment 3 above). So to search all log files we could use the command:
...