Problem

Tom would like to operate on directories and files created by the pipeline server using standard "unix tools". He needs this to debug and fix problems whose nature cannot be known in advance, so the method must be flexible enough to be adapted quickly to arbitrary problems as they occur. It must also handle large numbers (>30000) of files and directories.

Below I present a few performance measurements, and then a proposal for a slightly different way of addressing the problem.

Measurements

Notes:

  • These timings are not very reproducible, since the commands are I/O bound and depend strongly on file caching. The times below were taken after reissuing each command several times.
  • The directory /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output contains approximately 1500 runs

Experiment 1

[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time grep cob0343 */logFile.txt
0.16user 0.13system 0:00.73elapsed 39%CPU

Note: This does not scale to large numbers of directories, since */logFile.txt is expanded by the shell and the expanded command line eventually becomes too long.
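To see how close a given glob is to that limit, its expanded size can be compared with the system's ARG_MAX value. The following is a rough sketch only: the helper name glob_arg_bytes is made up for illustration, and the kernel's real accounting also includes the environment and some per-argument overhead, so the number should be treated as an estimate.

```shell
# glob_arg_bytes WORD...: rough number of bytes the words would take up
# as command-line arguments (each argument plus its terminating NUL).
# Hypothetical helper, not part of any existing tool.
glob_arg_bytes() {
    # printf is a shell builtin, so no new process is started and the
    # argument-length limit cannot be hit by this call itself.
    printf '%s\0' "$@" | wc -c
}

# Usage, from inside the output directory:
#   getconf ARG_MAX                 # system limit in bytes
#   glob_arg_bytes */logFile.txt    # approximate size of the expanded glob
```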

Experiment 2

[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time find */logFile.txt -exec grep cob0343 \{\} \;
0.44user 1.95system 0:03.41elapsed 69%CPU

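Note: This approach scales much better, provided find itself walks the directories (for example find . -name logFile.txt). As written above, the shell still expands */logFile.txt before find runs, so the length limit from experiment 1 still applies. grep is invoked 1500 times in this case, but that does not seem to introduce a huge overhead.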
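One way to avoid the shell expansion entirely is to let find walk the run directories itself. The sketch below assumes GNU find and grep (for -mindepth, -maxdepth and -H) and the layout seen above, with one logFile.txt directly inside each run directory; search_logs is a made-up name. It also uses -exec ... {} +, which batches many files into each grep call instead of starting one grep per file, so its timing would not be directly comparable to the measurement above.

```shell
# search_logs PATTERN [ROOT]: grep PATTERN in ROOT/<run>/logFile.txt.
# Hypothetical helper; ROOT defaults to the current directory.
search_logs() {
    pattern=$1
    root=${2:-.}
    # find does the directory walk, so no glob is expanded by the shell.
    # "{} +" appends as many file names as fit to each grep invocation.
    # -H makes grep print the file name even if a batch holds one file.
    find "$root" -mindepth 2 -maxdepth 2 -type f -name logFile.txt \
        -exec grep -H -- "$pattern" {} +
}

# Usage, from inside the output directory:
#   search_logs cob0343
```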

Experiment 3

[glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > ls -1 */logFile.txt > /tmp/file.list
cat /tmp/file.list | /usr/bin/time xargs -i grep cob0343 \{\}
0.50user 1.80system 0:03.58elapsed 64%CPU

Note: Performance is very similar to using find. As with find, grep is invoked 1500 times, once per line of the file list.
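If one grep per file ever becomes a bottleneck, xargs can pack many file names into each grep call when -i is dropped. The following is a sketch under assumptions: grep_listed is a made-up name, the list file holds one path per line, and xargs supports -0 (GNU and BSD versions do). Converting newlines to NULs keeps paths containing spaces or quotes intact.

```shell
# grep_listed PATTERN LISTFILE: grep PATTERN in every file named in LISTFILE.
# Hypothetical helper; LISTFILE holds one path per line.
grep_listed() {
    pattern=$1
    list=$2
    # Turn each line into a NUL-terminated item so that xargs -0 treats
    # the whole line as a single file name, then let xargs batch them.
    tr '\n' '\0' < "$list" | xargs -0 grep -H -- "$pattern"
}

# Usage:
#   grep_listed cob0343 /tmp/file.list
```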

Proposal

Define a new command "pfind" that returns a list of files or directories. Its output can then be piped to xargs (see experiment 3 above). To search all log files we could use the command:

pfind --task obssim-ST-v7r6p1 --process obssim --logfile | xargs -i grep cob0343 \{\}

or to delete all obsolete working directories we could use

pfind --task obssim-ST-v7r6p1 --process obssim --workingDir --obsolete | xargs -i rm -r \{\}

pfind arguments (work in progress)

--task <name>

The task on which to operate. Could allow wildcards or lists for multiple tasks

--process <name>

The process name. Could allow wildcards

--logfile

Produce the list of log files

--workingDir

Produce a list of working directories

--latest

Show only "latest" items

--all

Show all items, not only the latest

--obsolete

Show only obsolete items (all minus latest)

--stream <run-range-list>

List of stream ranges

--filter <filter-spec>

Filter the results (e.g. exitcode != 0)
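To try out the command-line shape before a real implementation exists, a throwaway prototype could be written as a shell function. The sketch below is purely illustrative and is not the proposed pfind: the name pfind_sketch, the PFIND_ROOT variable, and the assumed layout $PFIND_ROOT/<task>/output/<run>/logFile.txt (with each run directory treated as a working directory) are all assumptions, GNU find is assumed for -mindepth and -maxdepth, and only --task, --logfile and --workingDir do anything.

```shell
# pfind_sketch: hypothetical prototype, NOT the real pfind.
# Assumes the layout $PFIND_ROOT/<task>/output/<run>/logFile.txt.
pfind_sketch() {
    task=
    mode=
    while [ $# -gt 0 ]; do
        case $1 in
            --task)
                [ $# -ge 2 ] || { echo "pfind_sketch: --task needs a name" >&2; return 2; }
                task=$2; shift 2 ;;
            --process)
                # Accepted for command-line compatibility, ignored in this sketch.
                [ $# -ge 2 ] || { echo "pfind_sketch: --process needs a name" >&2; return 2; }
                shift 2 ;;
            --logfile)    mode=logfile;    shift ;;
            --workingDir) mode=workingDir; shift ;;
            *)
                echo "pfind_sketch: unsupported option: $1" >&2
                return 2 ;;
        esac
    done
    if [ -z "$task" ] || [ -z "$mode" ]; then
        echo "usage: pfind_sketch --task NAME (--logfile | --workingDir)" >&2
        return 2
    fi
    base=${PFIND_ROOT:-.}/$task/output
    case $mode in
        # One log file per run directory (assumed layout).
        logfile)    find "$base" -mindepth 2 -maxdepth 2 -type f -name logFile.txt ;;
        # Each run directory is assumed to be a working directory.
        workingDir) find "$base" -mindepth 1 -maxdepth 1 -type d ;;
    esac
}

# Usage:
#   PFIND_ROOT=/nfs/farm/g/glast/u26/MC-tasks
#   pfind_sketch --task obssim-ST-v7r6p1 --logfile | xargs -i grep cob0343 \{\}
```

Options such as --latest, --obsolete, --stream and --filter need information about runs that a plain directory walk does not provide, so they are left out of this sketch.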
