Problem

Tom would like to be able to operate on directories or files created by the pipeline server using "unix tools". He needs this capability to debug and fix problems whose nature cannot be known in advance, so the method must be flexible enough to be modified quickly to address arbitrary problems as they occur. It must also be able to handle large numbers (>30000) of files and directories.

Below I present a few performance measurements, and then a proposal for a slightly different way of addressing the problem.

Measurements

Notes:

  • These timings are not very reproducible, since the commands are I/O bound and depend heavily on file caching. The times below were obtained after reissuing each command several times.
  • The directory /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output contains approximately 1500 runs

Experiment 1

[~tonyj:glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time grep cob0343 */logFile.txt
0.16user 0.13system 0:00.73elapsed 39%CPU

Note: This does not scale to a large number of directories, since */logFile.txt is expanded by the shell and the expanded command line eventually becomes too long.
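
The limit in question is the system's cap on the combined size of the arguments and environment passed to a new process. A minimal sketch for checking that cap and estimating how much of it a glob would consume; the estimate uses the shell's builtin printf, which is not subject to the limit, so it is safe to run even in a very large directory:

```shell
# Upper bound, in bytes, on arguments plus environment for a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# Approximate size of the expanded glob (one path per line).
# printf is a shell builtin, so this expansion never reaches exec.
glob_bytes=$(printf '%s\n' */logFile.txt | wc -c | tr -d ' ')
echo "*/logFile.txt expands to about $glob_bytes bytes"
```

If the second number approaches the first, a command such as the grep above can be expected to fail with an "Argument list too long" error.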

Experiment 2

[~tonyj:glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > /usr/bin/time find */logFile.txt -exec grep cob0343 \{\} \;
0.44user 1.95system 0:03.41elapsed 69%CPU

Note: This command scales much better. grep is invoked approximately 1500 times in this case, but that does not seem to introduce a huge overhead.
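
One caveat on the command as typed: */logFile.txt is still expanded by the shell before find runs, so it remains subject to the same argument-length limit as experiment 1. The sketch below shows a variant in which find locates the files itself and hands them to grep in batches via "-exec ... {} +". The scratch directory and run names are made up for illustration, -mindepth and -maxdepth are GNU/BSD extensions, and this variant has not been timed on the directory above.

```shell
# Build a small scratch tree standing in for the output directory (hypothetical names).
demo=$(mktemp -d)
for run in 000001 000002 000003; do
    mkdir -p "$demo/$run"
    echo "host: cob0100" > "$demo/$run/logFile.txt"
done
echo "host: cob0343" > "$demo/000002/logFile.txt"

# find walks the tree itself, so the shell never expands a long glob.
# "-exec ... {} +" passes many files to each grep; -l prints names of matching files.
matches=$(cd "$demo" && find . -mindepth 2 -maxdepth 2 -name logFile.txt -exec grep -l cob0343 {} +)
echo "$matches"
rm -rf "$demo"
```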

Experiment 3

[~tonyj:glastlnx07] /nfs/farm/g/glast/u26/MC-tasks/obssim-ST-v7r6p1/output > ls -1 */logFile.txt > /tmp/file.list
cat /tmp/file.list | /usr/bin/time xargs -i grep cob0343 \{\}
0.50user 1.80system 0:03.58elapsed 64%CPU

Note: Performance is very similar to using find. As with find, grep is invoked approximately 1500 times.
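
Without -i, xargs packs as many file names as fit into each grep invocation, which should bring the process count well below one per file. A sketch on a scratch tree (all names hypothetical, and untimed), using NUL-separated names so that unusual characters in paths are handled; -print0, -0 and -H are GNU/BSD extensions:

```shell
# Scratch tree standing in for the output directory (hypothetical names).
demo=$(mktemp -d)
for run in 000001 000002 000003; do
    mkdir -p "$demo/$run"
    echo "host: cob0100" > "$demo/$run/logFile.txt"
done
echo "host: cob0343" > "$demo/000003/logFile.txt"

# Save the file list once, as in experiment 3, but NUL-separated.
find "$demo" -name logFile.txt -print0 > "$demo/file.list"

# xargs batches the names; -H makes grep print the file name for every match.
hits=$(xargs -0 grep -H cob0343 < "$demo/file.list")
echo "$hits"
rm -rf "$demo"
```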

Proposal

Define a new command "pipeline find" which is able to return a list of files or directories. This can then be used with xargs (see experiment 3 above). So to search all log files we could use the command:

pipeline find obssim-ST-v7r6p1 obssim logFile | xargs -i grep cob0343 \{\}

or to delete all obsolete working directories we could use:

pipeline find --obsolete obssim-ST-v7r6p1 obssim workingDir | xargs -i rm -rf \{\}
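
Since "pipeline find" is only a proposal at this point, the following sketch uses a stand-in shell function, pipeline_find_stub (hypothetical, as are the paths it prints), to show how one-path-per-line output would feed xargs. It assumes nothing about how the real command would be implemented.

```shell
# Scratch log files standing in for pipeline output (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/run1" "$demo/run2"
echo "ran on cob0343" > "$demo/run1/logFile.txt"
echo "ran on cob0100" > "$demo/run2/logFile.txt"

# Hypothetical stand-in for the proposed command: one log file path per line.
pipeline_find_stub() {
    printf '%s\n' "$demo/run1/logFile.txt" "$demo/run2/logFile.txt"
}

# Same shape as the proposed usage; -I {} is the POSIX spelling of -i.
# xargs exits non-zero when any grep finds no match, which is expected here.
found=$(pipeline_find_stub | xargs -I {} grep -l cob0343 {} || true)
echo "$found"
rm -rf "$demo"
```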

Syntax

pipeline find <options> <task-name> <process-name> [<output> ,<output>...]

<task-name>

The task on which to operate. Can include version and subtasks, e.g. parent(1.0)/child

<process-name>

The process name.

<output>

An item to output. Defaults to workingDir. See valid items below.

--latest

Show only "latest" items

--all

Show all items, not only the latest

--obsolete

Show only obsolete items, i.e. all items minus the latest

--stream <run-range-list>

List of stream ranges (not yet implemented)

--filter <filter-spec>

Filter the results (e.g. exitCode != 0). Filters can use any of the supported output items, including meta-data.
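
The relationship between --all, --latest and --obsolete is plain set subtraction. The sketch below illustrates it with made-up working directory names and comm -23; it says nothing about how the pipeline server would actually compute the obsolete set.

```shell
demo=$(mktemp -d)
# Hypothetical working directories: every execution, and the latest ones only.
printf '%s\n' run1/exec1 run1/exec2 run2/exec1 | sort > "$demo/all.txt"
printf '%s\n' run1/exec2 run2/exec1 | sort > "$demo/latest.txt"

# comm -23 keeps lines that appear only in the first file: all minus latest.
obsolete=$(comm -23 "$demo/all.txt" "$demo/latest.txt")
echo "$obsolete"
rm -rf "$demo"
```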

Supported output items

Item

workingDir

exitCode

stream

createDate

submitDate

endDate

cpuSecondsUsed

host

logFile

jobId

executionNumber

isLatest

streamPath

or any meta-data item associated with the task.

Example

pipeline find backgndSC-GR-v10r4 runMonteCarlo -s logFile exitCode stream evtsSim evtsOut --filter "evtsOut>200"