Linemode-client find command

The easiest way to get a list of files is to use the linemode client's 'find' command.  Here is its help screen:

noric13:dflath> /afs/slac/g/glast/ground/bin/datacat -h find
Command-specific help for command find

Usage: datacat find [-options] <logical folder>

parameters:
  <logical folder>   Logical Folder Path at which to begin performing the search.

options:
  --recurse                              Recurse sub-folders
  --search-folders                    Search for datasets inside folder(s)
  --search-groups                    Search in groups.  This option is superseded by the -G (--group) option if they are both supplied.
  --group <group name>          Dataset Group under which to search for datasets.
  --site <site name>                Name of Site to search.  May be used multiple times to specify a list of sites in which case order is taken as preference.  Defaults to the Master-location if not provided.
  --filter <filter expression>     Criteria by which to filter datasets.  ie: 'DataType=="MERIT" && nMetStart>=257731220 && nMetStop <=257731580'
  --display <meta name>         Name of meta-data field to display in output.  Default is to display only the file location.  May be used multiple times to specify an ordered list of fields to display.
  --sort <meta name>             Name of meta-data field to sort on.  May be used multiple times to specify a list of fields to sort on.  Order determines precedence.
  --show-unscanned-locations   If no "OK" (ie: verified by file crawler) location exists, display first location (if any) which has not yet been scanned.  If this option and '--show-non-ok-locations' are both specified, an unscanned location will be returned before a non-ok location regardless of their sequence in the ordered site list.
  --show-non-ok-locations        If no "OK" (ie: verified by file crawler) location exists, display first location (if any) which exists in the list of sites.

More detail:

Arguments to 'find' command: 

Argument                      Explanation
<logical folder>              Required; comes after all options.  Replace it with the Data Catalog folder path where you want to begin your search.  The more specific you are, the faster your search will be.
--recurse                     Traverses the entire folder tree under <logical folder>, searching for datasets that meet your criteria.
--search-folders              Tells find to look inside the folder (or folders, if --recurse is specified) and consider datasets that live there.  In general, datasets live in groups, and this option is not used.
--search-groups               Tells find to look inside all dataset groups in the specified folder (or folder tree, if using --recurse) for your files.  May be combined with --search-folders and --recurse.  Has no effect if --group is also specified.
--group <group name>          Tells find to look only inside groups named <group name>.  May be used with --recurse to search groups of that name throughout a folder tree.
--site <site name>            Specifies a site you want to get datasets from.  May be used multiple times to give a list of sites to search, where order indicates preference.  If no sites are specified, the 'master' location is returned; this will generally be a file in XROOT.  More information about sites below.
--filter <filter expression>  Also known as "search criteria": an expression built from logical operators, meta-data fields, and constant values, used to filter the output results.  See below for details.
--display <meta name>         Causes the meta-data value associated with <meta name> to be displayed in the output.  May be used multiple times.  Columns are tab-separated.
--sort <meta name>            Specifies the meta-data field on which to sort the output results.  May be used multiple times to give a list of fields to sort on, where order indicates precedence.  Sorting may add significant overhead to the time it takes to start getting results, as the entire output set must be calculated, then sorted, before being displayed.  Ascending order (smallest first) is the default for each field.  You may override this field by field by prefixing a field name with '-' (minus) for descending order, or '+' (plus) for ascending order.
--show-unscanned-locations    If a verified disk location cannot be found in the specified site list, the first location (in site-preference order) which has not been scanned yet will be returned.
--show-non-ok-locations       Similar to --show-unscanned-locations, but returns the first location in the ordered site list that has a disk location, regardless of the file's scan status.  The file may be missing, 'bad', or otherwise.  (Caveat emptor.)  If you specify this option together with --show-unscanned-locations, an unscanned location will be returned before a non-ok location if both exist.

*Important note:  At least one of --search-folders, --search-groups, --group <group name> must be specified.
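
The requirement in the note above can be checked before the catalog is ever contacted. The following is a minimal sketch, assuming a POSIX shell and a `datacat` executable on the PATH; `has_search_scope` and `safe_find` are hypothetical helper names and are not part of the linemode client.

```shell
# Hypothetical helper (not part of datacat): succeeds when the argument list
# contains at least one of the options that tell find where to look.
# -G is the short form of --group mentioned in the help screen.
has_search_scope() {
    for arg in "$@"; do
        case "$arg" in
            --search-folders|--search-groups|--group|-G) return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper: refuse to run a find that lacks a scope option,
# otherwise pass every argument through to datacat unchanged.
safe_find() {
    if ! has_search_scope "$@"; then
        echo "find needs --search-folders, --search-groups, or --group <group name>" >&2
        return 2
    fi
    datacat find "$@"
}
```

With these definitions, `safe_find --group FT1 /Data/Flight/Level1/LPA/` would be forwarded to datacat, while `safe_find --recurse /Data/Flight/Level1/LPA/` would stop with the error message.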

<site name> Valid values:

SLAC_XROOT  XROOT servers at SLAC.  Almost everything lives here.
SLAC        NFS (or AFS) at SLAC.  Some FT1 and FT2 data are duplicated here until the ftools learn to read from XROOT.
IN2P3       Some Monte Carlo data are produced and stored at Lyon.
IN2P3_HPSS  Lyon Monte Carlo backups.
UW          University of Washington.
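
Because --site may be repeated and its order expresses preference, a site list can be turned into options in a loop. This is a rough sketch for a POSIX shell; the choice of SLAC followed by SLAC_XROOT is only an illustration, and the command is printed rather than executed.

```shell
# Build one --site option per site, most preferred first.
# Note: "set --" replaces the script's own positional parameters.
set --
for site in SLAC SLAC_XROOT; do
    set -- "$@" --site "$site"
done

# Dry run: show the command that would be executed.
preview=$(echo datacat find "$@" --group FT2 /Data/Flight/Level1/LPA/)
echo "$preview"
```

To run the search for real, the same arguments could be passed directly, as in `datacat find "$@" --group FT2 /Data/Flight/Level1/LPA/`.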

<filter expression> Specifics:

A filter expression is composed of logical, arithmetic, and comparison operators along with meta-data fields, and is used to select datasets that meet specific criteria.

Here's a loose grammar which defines the filter expressions:

Expr ::= Expr
Expr ::= '(' Expr ')'
Expr ::= '!' Expr
Expr ::= Expr LogOp Expr
LogOp ::= '&&' | '||'
Expr ::= Comparable CmpOp Comparable
CmpOp ::= '==' | '!=' | '>=' | '<='
Comparable ::= "String"                --> a String constant must be enclosed in double quotes
Comparable ::= Number
Comparable ::= Identifier
Comparable ::= UnOp Comparable
UnOp ::= '-' | '+'
Comparable ::= Comparable BinOp Comparable
BinOp ::= '/' | '*' | '+' | '-'
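
Since string constants must carry double quotes inside the expression, assembling a filter from shell variables takes some care with quoting. Below is a small sketch for a POSIX shell; the variable names are hypothetical and the values are those of the example in the help screen.

```shell
# Assumed inputs (hypothetical variable names, values from the help text).
data_type="MERIT"
met_start=257731220
met_stop=257731580

# The single-quoted printf format keeps the double quotes around the string
# constant and stops the shell from interpreting && and >=.
filter=$(printf 'DataType=="%s" && nMetStart>=%s && nMetStop<=%s' \
    "$data_type" "$met_start" "$met_stop")

echo "$filter"
```

The result can then be handed over as a single argument, for example `datacat find --filter "$filter" --search-groups <logical folder>`.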

Meta Data fields you can use in your expressions:

System-maintained (built-in) meta-data:

Name           Type          Description
Name           String        Dataset Name
FileFormat     String        File encoding. ex: "root", "fits"
DataType       String        Type of data in file. Always uppercase. ex: "RECON"
VersionID      Integer       Version of the Dataset this file represents.
CreateDate     Timestamp     Date this Version of the Dataset was created.
Source         String        What created this Version of the Dataset. ex: "PIPELINE", "LINEMODE CLIENT"
TaskName       String        If Source=="PIPELINE" this will contain the name of the Task which created this Version of the Dataset.
RunMin         Long Integer  Smallest Run Identifier found in this file, if applicable.
RunMax         Long Integer  Largest Run Identifier found in this file, if applicable.
NumberEvents   Long Integer  Number of events in the file, if applicable.
FileSizeBytes  Long Integer  Size of this file on disk, in bytes.
RootVersion    String        If FileFormat=="root", the version of root which wrote this file.
SOLibVersion   String        If FileFormat=="root", the version of the shared object library that the events correspond to, if applicable.
TTreeName      String        If FileFormat=="root", the name of the first TTree in the file, if one exists.

User-defined meta-data tags. In order of most used (first) to least used (last):

(Feel free to fill in the description and DataType field for those you are responsible for.)

Name Type Description Data-Type(s) generally tagged
sDatasource STRING    
nMetStop NUMBER    
nMetStart NUMBER    
sOrigFilename STRING    
nOrigBytes NUMBER    
nOrigCkSum NUMBER    
sBTRversion STRING    
sPhysList STRING    
nBtRunId NUMBER    
sDataSource STRING    
nDownlink NUMBER    
nRun NUMBER    
sRunStatus STRING    
sCreator STRING    
sIntent STRING    
nMootKey NUMBER    
type STRING    
packetTime STRING    
packetApid STRING    
startAddress STRING    
functionCode STRING    
stopAddress STRING    
transactionId STRING    
tstop STRING    
tstart STRING    
nMootKey STRING    
startedAt STRING    
firstTimeStamp STRING    
counterType STRING    
lastTimeStamp STRING    
nDatasetId NUMBER    
TCut STRING    

Examples:

All the FT1 files in a given run-range, sorted by nMetStart:

noric13:dflath> /afs/slac/g/glast/ground/bin/datacat find --filter 'RunMin>=236191699 && RunMax<=236211846' \
    --sort nMetStart --group FT1 /Data/Flight/Level1/LPA/

root://glast-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236191699_v002.fit
root://glast-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236197643_v001.fit
root://glast-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236198321_v001.fit
root://glast-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236209517_v001.fit
root://glast-rdr.slac.stanford.edu//glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236211846_v001.fit

The same search, but retrieving their SLAC NFS location rather than their master (default) location:

noric13:dflath> /afs/slac/g/glast/ground/bin/datacat find --filter 'RunMin>=236191699 && RunMax<=236211846' \
    --sort nMetStart --group FT1 --site SLAC /Data/Flight/Level1/LPA/

/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236191699_v002.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236197643_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236198321_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236209517_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236211846_v001.fit

SLAC NFS locations of all the FT1 and FT2 datasets in a given run-range, grouped by dataset name (perhaps you have a tool that wants the ft1 and its corresponding ft2 file listed on consecutive lines):

noric13:dflath> /afs/slac/g/glast/ground/bin/datacat find \
    --filter '(DataType=="FT1" || DataType=="FT2") && RunMin>=236191699 && RunMax<=236211846' \
    --sort Name --sort DataType --search-groups --site SLAC /Data/Flight/Level1/LPA/

/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236191699_v002.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236191699_v002.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236197643_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236197643_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236198321_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236198321_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236209517_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft2/gll_pt_r0236209517_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236211846_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft2/gll_pt_r0236211846_v001.fit
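
A tool that prefers each ft1/ft2 pair on one line could post-process a listing like the one above. The sketch below assumes a POSIX shell with the standard `paste` utility and relies on the ft1 line always being directly followed by its ft2 line, as in this output; the two sample lines are copied from the listing.

```shell
# Two consecutive FT1/FT2 lines copied from the listing above.
listing=$(printf '%s\n' \
    /nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236191699_v002.fit \
    /nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236191699_v002.fit)

# "paste - -" reads two lines at a time and joins them with a tab,
# giving one "ft1<TAB>ft2" row per run.
pairs=$(printf '%s\n' "$listing" | paste - -)

# Hypothetical consumer: print the two paths of each pair.
tab=$(printf '\t')
printf '%s\n' "$pairs" | while IFS="$tab" read -r ft1 ft2; do
    echo "ft1=$ft1"
    echo "ft2=$ft2"
done
```

In practice the output of the find command would be piped into `paste - -` in place of the sample listing.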

The same search, but with 'nMetStart' and 'nMetStop' as columns in the output:

noric13:dflath> /afs/slac/g/glast/ground/bin/datacat find \
    --filter '(DataType=="FT1" || DataType=="FT2") && RunMin>=236191699 && RunMax<=236211846' \
    --display nMetStart --display nMetStop --sort Name --sort DataType \
    --search-groups --site SLAC /Data/Flight/Level1/LPA/

/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236191699_v002.fit        236191701.95599103      236195764.0891509
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236191699_v002.fit        236191701.95599103      236195764.0891509
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236197643_v001.fit        236197645.96239495      236198115.084378
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236197643_v001.fit        236197645.96239495      236198115.084378
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236198321_v001.fit        236198324.12405705      236201952.08423495
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236198321_v001.fit        236198324.12405705      236201952.08423495
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236209517_v001.fit        236209519.9578979       236211816.08435512
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft2/gll_pt_r0236209517_v001.fit        236209519.9578979       236211816.08435512
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft1/gll_ph_r0236211846_v001.fit        236211848.96936107      236214145.084764
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft2/gll_pt_r0236211846_v001.fit        236211848.96936107      236214145.084764
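
Since the displayed columns are tab-separated, output like the above is easy to process with awk. The following sketch, assuming a POSIX shell and awk, prints each file name together with nMetStop minus nMetStart; the two sample rows are copied from the output above and rebuilt with real tab characters.

```shell
# Two rows copied from the output above: location, nMetStart, nMetStop.
# printf reuses its format for every group of three arguments.
rows=$(printf '%s\t%s\t%s\n' \
    /nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236191699_v002.fit \
    236191701.95599103 236195764.0891509 \
    /nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft1/gll_ph_r0236197643_v001.fit \
    236197645.96239495 236198115.084378)

# Column 1 is the location; columns 2 and 3 follow the order of the
# --display options (nMetStart, then nMetStop).
durations=$(printf '%s\n' "$rows" | awk -F '\t' '{
    n = split($1, part, "/")    # the last path component is the file name
    printf "%s %.0f\n", part[n], $3 - $2
}')

echo "$durations"
```

The same awk program could read the find output directly through a pipe instead of the sample rows.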

All the FT2 files available from SLAC NFS in no particular order:

noric15:dflath> datacat find --site SLAC --group FT2 /Data/Flight/Level1/LPA/

/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.58/ft2/gll_pt_r0236511638_v003.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236339577_v000.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236345681_v000.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236409925_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.58/ft2/gll_pt_r0236443723_v002.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236271733_v000.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.56/ft2/gll_pt_r0236090205_v001.fit
/nfs/farm/g/glast/u20/FT1-2copies/glast/Data/Flight/Level1/LPA/prod/1.57/ft2/gll_pt_r0236351742_v002.fit
... etc ...