SLAC Ballam room, 3:30pm, Thursday 13 2007

We will meet to discuss various issues involving GLAST's use of xrootd. The aim of the meeting is to review and get feedback on how we are using (and planning to use) xrootd, to review outstanding technical issues and problems, and to come up with a concrete TO DO list and timescale.

GLAST's planned use of xrootd

  • Intend to use xrootd for storing data files (mainly *.root, *.fits)
    • MC simulations, test data and real data from satellite
    • Some files will be relatively small (tens of MBytes)
      • Wilko is developing a system to tar these files after they are put into xrootd, but before they are put on tape.
  • Want xrootd to manage many physical disks
  • Want xrootd to manage archiving of files to HPSS (with recovery from tape expected to be rare)
  • All files to be put into xrootd should also be registered in data catalog
    • Files are first put into xrootd, then registered in data catalog
      • This is not an atomic transaction
    • Data catalog has a "crawler" which verifies the existence of files after they have been added (see the sketch below)
      • Extracts checksum, creation date, number of events, etc.
      • If a file exists in multiple locations (e.g. xrootd and NFS), checks the consistency of the copies.
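
A minimal sketch of what the crawler's verification step might look like as a (python) script. The catalog entry layout (a dict with "path" and "checksum"), the redirector host name, and the scratch path are all hypothetical; the checksum is recomputed by copying the file out of xrootd with xrdcp.

    import hashlib
    import os
    import subprocess

    REDIRECTOR = "root://glastxrd.slac.stanford.edu/"   # hypothetical host name

    def md5_of(path):
        """Checksum a local file in chunks so large files do not exhaust memory."""
        m = hashlib.md5()
        f = open(path, "rb")
        for chunk in iter(lambda: f.read(1 << 20), b""):
            m.update(chunk)
        f.close()
        return m.hexdigest()

    def verify(entry):
        """Verify one freshly registered catalog entry.

        Registration is not atomic, so this runs asynchronously, some time
        after the file was put into xrootd and added to the catalog.
        """
        scratch = "/tmp/crawler.verify"
        if os.path.exists(scratch):
            os.remove(scratch)
        # Copy the file out through the redirector; a nonzero return code
        # means it is not reachable right now (missing, or its server is down).
        rc = subprocess.call(["xrdcp", REDIRECTOR + entry["path"], scratch])
        if rc != 0:
            return "UNREACHABLE: %s" % entry["path"]
        if md5_of(scratch) != entry["checksum"]:
            return "CHECKSUM MISMATCH: %s" % entry["path"]
        return "OK: %s" % entry["path"]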

Issues (mainly based on Tom's input)

Bugs/Reliability issues

  1. Unreliability of the redirector-server link for writing files. If a server containing an old version of a file is delayed, off the network, hung, or shut down, the redirector writes the new file to a new server. This results in two different versions of the same file, each on its own server.
  2. Unreliability of the redirector-server link for reading or removing files. If the server holding the desired file is delayed, off the network, hung, or shut down, there is no way for the redirector to distinguish this situation from a request for a nonexistent file.
    • Note: judicious use of the dataCatalog might mitigate the ill effects of the items above. For example, if all files in xrootd are registered in the dataCatalog, a special-purpose script could "verify" all files by using the xrootd "stat" command (this could be done, say, asynchronously after a file's creation). Then, all subsequent xrootd requests to read/write/delete/stat/copy/etc. could be wrapped in a script that first checks with the dataCatalog to confirm the initial conditions and then properly reports unexpected error situations; a sketch of such a wrapper follows this list.
  3. I (Tom) now think that all GLAST xrootd servers should be dedicated to that one function and not shared with other functions, such as /nfs serving. Wilko recently relinquished /u44 (shared with Monte Carlo data disk /u45) but is still using (part of) /u42, whose server is shared with /u43. Further, /u42 itself is shared with other /nfs users.
  4. The xrootd servers (redirector and disk servers) have in the past failed to restart after a machine reboot (or failed to start talking to each other after a reboot); this needs to be fixed.
  5. There is a known bug in the xrootd "rm" command which can cause client code to hang.
  6. The "xrdcp -f" command is not working.
  7. The "rm" command currently removes an xrootd file from disk only, not from HPSS. This is probably okay unless a new version of that file is created, which may cause confusion.
    • This may become more critical when we add the ability to automatically tar files before archiving them
  8. There has been some evidence that xrootd has scaling issues, and we do not know how much load it will sustain. I have suggested (8/22/2007) setting up a Pipeline task expressly for the purpose of hammering on the system to locate its limitations and weaknesses, but we need an xrootd expert's help in designing the appropriate tool.
  9. There seems to be no global tool for assessing the status of the entire xrootd infrastructure as used by GLAST (including redirector(s) and server(s)). Perhaps a Nagios script or some such should be set up for proper monitoring; a sketch of such a probe follows this list.
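
As a concrete illustration of the wrapper idea in item 2's note, a sketch under stated assumptions: the dataCatalog lookup (catalog_has) is a placeholder for the real API, the redirector host name is invented, and the exact "xrd ... stat" invocation may differ between xrootd releases.

    import subprocess

    REDIRECTOR = "glastxrd.slac.stanford.edu"   # hypothetical host name

    def catalog_has(path):
        """Placeholder for a dataCatalog lookup; the real API is assumed."""
        raise NotImplementedError

    def checked_stat(path):
        """Distinguish 'file never existed' from 'file server unreachable'.

        xrootd alone cannot tell these apart (issue 2 above); consulting
        the dataCatalog first lets the wrapper report the right error.
        """
        if subprocess.call(["xrd", REDIRECTOR, "stat", path]) == 0:
            return "OK: %s" % path
        if catalog_has(path):
            # The catalog says the file was written, so a server is most
            # likely delayed, hung, off the network, or shut down.
            return "ERROR: registered file not visible (server problem?): %s" % path
        return "ERROR: no such file (never registered): %s" % path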
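
And a sketch of the kind of Nagios-style probe item 9 asks for. The host names and the well-known test file are hypothetical; the exit codes follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).

    import subprocess
    import sys

    # Hypothetical host names: the redirector plus each disk server.
    HOSTS = ["glastxrd.slac.stanford.edu",
             "glastxrd01.slac.stanford.edu",
             "glastxrd02.slac.stanford.edu"]
    TEST_FILE = "/glast/monitoring/heartbeat.root"   # hypothetical test file

    def main():
        dead = []
        for host in HOSTS:
            # A host counts as up if its xrootd daemon answers a stat request.
            if subprocess.call(["xrd", host, "stat", TEST_FILE]) != 0:
                dead.append(host)
        if dead:
            print("CRITICAL: xrootd not responding on %s" % ", ".join(dead))
            sys.exit(2)   # Nagios exit code for CRITICAL
        print("OK: all %d xrootd hosts responding" % len(HOSTS))
        sys.exit(0)       # Nagios exit code for OK

    if __name__ == "__main__":
        main()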

Miscellaneous/Management issues

  1. The casual way xrootd software components are managed (e.g., Wilko's private directory with no concept of a "release"). In fact, I set up a directory with appropriate AFS permissions expressly for this purpose (8/27/2007), /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/xrootd, but it has not yet been used.
  2. Documentation for xrootd is spread over multiple disconnected confluence pages; it should be consolidated in a single place with appropriate links to other, external docs.

Feature requests

  1. It would be nice if xrootd kept a file's original date when it is copied in, and maintained this date even when the file is archived to or restored from HPSS.
  2. The command syntax, error messages, and return codes for commonly used xrootd commands could be improved to be easier to use within (python) scripts.
  3. There is no reasonable way to list files in xrootd (e.g., an "ls" command).
    • A possible alternative would be a nightly job which lists the files on all servers, combines the listings, and flags inconsistencies (e.g., the same file with different size/date); a sketch of such a job follows this list.
  4. Ability to migrate (or duplicate) data between disks without going via HPSS.
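
A sketch of the nightly consistency job mentioned under feature request 3. It assumes each server has already dumped its local file list to a text file with one line per file of the form "<path> <size> <mtime>"; how that dump is produced is left open (Wilko's xrdcpls may already do something similar).

    import sys

    def load_listing(fname, merged):
        """Read one server's dump; each line is '<path> <size> <mtime>'."""
        for line in open(fname):
            path, size, mtime = line.split()
            merged.setdefault(path, []).append((fname, size, mtime))

    def main(listing_files):
        merged = {}
        for fname in listing_files:
            load_listing(fname, merged)
        for path, copies in sorted(merged.items()):
            # Flag the same file appearing with a different size or date.
            if len(set((size, mtime) for _, size, mtime in copies)) > 1:
                print("INCONSISTENT: %s" % path)
                for server, size, mtime in copies:
                    print("    %s  size=%s  mtime=%s" % (server, size, mtime))

    if __name__ == "__main__":
        main(sys.argv[1:])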

Recommendations/Action Items

  • Do not run NFS on the same machine as xrootd.
  • Increase priority of some xrootd processes.
  • Checksum only (a few) files at a time (we already do).
  • Could append some kind of UID to file names to ensure they are unique (a sketch follows this list)
    • Could perhaps use process instance ID?
    • Need to think about how we access files from different job steps
    • Should not use ID in public data catalog name
    • Ideally this could be automated as part of GPL tools
  • How to handle incomplete (real) data?
    • Maybe we can move data if we find out that it is as complete as it ever will be.
  • Access control
    • We need to restrict access to members of glast-user (or subset)
    • An NFS group is the recommended way to handle it
      • We need to figure out how to create an NFS group and keep it updated automatically
  • When a machine on which xrootd daemons are running crashes, the system does not recover
    • Not clear whether there is a timeout issue
    • Set timeout lower?
    • Try out on Wilko's laptop
  • Bug in rm is a bug in the client (bug #?)
    • Fabrizio has promised to fix this (he is moving to CERN, ATLAS)
  • xrdcp -f bug is fixed
    • xrdcp is in /usr/local/bin
  • When Tom did a lot of rm's together with a short timeout, it caused problems
    • Wilko dummied out the mss call
  • How easy is it to bring xrootd down?
    • Many clients reading at once can cause client timeouts
    • Thumpers have 15TB per machine.
  • Ganglia runs on our servers
    • Wilko will ask Yemi to rename the servers which are running xrootd
    • Talk to Tifigh about how appropriate the BaBar Monitoring System is for GLAST
    • We will send Tom a pointer to the BaBar monitoring system
  • We will attempt to be more organized and formal about which releases GLAST is using
    • We will use our own glast location for client tools
    • Wilko should use version management in Jira
    • Should use the AFS directories area to make version numbers clear
    • Will make sure tools can print out their version number
  • It is hard to maintain dates on files in xrootd
  • Tom should provide a list of desired return codes for the xrootd tools
    • Wilko has an xrdcpls command
  • We could embed the checksum in the file itself
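
A sketch of the unique-suffix idea from the action items above. The suffix here is the pipeline process instance ID and the naming convention is illustrative only; the public dataCatalog name stays unsuffixed, as the note above requires.

    import os

    def unique_xrootd_name(public_name, process_instance_id):
        """Append a UID so a re-run never collides with an old copy of a file.

        The public dataCatalog name stays clean; only the physical xrootd
        path carries the suffix (per the note above: do not expose the ID
        in the public data catalog name).
        """
        base, ext = os.path.splitext(public_name)
        return "%s.%s%s" % (base, process_instance_id, ext)

    # For example (names illustrative only):
    #   unique_xrootd_name("/glast/mc/run0042.root", "pi123456")
    #   -> "/glast/mc/run0042.pi123456.root"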

To Do List

  1. Complete implementation of the tar/archive/retrieval system (a sketch of the tar step follows)
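
For item 1, a minimal sketch of the tar step: bundle many small xrootd files into a single archive before it goes to tape. The function name and the grouping policy are placeholders; the archive/retrieval bookkeeping (which tarball holds which file) would live in the dataCatalog.

    import tarfile

    def bundle(small_files, tarball_path):
        """Pack many tens-of-MByte files into one tarball bound for HPSS.

        HPSS copes far better with a few large files than with many small
        ones; the member list must be recorded (e.g. in the dataCatalog)
        so a single file can be found and extracted again on the rare
        restore from tape.
        """
        tf = tarfile.open(tarball_path, "w")
        for f in small_files:
            tf.add(f)
        tf.close()
        return tarball_path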