
We will meet to discuss various issues involving GLAST's use of xrootd. The aim of the meeting is to review and get feedback on how we are using, and planning to use, xrootd; to review outstanding technical issues and problems; and to come up with a concrete TO DO list and timescale.

Tom's Input

  1. lack of a reasonable way to list files in xrootd (e.g., "ls" command)
  2. casual way xrootd software components are managed (e.g., Wilko's private directory with no concept of a "release"). In fact, I set up a directory with appropriate AFS permissions expressly for this purpose (8/27/2007), /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/xrootd, but it has not yet been used.
  3. unreliability of the redirector-server link for reading or removing files. If the server holding the desired file is delayed, off the network, hung, or shut down, there is no way for the redirector to distinguish this situation from a request for a nonexistent file.
  4. unreliability of the redirector-server link for writing files. If the server containing an old version of a file is delayed, off the network, hung, or shut down, the redirector writes the new file to a different server. This results in two servers holding different versions of the same file.
    Note: judicious use of the dataCatalog might mitigate the ill effects of items 3 and 4. For example, if all files in xrootd are registered in the dataCatalog, a special-purpose script could "verify" every file using the xrootd "stat" command (this could be done, say, asynchronously after the file's creation). All subsequent xrootd requests to read/write/delete/stat/copy/etc. could then be wrapped in a script that first checks with the dataCatalog to confirm the initial conditions and then properly reports unexpected error situations; a rough sketch of such a check is at the end of this page.
  5. I now think that all GLAST xrootd servers should be dedicated to that one function and not shared with other functions, such as /nfs serving. Wilko recently relinquished /u44 (shared with the Monte Carlo data disk /u45) but is still using (part of) /u42, whose server also hosts /u43. Further, /u42 itself is shared with other /nfs users.
  6. The xrootd servers (redirector and disk servers) have in the past failed to restart after a machine reboot; this should be fixed.
  7. There has been some evidence that xrootd has scaling issues, and we do not know how much load it will sustain. I have suggested (8/22/2007) setting up a Pipeline task expressly to hammer on the system and locate its limitations and weaknesses, but we need an xrootd expert's help in designing the appropriate tool; a rough sketch of what such a task might do is at the end of this page.
  8. The "rm" command now only removes an xrootd file from disk - and not HPSS. This is probably okay unless a new version of that file is created, which may cause confusion.
  9. There seems to be no global tool for assessing the status of the entire xrootd infrastructure as used by GLAST (including the redirector(s) and disk server(s)). Perhaps a Nagios script or something similar should be set up for proper monitoring; a sketch of a simple probe is at the end of this page.
  10. There is a known bug in the xrootd "rm" command which can cause client code to hang.
  11. The "xrdcp -f" command is not working
  12. The command syntax, error messages, and return codes of commonly used xrootd commands could be improved to make them easier to use within (Python) scripts; a sketch of an interim wrapper is at the end of this page.
  13. Documentation for xrootd is spread over multiple, disconnected Confluence pages; it should be consolidated in a single place with appropriate links to other, external docs, e.g.:

https://confluence.slac.stanford.edu/display/ds/Xrootd

https://confluence.slac.stanford.edu/display/xrd/The+Scalla+Software+Suite+++xrootd++and+olbd

http://xrootd.slac.stanford.edu/
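
Sketches for Discussion

Below is a rough sketch of the dataCatalog cross-check described in the note under item 4: verify a file with the xrootd "stat" command and use the catalog to distinguish "server down" from "no such file" (item 3). The redirector host and the datacatalog_entry() stub are placeholders, not the real GLAST interfaces, and the "xrd <host> stat <path>" client syntax should be confirmed against the installed xrootd release.

import subprocess

REDIRECTOR = "glastxrd.slac.stanford.edu"   # placeholder redirector host

def xrootd_stat(path):
    """Ask the redirector to stat a file; True if some server reports it."""
    # "xrd <host> stat <path>" uses the command-line client from the Scalla
    # suite; confirm the exact syntax against the installed release.
    proc = subprocess.Popen(["xrd", REDIRECTOR, "stat", path],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    output = proc.communicate()[0]
    return proc.returncode == 0 and "Error" not in output

def datacatalog_entry(path):
    """Stand-in for the real dataCatalog client query (not shown here)."""
    return None

def diagnose_read(path):
    """Distinguish 'server down' from 'nonexistent file' before a read."""
    registered = datacatalog_entry(path) is not None
    visible = xrootd_stat(path)
    if registered and not visible:
        return "registered but not visible: a disk server is probably down or hung"
    if not registered and not visible:
        return "not registered and not visible: probably a nonexistent file"
    return "ok"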
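
A rough sketch of the kind of load-generation Pipeline task suggested in item 7: repeatedly copy a local test file into xrootd through the redirector and record failures and timings. The redirector host, target directory, and reliance on plain xrdcp are assumptions; the real task would be driven by the Pipeline task definition and designed with an xrootd expert.

import subprocess, time

REDIRECTOR = "glastxrd.slac.stanford.edu"   # placeholder
TARGET_DIR = "/glast/loadtest"              # placeholder scratch area
LOCAL_FILE = "/tmp/loadtest.dat"            # small local test file

def copy_in(i):
    """Copy the local test file to a unique xrootd path; return (rc, seconds)."""
    dest = "root://%s/%s/file_%06d.dat" % (REDIRECTOR, TARGET_DIR, i)
    start = time.time()
    rc = subprocess.call(["xrdcp", LOCAL_FILE, dest])
    return rc, time.time() - start

def run(n):
    failures = 0
    for i in range(n):
        rc, elapsed = copy_in(i)
        if rc != 0:
            failures += 1
            print("copy %d failed (rc=%d, %.1fs)" % (i, rc, elapsed))
    print("%d of %d copies failed" % (failures, n))

if __name__ == "__main__":
    run(100)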
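
A sketch of the kind of Nagios-style probe mentioned in item 9: read a small canary file through the redirector and exit with the standard Nagios status codes. The host, canary path, and timing threshold are placeholders.

import os, subprocess, sys, time

REDIRECTOR = "glastxrd.slac.stanford.edu"                    # placeholder
CANARY = "root://%s//glast/monitor/canary.dat" % REDIRECTOR  # placeholder file
OK, WARNING, CRITICAL = 0, 1, 2                              # Nagios exit codes

def main():
    local_copy = "/tmp/xrootd_canary.%d" % os.getpid()
    start = time.time()
    rc = subprocess.call(["xrdcp", CANARY, local_copy])
    elapsed = time.time() - start
    if os.path.exists(local_copy):
        os.remove(local_copy)
    if rc != 0:
        print("CRITICAL: xrdcp from redirector failed (rc=%d)" % rc)
        return CRITICAL
    if elapsed > 30:
        print("WARNING: canary read took %.1f s" % elapsed)
        return WARNING
    print("OK: canary read in %.1f s" % elapsed)
    return OK

if __name__ == "__main__":
    sys.exit(main())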
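
A sketch of the thin wrapper item 12 asks for as an interim measure: run an xrootd command from Python, capture its output, and turn a nonzero return code into an exception rather than an easily ignored status. Only the general pattern is shown; a real wrapper would need per-command knowledge of which messages actually indicate failure.

import subprocess

class XrootdError(Exception):
    """Raised when an xrootd command exits with a nonzero return code."""
    pass

def run_xrootd(args):
    """Run an xrootd command line and return its output, raising on failure."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    output = proc.communicate()[0]
    if proc.returncode != 0:
        raise XrootdError("%s failed (rc=%d): %s"
                          % (" ".join(args), proc.returncode, output.strip()))
    return output

# Example use (paths are placeholders):
# run_xrootd(["xrdcp", "local.dat", "root://redirector//glast/some/file.dat"])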
