http://confluence.slac.stanford.edu/display/ds/Xrootd+Discussion+-+January+2008
Xrootd is able to retrieve files from HPSS. If a user requests a file that is not on disk, Xrootd checks whether the file exists in HPSS and, if so, stages it to disk. All of this is transparent to the client; the only difference it sees is that the file takes a little longer to access.
For HPSS to work efficiently, the files put into it should be large: mounting a tape and positioning it to read a file takes a significant amount of time (tens of seconds). Typically, file sizes of 1 GB and larger are desirable.
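To see why file size matters, the following Python sketch computes what fraction of a tape recall is spent actually reading data rather than mounting and positioning the tape. The 30 s overhead and the 100 MB/s drive rate are assumed values chosen only for illustration; they are not measured HPSS figures.

```python
# Hypothetical figures for illustration only (not taken from HPSS documentation).
MOUNT_AND_SEEK_S = 30.0   # assumed tape mount + positioning overhead, seconds
DRIVE_RATE_MB_S = 100.0   # assumed sustained tape read rate, MB/s


def transfer_efficiency(file_size_mb: float,
                        overhead_s: float = MOUNT_AND_SEEK_S,
                        rate_mb_s: float = DRIVE_RATE_MB_S) -> float:
    """Fraction of the total recall time spent reading the file's data."""
    read_s = file_size_mb / rate_mb_s
    return read_s / (read_s + overhead_s)


for size_mb in (10, 100, 1000, 10000):
    print(f"{size_mb:>6} MB: {transfer_efficiency(size_mb):.1%}")
```

With these assumed numbers a 10 MB file spends well under 1% of its recall time reading data, while a 1 GB file reaches 25%: the fixed overhead is only amortized as files grow.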
Xrootd does not itself put new files into HPSS, nor does it remove files when disk space runs short. It does, however, set up new files in such a way that the migrate daemon can identify them.
An xrootd data server runs a migrate daemon and a purge daemon. Both run periodically (typically every 10 minutes for migration and every 30 minutes for purging).
The migrate daemon looks for new files, i.e. files that were transferred into xrootd (with xrdcp, for example). When it finds new files, it copies them to HPSS and marks them as old (meaning a copy now exists in HPSS). Files that are staged from HPSS are also marked as old.
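The new/old bookkeeping can be modelled in a few lines of Python. This is a minimal sketch, not xrootd's implementation: the `is_new` flag, `migrate_pass`, `record_stage_in`, and the `copy_to_hpss` callback are hypothetical stand-ins, since the page does not say how xrootd actually marks new files.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DiskFile:
    path: str
    is_new: bool  # hypothetical flag: True until the file has a copy in HPSS


def migrate_pass(files: Dict[str, DiskFile],
                 copy_to_hpss: Callable[[str], None]) -> List[str]:
    """One periodic run of the migrate daemon (sketch).

    Copies every file still flagged as new to HPSS, then marks it old,
    which makes it a candidate for the purge daemon later on.
    """
    migrated = []
    for f in files.values():
        if f.is_new:
            copy_to_hpss(f.path)  # hypothetical stand-in for the real HPSS transfer
            f.is_new = False      # a copy now exists in HPSS -> "old"
            migrated.append(f.path)
    return migrated


def record_stage_in(files: Dict[str, DiskFile], path: str) -> None:
    """Files staged in from HPSS already have a tape copy, so they start out old."""
    files[path] = DiskFile(path, is_new=False)


# Example run: one file written with xrdcp, one staged back from HPSS.
files = {"/data/run1.root": DiskFile("/data/run1.root", is_new=True)}
record_stage_in(files, "/data/old.root")
copied: List[str] = []
first = migrate_pass(files, copied.append)   # only the new file is copied
second = migrate_pass(files, copied.append)  # nothing new is left
print(first, second)
```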
The purge daemon checks the disk space periodically. If the free space falls below a configurable percentage, it searches the directories for old files, sorts them by age, and removes the oldest ones until the free space reaches a certain limit.
For example, assume the required free space is at least 5% and the purge threshold is 10% free space. If the disk fills up to more than 95%, the purge daemon removes files until disk usage is down to 90%.
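The purge rule can be written down as a small Python function using the 5% and 10% figures from the example. This is a sketch under stated assumptions: "age" is taken to be a per-file timestamp (for instance the last access time, an assumption), only files already marked old are passed in, and the file names and sizes below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OldFile:
    path: str
    size_gb: int
    stamp: float  # age reference; assumed here to be the last access time


def purge(old_files: List[OldFile], capacity_gb: int, used_gb: int,
          min_free: float = 0.05, target_free: float = 0.10) -> List[str]:
    """One run of the purge daemon (sketch).

    If free space is below min_free, remove old (already migrated) files,
    oldest first, until free space is back at target_free.
    Returns the paths that would be removed.
    """
    if capacity_gb - used_gb >= min_free * capacity_gb:
        return []  # enough free space, nothing to do
    removed = []
    for f in sorted(old_files, key=lambda x: x.stamp):  # oldest first
        if capacity_gb - used_gb >= target_free * capacity_gb:
            break
        used_gb -= f.size_gb  # a real daemon would delete f.path from disk here
        removed.append(f.path)
    return removed


old = [OldFile("c", 30, 3.0), OldFile("a", 20, 1.0),
       OldFile("d", 10, 4.0), OldFile("b", 15, 2.0)]
print(purge(old, capacity_gb=1000, used_gb=960))  # 4% free: triggers a purge
print(purge(old, capacity_gb=1000, used_gb=900))  # 10% free: no purge
```

On a 1000 GB disk that is 96% full, removing `a`, `b`, and `c` (65 GB in total) brings usage to 895 GB, just under the 90% target, so `d` is kept.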
As mentioned earlier, putting small files into HPSS is inefficient:
A possible solution is to tar the small files together and copy only the tar file into HPSS:
For migration:
For Xrootd (assuming a file is not on disk):
In addition, Xrootd needs to be:
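The bundling itself is straightforward. As a minimal sketch using Python's standard `tarfile` module on local paths (copying the tar into HPSS and staging it back are not shown), migration packs many small files into one archive, and retrieval extracts only the requested member once the archive is back on disk. `bundle` and `extract_one` are hypothetical helper names.

```python
import os
import tarfile
import tempfile


def bundle(small_files, tar_path):
    """Pack many small files into one tar so HPSS sees a single large file."""
    with tarfile.open(tar_path, "w") as tar:
        for path in small_files:
            tar.add(path, arcname=os.path.basename(path))


def extract_one(tar_path, member, dest_dir):
    """After the tar has been staged back to disk, pull out only the requested file."""
    with tarfile.open(tar_path, "r") as tar:
        src = tar.extractfile(member)  # raises KeyError if member is absent
        if src is None:
            raise KeyError(f"{member} is not a regular file in {tar_path}")
        out_path = os.path.join(dest_dir, member)
        with src, open(out_path, "wb") as out:
            out.write(src.read())
    return out_path


with tempfile.TemporaryDirectory() as work:
    names = []
    for i in range(3):  # three hypothetical small files
        p = os.path.join(work, f"small{i}.dat")
        with open(p, "wb") as fh:
            fh.write(b"x" * (i + 1))
        names.append(p)
    tar_path = os.path.join(work, "bundle.tar")
    bundle(names, tar_path)
    with tarfile.open(tar_path) as t:
        members = t.getnames()
    out_dir = os.path.join(work, "out")
    os.mkdir(out_dir)
    restored = extract_one(tar_path, "small2.dat", out_dir)
    with open(restored, "rb") as fh:
        data = fh.read()

print(members, data)
```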
A client knows nothing about tar files. It asks xrootd to open a file, and xrootd has to determine whether the file is on disk, in HPSS, or in HPSS inside a tar file.
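The three-way decision could look like the Python sketch below. It assumes some catalog, here a hypothetical `tar_index` dictionary mapping each logical file name to the tar file in HPSS that holds it; the page does not say how xrootd would obtain this information, so the migration step would have to maintain it.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Location(Enum):
    DISK = "on disk"
    HPSS = "in HPSS"
    HPSS_TAR = "in HPSS inside a tar file"
    MISSING = "not found"


def locate(path: str, on_disk: Set[str], in_hpss: Set[str],
           tar_index: Dict[str, str]) -> Tuple[Location, Optional[str]]:
    """Decide where a requested file lives (sketch).

    tar_index is a hypothetical catalog: logical file name -> tar file in HPSS.
    For the tar case, the second element is the tar file that must be staged.
    """
    if path in on_disk:
        return Location.DISK, None
    if path in in_hpss:
        return Location.HPSS, None
    if path in tar_index:
        return Location.HPSS_TAR, tar_index[path]
    return Location.MISSING, None


print(locate("/store/evt042.dat", set(), set(),
             {"/store/evt042.dat": "bundle7.tar"}))
```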
A client that wants to open a file connects to a redirector (RDR), and the redirector asks its data servers whether they have this file on disk. If none of the data servers replies, the RDR assumes that the file is not on disk; it picks a server that allows staging from HPSS and tells the client to go to that server. The client then asks this data server to open the file, which triggers the copy from HPSS to disk. If a second client asks for the same file, it is redirected to the same server as the first client (even if the file is not yet on disk).
This ensures that a file is staged to only a single server.
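The redirector behaviour just described can be modelled as a toy Python class. This is a simplification, not the real RDR protocol: the data servers' replies are replaced by a lookup in a dictionary, the staging server is chosen round-robin (an assumption, since the selection policy is not described here), and the server names are hypothetical.

```python
from typing import Dict, List, Set


class Redirector:
    """Toy model of the redirector (RDR) described above (sketch).

    servers maps a data-server name to the set of files on its disk;
    stagers lists the servers that are allowed to stage from HPSS.
    """

    def __init__(self, servers: Dict[str, Set[str]], stagers: List[str]):
        self.servers = servers
        self.stagers = stagers
        self.pending: Dict[str, str] = {}  # file -> server already staging it
        self._next = 0

    def redirect(self, path: str) -> str:
        # 1. Ask every data server whether it has the file on disk.
        for name, files in self.servers.items():
            if path in files:
                return name
        # 2. Stage already under way: send the client to the same server.
        if path in self.pending:
            return self.pending[path]
        # 3. Nobody has it: pick a staging server and remember the choice.
        server = self.stagers[self._next % len(self.stagers)]
        self._next += 1
        self.pending[path] = server
        return server


rdr = Redirector({"ds1": {"/a"}, "ds2": set(), "ds3": set()}, stagers=["ds2", "ds3"])
print(rdr.redirect("/a"))  # already on disk at ds1
print(rdr.redirect("/b"))  # nobody has it: stage on ds2
print(rdr.redirect("/b"))  # second client: same server, ds2
print(rdr.redirect("/c"))  # next staging server, ds3
```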
The data server could be set up so that, upon an open request, it would:
The naive approach of staging the tar file and untarring it would not work:
If two clients ask for different files that are in the same tar file, the redirector might send them to different data servers (the RDR knows nothing about tar files), and the tar file would then be staged twice.
Since clients are likely to open many files from the same tar file, this would lead to unwanted data duplication.
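The duplication can be reproduced with a toy Python model. The redirector keys its bookkeeping on the requested file name, as described above, and picks staging servers round-robin (an assumption; the real selection policy is not specified here). The catalog `TAR_OF` and the file and server names are hypothetical.

```python
from typing import Dict, List

# Hypothetical catalog: which tar file in HPSS holds each small file.
TAR_OF = {"/run7/evt001.dat": "run7.tar", "/run7/evt002.dat": "run7.tar"}


def servers_staging_each_tar(requests: List[str],
                             stagers: List[str]) -> Dict[str, List[str]]:
    """Return, for each tar file, the servers that end up staging a copy of it."""
    pending: Dict[str, str] = {}      # requested file -> server chosen for it
    staged: Dict[str, List[str]] = {}  # tar file -> servers staging it
    for path in requests:
        if path in pending:
            continue  # same file again: same server, no new stage
        server = stagers[len(pending) % len(stagers)]
        pending[path] = server
        copies = staged.setdefault(TAR_OF[path], [])
        if server not in copies:
            copies.append(server)  # this server must stage the whole tar
    return staged


print(servers_staging_each_tar(["/run7/evt001.dat", "/run7/evt002.dat"],
                               ["ds1", "ds2"]))
```

Two different members of `run7.tar` are sent to `ds1` and `ds2`, so the whole tar is staged on both servers, while repeated requests for a single file stay on one server.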
Solution??