The user community and its sponsoring organizations are very diverse with differing requirements for data retention. The Linac Coherent Light Source (LCLS) cannot guarantee indefinite data archival. Users of LCLS are responsible for meeting the data management requirements of their home institutions and funding agencies. Once data have been provided to the user group, the user group is responsible for managing the long-term retention of their data. The ownership of data generated at LCLS is governed by the User Agreement in place between the user group and the facility. Refer to your User Agreement for more details.
LCLS is committed to providing its users with their data in a timely and convenient fashion. Experiment data and metadata collected at LCLS may be stored at and retrieved from the facility for a period of ten years. The data retention practices described on this page are provided on a best-effort basis: they depend on the availability of the underlying resources and may change at any time if these resources become unavailable. Historically we have been able to deliver on these lifetimes (with the exception of scratch, where we twice had to delete data newer than 4 months), but we cannot guarantee them: available resources depend on the actual usage rate and available funding, which we do not fully control.
Space | Quota | Backup | Lifetime | Comment |
---|---|---|---|---|
xtc | None | Tape archive | 4 months | Raw data |
usrdaq | None | Tape archive | 4 months | Raw data from users' DAQ systems |
hdf5 | None | Tape archive | 4 months | Data translated to HDF5 |
scratch | None | None | 4 months | Temporary data (lifetime not guaranteed) |
results | 4TB, 10K files | Tape backup | 2 years | Analysis results |
calib | None | Tape backup | 2 years | Calibration data |
User home | 28GB | Disk + tape | Indefinite | User code (home in S3DF) |
Tape archive | - | - | 10 years | Raw data (xtc, hdf5, usrdaq) |
Tape backup | - | - | Indefinite | User home, results and calib folder |
Disk backup | - | - | Indefinite | Accessible under ~/.zfs/ |
For older experiments the results folder is called res instead of results.
To check the space used and available in your home directory, run: df -h ~<username>
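The results quota listed in the table above (4TB, 10K files) can be compared against a folder's actual contents with a short script. The following is a minimal sketch, not a facility tool: the function names are hypothetical, 1 TB is assumed to mean 10^12 bytes, "10K files" is assumed to mean 10,000 regular files, and the way the file system itself accounts for quota may differ.

```python
import os

# Quota values taken from the table above.
# Assumptions: 1 TB = 10**12 bytes, "10K files" = 10,000 regular files.
RESULTS_MAX_BYTES = 4 * 10**12
RESULTS_MAX_FILES = 10_000


def folder_usage(root):
    """Return (total_bytes, file_count) for all files found under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so link targets are not counted
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
    return total_bytes, file_count


def quota_report(total_bytes, file_count,
                 max_bytes=RESULTS_MAX_BYTES, max_files=RESULTS_MAX_FILES):
    """Express usage as a fraction of each limit and flag an overrun."""
    return {
        "bytes_fraction": total_bytes / max_bytes,
        "files_fraction": file_count / max_files,
        "over_quota": total_bytes > max_bytes or file_count > max_files,
    }
```

Calling `quota_report(*folder_usage(path))` on the path of an experiment's results folder gives the fraction of each limit in use. Walking a large folder can take a while, so this is best run occasionally rather than in a loop.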
The raw data (the xtc and hdf5 files in the corresponding experiment folders) are purged from disk from time to time. The minimum lifetime is four months for new raw data (see table above) and one month for runs that were restored from tape. Note that runs that exceed their lifetime become eligible for purging but will not be automatically purged from disk.
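The lifetime arithmetic above can be sketched as a small helper that tells when a run first becomes eligible for purging. This is only an illustration: the function names are hypothetical, a month is approximated as 30 days (the exact definition used by the facility is not stated here), and eligibility does not mean the run will actually be purged on that date.

```python
from datetime import date, timedelta

# Assumption: one month is approximated as 30 days.
NEW_DATA_LIFETIME = timedelta(days=4 * 30)   # new raw data: four months
RESTORED_LIFETIME = timedelta(days=30)       # runs restored from tape: one month


def purge_eligible_on(on_disk_since, restored=False):
    """Return the last day of the minimum lifetime for a run.

    on_disk_since is the date the run was written to disk, or the date it
    was restored from tape when restored=True.
    """
    lifetime = RESTORED_LIFETIME if restored else NEW_DATA_LIFETIME
    return on_disk_since + lifetime


def is_purge_eligible(on_disk_since, today, restored=False):
    """True once the run has exceeded its minimum lifetime on disk."""
    return today > purge_eligible_on(on_disk_since, restored)
```

For example, `is_purge_eligible(date(2024, 1, 1), date.today())` reports whether a run written on 1 January 2024 has outlived its four-month minimum under these assumptions.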
Purging will remove all files that belong to a run (streams and chunks for xtc files) from disk. A few rules are applied for purging eligible runs:
The purging thresholds may vary depending on the size of a file system and its usage, but are typically 5% and 10% (minimum/maximum threshold). Using these three rules, we try to keep runs that are actively being analyzed on disk for as long as possible while providing sufficient disk space for the ongoing experiment.
The xtc and hdf5 files are archived to tape and can be restored to disk if runs have been purged. Restores are requested using the FileManager of the experiment's eLog. A basic guide to the UI is given in Managing Files.
The requests are sent to a queue, which is monitored by a process that retrieves the files from tape. The restore time may vary from tens of minutes to many days, depending on the amount of data to be restored and on how busy the tape system is. In particular, if high-throughput experiments are running, restores will take a back seat.
Please see the child pages Version 1, Version 2 and Version 3 for a description of the evolution and rationale of the LCLS data retention policy.