Policy by Folder

Red text indicates a change from the previous policy.

Space          Quota   Backup         Lifetime     Comment
-------------  ------  -------------  -----------  ----------------------------------------
xtc            None    Tape archive   4 months     Raw data
usrdaq         None    Tape archive   4 months     Raw data from users' DAQ systems
hdf5           None    Tape archive   4 months     Data translated to HDF5
scratch        None    None           4 months     Temporary data (lifetime not guaranteed)

xtc/hdf5       10 TB   n/a            2 years      Selected XTC and HDF5 runs
ftc            10 TB   None           2 years      Filtered, translated, compressed
results        4 TB    Tape backup    2 years      Analysis results
calib          None    Tape backup    2 years      Calibration data
User home      20 GB   Disk + tape    Indefinite   User code

Tape archive   -       -              10 years     Raw data (xtc, hdf5, usrdaq)
Tape backup    -       -              Indefinite   User home, results and calib folder
Disk backup    -       -              Indefinite   Accessible under ~/.zfs/
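The Lifetime column can help users spot files that may be at or past their folder's nominal lifetime. The sketch below is not an official LCLS tool. It assumes that lifetime can be approximated from each file's modification time and that "4 months" and "2 years" correspond to 120 and 730 days. The policy does not say how lifetime is measured (it may be counted from the end of an experiment instead), so treat the output as a rough indication only. The helper names lifetime_days and list_expired are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only, not an official LCLS tool.
# ASSUMPTIONS: lifetime is approximated from each file's modification
# time, and "4 months" / "2 years" are taken as 120 / 730 days.
# lifetime_days and list_expired are hypothetical helper names.

# Print the nominal lifetime in days for a space in the policy table.
lifetime_days() {
  case "$1" in
    xtc|usrdaq|hdf5|scratch)    echo 120 ;;  # "4 months"
    xtc/hdf5|ftc|results|calib) echo 730 ;;  # "2 years"
    *) return 1 ;;  # unknown space, or indefinite lifetime (e.g. user home)
  esac
}

# List regular files under DIR whose last modification is older than
# the nominal lifetime of SPACE (find's -mtime counts whole days).
list_expired() {
  local space="$1" dir="$2" days
  if ! days=$(lifetime_days "$space"); then
    echo "no finite lifetime known for '$space'" >&2
    return 1
  fi
  find "$dir" -type f -mtime +"$days" -print
}
```

For example, `list_expired scratch <experiment scratch folder>` prints scratch files not modified for more than 120 days. Keeping the table lookup in its own function means the day counts can be adjusted in one place if the policy changes.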

Rationale for Proposed Policy

Over the past couple of years, we have observed several aspects of the LCLS data retention policy that are not ideal:

  1. All experiments are treated equally, even though a few institutions copy their data home and do not need it on disk at SLAC; ideally, that disk space would be reserved for the experiments that do rely on it.
  2. Some folders and the different storage classes (short-, medium- and long-term) were not always properly understood or used (e.g., ftc was often treated as scratch).
  3. It has been hard to keep the promise of preserving all data on disk for its stated lifetime; this has proved particularly tricky for scratch, where users can easily write tens of terabytes in a few hours.
  4. Deleting data too early, i.e., while files are still being actively accessed, can trigger large, concurrent restore operations from tape, which degrade the performance of the system.

We have also studied data usage over time and observed that:

Proposed Policy

Based on the observations above, we propose the following modifications:

Notes

User Home

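To check how much space your home directory is using:

df -h ~<username>

Disk backups (snapshots) of your home directory are accessible under:

~<username>/.zfs/snapshot/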
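Because ZFS snapshots appear as ordinary read-only directories, a lost or overwritten file can be recovered by copying it out of a snapshot. The sketch below assumes that each entry under ~/.zfs/snapshot/ mirrors the home directory and that snapshot names sort from oldest to newest, which holds only if the names embed a sortable timestamp. SNAPSHOT_ROOT, latest_snapshot_copy and restore_from_snapshot are hypothetical names, not part of any SLAC tooling.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: snapshots are directories under
# ~/.zfs/snapshot/, each mirroring the home directory, and snapshot
# names sort lexically from oldest to newest (true only if they embed
# a sortable timestamp). SNAPSHOT_ROOT, latest_snapshot_copy and
# restore_from_snapshot are hypothetical names.

SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-$HOME/.zfs/snapshot}"

# Print the copy of file RELPATH (relative to $HOME) found in the
# lexically last snapshot that contains it.
latest_snapshot_copy() {
  local relpath="$1" snap found=""
  for snap in "$SNAPSHOT_ROOT"/*/; do
    if [ -f "${snap}${relpath}" ]; then
      found="${snap}${relpath}"
    fi
  done
  [ -n "$found" ] || return 1
  printf '%s\n' "$found"
}

# Copy that version to $HOME/RELPATH.restored, leaving any current
# file untouched, and print the destination path.
restore_from_snapshot() {
  local relpath="$1" src dest
  if ! src=$(latest_snapshot_copy "$relpath"); then
    echo "no snapshot contains '$relpath'" >&2
    return 1
  fi
  dest="$HOME/$relpath.restored"
  mkdir -p "$(dirname "$dest")"
  cp -p "$src" "$dest"
  printf '%s\n' "$dest"
}
```

Writing to a ".restored" copy instead of overwriting the original lets users compare the two versions before deciding which one to keep.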