...

  • Please do not store data that you cannot recreate under the scratch folder: this directory is not backed up, and the oldest files on scratch may be deleted at any time to make space for data from new experiments.
  • The tape archive (xtc, hdf5, usrdaq) and the tape backup (results, home) are fundamentally different:
    • In the tape archive the folders are frozen after the end of the experiments and their contents are stored on tape once. 
    • In the tape backup, the system takes snapshots of the folders as they appear at a given time. This implies that files which are deleted from disk are eventually, i.e. after a long enough time, also deleted from tape.
  • For raw data the cleanup operations will affect all files, i.e. all streams and chunks, which make up one run, rather than individual files.
  • Two years after the end of an experiment we'll remove the experiment from disk. At that point we'll take a snapshot of the results and calib folders and archive them to tape so that we can, upon request, restore an entire experiment back to disk.
  • After 10 years we plan to remove the tapes with the archived raw data from the silos and store them in a safe environment.
  • The new policy will apply to all experiments, i.e. it will be retroactive, and its deployment date will coincide with the start of Run 14 (August 10th 2016).
  • For questions regarding data retention and data access, send email to: pcds-datamgt-l@slac.stanford.edu.

...

~<username>/.zfs/snapshot/

Cleanup of raw data

The raw data, i.e. the xtc and hdf5 files in the corresponding experiment folders, are purged from disk from time to time. The minimum lifetime is four months for new raw data (see table above) and one month for runs that were restored from tape. Note that runs that exceed the lifetime become eligible for purging but will not be automatically purged from disk.

Purging will remove all files that belong to a run (streams and chunks for xtc files) from disk. A few rules are applied for purging eligible runs:

  • Purging is performed only if the free disk space is below a minimum threshold.
  • Purging will stop if the free space is above a maximum threshold.
  • The least recently accessed runs will be purged first.

The purging thresholds may vary depending on the size of a file system and its usage, but they are typically 5% (minimum) and 10% (maximum). With these three rules we try to keep runs that are actively analysed on disk for as long as possible while providing sufficient disk space for the ongoing experiment.
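
The purging rules above can be illustrated with a minimal Python sketch. This is not the actual cleanup tool: the `Run` class, the `select_runs_to_purge` function and all of their fields and parameters are hypothetical names chosen for illustration, and the lifetimes are approximated as 120 days (four months) and 30 days (one month).

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical description of one run on disk (all its streams and chunks)."""
    number: int
    size_bytes: int          # total size of all files that make up the run
    age_days: float          # days since the run was written or restored
    last_access_days: float  # days since the run was last accessed
    restored: bool = False   # True if the run was restored from tape


def select_runs_to_purge(runs, total_bytes, free_bytes,
                         min_free=0.05, max_free=0.10,
                         lifetime_days=120, restored_lifetime_days=30):
    """Return the runs that would be purged, in purge order.

    The default thresholds and lifetimes are assumptions based on the
    typical values quoted in the text, not the deployed configuration.
    """
    # Rule 1: purge only if free space is below the minimum threshold.
    if free_bytes / total_bytes >= min_free:
        return []

    # Only runs that exceeded their minimum lifetime are eligible.
    eligible = [
        r for r in runs
        if r.age_days > (restored_lifetime_days if r.restored else lifetime_days)
    ]

    # Rule 3: least recently accessed runs go first.
    eligible.sort(key=lambda r: r.last_access_days, reverse=True)

    purged = []
    for run in eligible:
        # Rule 2: stop once free space is above the maximum threshold.
        if free_bytes / total_bytes > max_free:
            break
        purged.append(run)
        free_bytes += run.size_bytes
    return purged
```

For example, on a file system with 4% free space the sketch keeps purging eligible runs, oldest access first, until the free space exceeds 10%. A run that is younger than its minimum lifetime is never selected, no matter how long ago it was last accessed.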