Space | Size | Backup | Lifetime | Comment |
---|---|---|---|---|
xtc | Unlimited | Tape archive | 2 months | Raw data |
usr | Unlimited | Tape archive | 2 months | Raw data from users' DAQ systems |
hdf5 | Unlimited | Tape archive | 2 months | Data translated to HDF5 |
scratch | Unlimited | None | 1 month | Temporary data |
res | 10TB | Tape | 6 months | Analysis results |
User home | 20GB | Disk + tape | Indefinite | User code |
Tape archive | Unlimited | Two copies | 10 years | Raw data |
In the past couple of years we have observed a few non-ideal aspects of the LCLS data retention policy:
The policy proposed below relies on two enabling technologies:
The addition of these capabilities allows us to enforce a policy that adapts to the actual usage of the system.
The SHORT-TERM and MEDIUM-TERM storage classes are eliminated. We are switching from a guaranteed, unconditional stay of files on disk for a fixed period to an access-time-based algorithm for determining when files expire. This will be enforced by continuously monitoring the file systems to determine which files were accessed and when. We will also track whether files were actually analyzed or merely "touched" to dodge the policy.
The actual expiration threshold will be calculated dynamically from the amount of free space available on the corresponding file system at the time of the cleanup. Files with more recent access times will stay on disk. The cleanup process will remove older files until 20% of the file system is free. The cleanup will affect whole runs (not individual files) that were determined to be "expired". An automatic notification will be sent to the PI (or all members?) of the affected experiment after each cleanup.
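The selection step of the cleanup described above can be sketched as follows. This is only an illustration under stated assumptions, not the production implementation: the `Run` record, the `select_expired_runs` function, and the idea of representing each run by a single most-recent access time are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Hypothetical record for one run on a file system."""
    name: str
    size_bytes: int
    last_access: float  # most recent access time of any file in the run (epoch seconds)


def select_expired_runs(runs: List[Run], total_bytes: int, free_bytes: int,
                        target_free_fraction: float = 0.20) -> List[Run]:
    """Pick whole runs to remove, oldest access first, until the
    free-space goal (20% by default) would be reached."""
    target_free = target_free_fraction * total_bytes
    expired = []
    # Walk the runs from least recently accessed to most recently accessed.
    for run in sorted(runs, key=lambda r: r.last_access):
        if free_bytes >= target_free:
            break  # goal met; all remaining (more recent) runs stay on disk
        expired.append(run)
        free_bytes += run.size_bytes  # space that removing this run would free
    return expired
```

In this sketch the expiration threshold is not a fixed age: it is implicitly the access time of the last run selected, so it moves with how full the file system is when the cleanup runs.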
The amount of storage available for scratch/ will be increased by redistributing the freed storage resources of the ftc/ file systems. As with the raw data folders (xtc and hdf5), the retention of files on scratch will switch from the fixed-term expiration model to a model based on last access time. The cleanup algorithm will remove the files with the oldest access times until the goal of 20% free space is met.
The existing 24-month expiration policy will be respected after the change. At the same time, however, we are changing the expiration enforcement technology for the file system by switching to Lustre HSM/HPSS. This technology allows expired files to stay in the file system _namespace_ (remaining visible to users with the 'ls' command) while the actual file content is removed from disk. Files whose content is not on disk will be automatically restored from tape at the first attempt to open them. This process is intended to be transparent to a user application, apart from an extra delay before the file content becomes available to the user process. When a process encounters such a file, it will simply block at the file open stage until the file is brought back from tape to disk.