Storage Classes

| Space | Size | Backup | Lifetime | Storage class | Comment |
|---|---|---|---|---|---|
| xtc | Unlimited | Tape archive | 6 months | Short-term | Raw data |
| usr | Unlimited | Tape archive | 6 months | Short-term | Raw data from users' DAQ systems |
| hdf5 | Unlimited | Tape archive | 6 months | Short-term | Data translated to HDF5 |
| scratch | Unlimited | None | 3 months | Short-term | Temporary data |
| xtc/hdf5 | 10TB | n/a | 2 years | Medium-term | Selected XTC and HDF5 runs |
| ftc | 10TB | None | 2 years | Medium-term | Filtered, translated, compressed |
| res | 1TB | Tape | 2 years | Medium-term | Analysis results |
| User home | 20GB | Disk + tape | Indefinite | | User code |
| Tape archive | Unlimited | Two copies | 10 years | Long-term | Raw data |

Tools

Web tools are available to help users:

  • Restore files from tape to disk
  • Move data across storage classes
  • Apply the processing framework to select useful events within a run (only when effective)

Rationale for Switching to New Policy

The goal of the new policy based on three different storage classes was twofold:

  • Allow users to have easy access (i.e., on disk) to the most frequently used data for a longer period of time
  • Make better use of the LCLS storage resources

We believe this is a better policy, above all for the users. The previous 1-year disk lifetime proved too short, but keeping everything on disk for 2 years is not the best use of LCLS resources. LCLS therefore extended the lifetime on disk to 2 years for the most used data files only, by introducing quotas and letting users select which files stay on disk.

We understand that the new policy requires more active participation from users, so we provide tools to help manage the data. For example, it is now much easier to restore data from tape and to move data across storage classes.

Also, the scratch directories were sometimes (ab)used unfairly. So we decided to create three scratch areas (ftc, res and scratch) with different characteristics.

Frequently Asked Questions

Will all raw files be deleted after 6 months?

  • No. You can extend the lifetime on disk of your raw data to 2 years by selecting the most frequently accessed runs. Make this selection with the file manager in the experiment web portal.

The total size of our experiment is below the quota. May we select all runs for 2-year storage?

  • You may, but try to be a good citizen: select only the most frequently used files, and restore the runs you rarely use from tape when you need them.

What happens to the data under the scratch folder?

  • All files under scratch that are older than the specified number of months will be deleted. Use the ftc or res directories to extend the lifetime of your intermediate data.
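
To see which files are at risk before a cleanup, something like the following sketch may help. It is not an LCLS tool. It assumes that file age is measured by modification time (this page does not say how age is determined) and that the caller supplies the age limit in days (for example, roughly 90 days for the 3-month scratch lifetime). The function name `files_older_than` is made up for this illustration.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def files_older_than(root, max_age_days, now=None):
    """Return regular files under root whose mtime is older than max_age_days.

    Assumption: "age" means time since last modification. The actual
    cleanup may use a different criterion.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Calling `files_older_than(path_to_your_scratch_dir, 90)` returns the candidates, which you can then review and copy to ftc or res if they are worth keeping.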

Can we restore raw files from tape to disk by ourselves?

  • Yes, you can restore raw files (xtc and hdf5) from tape using the file manager tab in the experiment web portal.

Can we restore the contents of the ftc, res and scratch folders from tape?

  • No, the contents of the ftc and scratch directories are not backed up or archived. The contents of the res folder, though, can be restored on demand from a disk backup.

How long does it take to restore files from tape to disk?

  • The rule of thumb is 1TB per hour. Actual results will vary from 60MB/s (if only one file is restored) to more than 600MB/s (if there are at least ten files in the queue evenly spread across ten available tape drives). Most experiments could be entirely restored on disk in half a day. The largest LCLS experiment would take five days to be restored.
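
The arithmetic behind these figures can be sketched as follows. This is only a rough estimate: it assumes decimal units (1 TB = 1,000,000 MB) and a constant transfer rate, and the function name `restore_hours` is made up for this illustration. The two rates are the ones quoted above.

```python
def restore_hours(size_tb, rate_mb_per_s):
    """Estimate hours needed to restore size_tb terabytes at a steady rate.

    Assumes decimal units (1 TB = 1,000,000 MB) and a constant rate,
    so treat the result as a rough guide only.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_per_s / 3600


if __name__ == "__main__":
    # Rates quoted in the answer above: one file on one drive vs. ten busy drives.
    for label, rate in [("single file", 60), ("ten drives in use", 600)]:
        print(f"1 TB, {label}: about {restore_hours(1, rate):.1f} hours")
```

At 60MB/s a single terabyte takes several hours, while at 600MB/s it takes under half an hour. The 1TB per hour rule of thumb sits between the two.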

What is the lifetime on disk of files restored from tape?

  • Files restored from tape will stay on disk for one month.

How long do the raw data files stay on tape?

  • Ten years.

What should we store in the ftc experiment directory?

  • The ftc directory is meant to provide long-term disk storage (2 years) for raw data after basic processing such as event filtering, translation to HDF5, or compression. Users can store what they want under ftc as long as they can recreate its contents from the raw data, since this directory is not backed up or archived. This space has a default quota of 10TB.

What should we store in the res experiment directory?

  • The res directory is meant to provide long-term disk storage (2 years) for the results of user analysis. Users should store important results and shared code under this directory. This directory is backed up and has a default quota of 1TB.

What should we store in the scratch experiment directory?

  • The scratch directory is meant to provide large, short-term disk storage for intermediate results. Users can write to this directory while developing their data analysis and later move the most important results to res. Please do not store data that you cannot recreate under scratch: this directory is not backed up, and the oldest files on scratch may be deleted at any time to make space for data from new experiments. This folder doesn't have a quota.

What goes in the usr experiment directory?

  • The usr folder contains data files collected by users' DAQ systems that are not integrated with the LCLS data acquisition system (e.g., some Rayonix cameras).

Can we ask for an increase of the 2-year quota for a particular experiment?

  • It's fair to ask.

Can we ask for an extension of the deadline for the run selection?

  • Yes, but only for experiments taken after April 2011. Please try to keep the requested date within 3 months of the proposed deadline.