Space | Size | Backup | Lifetime | Comment |
---|---|---|---|---|
xtc | Unlimited | Tape archive | 1 year | Raw data |
hdf5 | Unlimited | Tape archive | 1 year | Data translated to HDF5 |
scratch | Unlimited | None | 1 year | Analysis results and temporary data |
User home | Unlimited | Disk + tape | Indefinite | User code |
Tape archive | Unlimited | Two copies | 10 years | Raw data |


Space | Size | Backup | Lifetime | Storage class | Comment |
---|---|---|---|---|---|
xtc | Unlimited | Tape archive | 6 months | Short-term | Raw data |
hdf5 | Unlimited | Tape archive | 6 months | Short-term | Data translated to HDF5 |
scratch | Unlimited | None | 6 months | Short-term | Temporary data |
ftc | 10 TB | None | 2 years | Medium-term | Filtered, translated, compressed |
res | 1 TB | Disk | 2 years | Medium-term | Analysis results |
User home | 20 GB | Disk + tape | Indefinite | | User code |
Tape archive | Unlimited | Two copies | 10 years | Long-term | Raw data |

Note: XTC/HDF5 files can always be restored from tape; files restored from tape will stay on disk for 1 month.
The goal of the new proposed policy based on three different storage classes is twofold:
The new policy will be accompanied by web tools to facilitate users in:
Why are you doing this to us?
We believe this is a better policy, above all for the users. We noticed that the previous 1-year lifetime on disk was not long enough. At the same time, keeping everything on disk for 2 years is not the best use of LCLS resources. Hence LCLS decided to extend the lifetime on disk to 2 years for the most used data files, by introducing quotas and by letting users select which files should stay on disk.
We understand that the new policy requires more active participation from users, so we'll provide tools to help manage the data. For example, we made it much easier to restore data from tape and to move data across storage classes. We'll soon provide tools to compress and filter the data.
Also, the scratch directories were sometimes (ab)used unfairly, so we decided to create three scratch areas (ftc, res and scratch) with different characteristics.
So will all raw files be deleted by the proposed deadlines?
No. You can extend the lifetime on disk of your raw data by 2 years by selecting runs with the file manager in the experiment web portal.
By which date do we need to select runs with the file manager to increase the lifetime on disk of our data?
See the proposed deadlines here: https://confluence.slac.stanford.edu/display/PCDS/Phase+1+Dates
What will happen to the data under the scratch folder?
Starting November 1st, 2012, all files under scratch that are older than 6 months will be deleted. Use the ftc or res directories to extend the lifetime of your intermediate data.
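To see which scratch files would be affected, something like the following Python sketch could help. It is only an illustration: it assumes that "older than 6 months" is judged by file modification time and approximates 6 months as 183 days, neither of which is stated by the policy. The function name is hypothetical.

```python
import os
import time

# Assumption: "6 months" approximated as 183 days.
SIX_MONTHS_SECONDS = 183 * 24 * 3600


def files_older_than(root, max_age_seconds=SIX_MONTHS_SECONDS, now=None):
    """Return a sorted list of files under `root` last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    old_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                old_files.append(path)
    return sorted(old_files)
```

Calling `files_older_than()` with the path of your experiment's scratch directory returns the candidates you may want to move to ftc or res first.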
Can we restore files from tape to disk by ourselves?
Yes, use the file manager tab in the experiment web portal.
How long does it take to restore files from tape to disk?
The rule of thumb is 1TB per hour. Most experiments could be entirely restored on disk in half a day. The largest LCLS experiment would take five days to be restored.
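The rule of thumb can be turned into a quick estimate. The Python sketch below simply divides the data volume by the quoted 1 TB per hour; the function name is hypothetical, and the actual rate will depend on tape and disk load.

```python
RESTORE_RATE_TB_PER_HOUR = 1.0  # rule of thumb quoted above


def estimate_restore_hours(size_tb, rate_tb_per_hour=RESTORE_RATE_TB_PER_HOUR):
    """Estimate the hours needed to restore `size_tb` terabytes from tape."""
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    if rate_tb_per_hour <= 0:
        raise ValueError("rate_tb_per_hour must be positive")
    return size_tb / rate_tb_per_hour


# A 12 TB experiment takes about half a day at this rate.
half_day_hours = estimate_restore_hours(12)
# Five days of restoring corresponds to roughly 120 TB at this rate.
five_days = estimate_restore_hours(120) / 24
```

The 12 TB and 120 TB figures are what the rule of thumb implies for "half a day" and "five days"; they are not measured experiment sizes.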
What is the lifetime on disk of files restored from tape?
Files restored from tape will stay on disk for one month.
How long do the raw data files stay on tape?
Ten years.
What should we store in the ftc experiment directory?
The ftc directory is meant to provide long-term disk storage (2 years) for the raw data after basic processing such as event filtering, translation to HDF5 or compression. Users can store what they want under ftc as long as they can recreate its contents from the raw data, since this directory is neither backed up nor archived. This space has a default quota of 10 TB.
What should we store in the res experiment directory?
The res directory is meant to provide long-term disk storage (2 years) for the results of the user analysis. Users should store important results and shared code under this directory. This directory is backed up and has a default quota of 1 TB.
What should we store in the scratch experiment directory?
The scratch directory is meant to provide large, short-term disk storage (6 months) for intermediate results. Users can write to this directory while developing their data analysis and later move the most important results to res. This directory is not backed up and has no quota.
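To get a rough idea of how much of the default ftc or res quota a directory uses, a Python sketch like the one below may be useful. It assumes 1 TB means 10**12 bytes and simply sums file sizes; how the quota system itself accounts for usage is not described here, so treat the result as an approximation. All names are hypothetical.

```python
import os

# Default quotas from the answers above; assumption: 1 TB = 10**12 bytes.
DEFAULT_QUOTA_BYTES = {"ftc": 10 * 10**12, "res": 1 * 10**12}


def directory_size_bytes(root):
    """Sum the sizes of all regular files found under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def quota_fraction_used(root, area):
    """Return the fraction of the default quota for `area` ('ftc' or 'res') used by `root`."""
    return directory_size_bytes(root) / DEFAULT_QUOTA_BYTES[area]
```

A result close to 1.0 from `quota_fraction_used()` suggests it is time to clean up the directory or to ask for a quota increase.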
Can we ask for an increase of the 2-year quota for a particular experiment?
It's fair to ask.
Can we ask for an extension of the deadline for the run selection?
Yes, but only for experiments taken after April 2011. Please try to keep the requested extension within 3 months of the proposed deadline.