We should organise the folders in the catalogue to avoid total chaos. Here is a straw-man proposal for a top-level layout:
MC
DC2
Service Challenge
Gleam
Obssim
Pass5
BeamTest
User
Test
Data
ETE
1
2 etc
Flight
I&T
Test
Questions
- How do we specify data reprocessings? Or MC reprocessings, for that matter?
4 Comments
Richard Dubois
From Tom:
Seems fine to me. The current scheme organizes data under "Service Challenge" by task name, something which makes sense
to me but perhaps not to Joe User. Can there not also be aliases? If so, then we could have a parallel scheme, one
which was meaningful to pipeline task operators, and another by project and/or physics category intended for the
collaboration at large. For example, there will likely be many more ServiceChallenge sponsored tasks than most users
will ever be interested in, e.g., old v9r20 data. I would like to retain the task name as part of the organization if
possible - and am not certain the "Gleam" - "Obssim" separation would be needed in that view.
- Tom
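The alias idea raised above could be sketched as a lookup layer on top of the task-based tree. This is only an illustration, assuming the catalogue could store alias paths at all; the `ALIASES` table, the `resolve` function, and every path shown are hypothetical and do not describe the existing catalogue.

```python
# Hypothetical alias table: user-facing path -> canonical task-based path.
# None of these paths are real catalogue entries; they only illustrate the idea.
ALIASES = {
    "/Physics/AllGamma/latest": "/MC/ServiceChallenge/allGamma-task-A",
    "/Physics/Background/latest": "/MC/ServiceChallenge/background-task-B",
}


def resolve(path: str, aliases: dict = ALIASES) -> str:
    """Return the canonical path for `path`, following an alias prefix if one matches."""
    # Try the longest alias first so nested aliases resolve predictably.
    for alias in sorted(aliases, key=len, reverse=True):
        if path == alias or path.startswith(alias + "/"):
            return aliases[alias] + path[len(alias):]
    return path  # not an alias: treat it as already canonical
```

With this arrangement the task name stays part of the canonical path for pipeline operators, while the collaboration at large would browse only the alias side.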
Richard Dubois
From Anders:
Hmmm .... I'm not sure we want reprocessings at the (near) top
level. It would be more suitable at the run/file level.
In general I prefer to minimize the number of top levels, but
maybe we should split LCI and Physics, i.e. Flight/LCI, Flight/Physics (and
maybe add a Flight/LEO)?
anders
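One way to read the suggestion of handling reprocessings at the run/file level is to record the reprocessing as an attribute of each file rather than as a folder. The sketch below assumes a hypothetical `FileEntry` record with a `reprocessing` counter; the field names and the folder paths are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical catalogue record; the field names are assumptions."""
    folder: str        # e.g. "/Data/Flight/Physics"
    run: int           # run number
    reprocessing: int  # 0 = original processing, 1 = first reprocessing, ...


def latest_per_run(entries):
    """Keep only the most recent reprocessing of each (folder, run) pair."""
    latest = {}
    for e in entries:
        key = (e.folder, e.run)
        if key not in latest or e.reprocessing > latest[key].reprocessing:
            latest[key] = e
    # Sort for a stable, readable listing.
    return sorted(latest.values(), key=lambda e: (e.folder, e.run))
```

The folder tree then stays the same across reprocessings, and a user who wants an older version would select on the attribute instead of navigating to a different branch.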
Julie McEnery
I have a few comments on the folders in MC. I think that "Service Challenge" may not be a good identifier (because it will cut across a few versions of the classification tree (CT) analysis). Maybe something like:
MC
LAT (in orbit)
pass3/DC2
AllGamma
Bkg
Sky
Gleam
Obssim
...
pass4/handoff
AllGamma
Bkg
Sky
Gleam
Obssim
...
pass5/SC/final?
AllGamma
Bkg
Sky
Gleam
Obssim
...
BeamTest
I&T (if we still have some)
User
Test
The disadvantage of this is that a given task will move from pass4 to pass5 when we reprocess with the new classification trees. It would also give us obsolete branches (pass3/DC2 and pass4/handoff) at the top level, which would not be great. However, it has the advantage of clearly collecting together datasets that have a high chance of playing well together (Service Challenge would otherwise be a mix of pass4 and pass5). Pass3, pass4 and pass5 each correspond to a labeled set of IRFs, so they make sense for collecting obssim runs too.
I don't have any suggestions at the moment for the DATA tree.
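The trade-off described for the pass-based MC tree can be made concrete with a small path builder. This is a sketch under one possible reading of the layout, in which the simulator (Gleam/Obssim) sits below the source type; the nesting order, the `mc_folder` function, and the task name `example-task` are assumptions.

```python
def mc_folder(pass_label: str, source: str, simulator: str, task: str) -> str:
    """Build a folder path as /MC/LAT/<pass>/<source>/<simulator>/<task>.

    The nesting order is an assumption; the proposal does not fix it.
    """
    parts = ["MC", "LAT", pass_label, source, simulator, task]
    return "/" + "/".join(parts)


# A hypothetical task, first classified with the pass4 trees...
before = mc_folder("pass4", "AllGamma", "Gleam", "example-task")
# ...and then reprocessed with the pass5 classification trees.
after = mc_folder("pass5", "AllGamma", "Gleam", "example-task")
```

The same task ends up under a different path after reprocessing, which is the disadvantage noted above; in exchange, everything under one pass folder shares a single set of IRFs.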
Tony Johnson
In general I think we should try to arrange folders from a user-centric point of view: making it obvious to the user which top-level folder holds the data they are looking for is probably more important than making it obvious to the data producer. So items like Service Challenge/DC2 and Gleam/Obssim all seem to make sense. (Or perhaps something like full simulation/fast simulation: will end users necessarily know what Gleam is?) For MC running, the task name is an obvious component of the path name, although I would think it should come below the Gleam/Obssim level (and thus may be repeated). For real data I doubt that the task name will be an important part of the path.
We are developing tools which should make it easy to move folders around, especially at the top level, so the organization does not have to be fixed for all time. So right now ETEn is important and should live near the top, but in a year's time it can be moved further down. It is not possible right now to have the same file/dataset appear in two different places in the tree, but I have been discussing with Dan how to add this, so I think it will appear later.
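As a rough picture of what moving a folder involves, the sketch below re-roots every dataset under one folder to a new location. It is not the tool under development: the catalogue is modelled as a plain dictionary from path to record, and `move_folder` and the `/Data/Archive` destination are hypothetical.

```python
def move_folder(catalogue: dict, src: str, dst: str) -> dict:
    """Return a copy of `catalogue` with everything under `src` re-rooted at `dst`.

    `catalogue` maps a dataset path to its record. Aliases (one dataset
    reachable under two paths) are deliberately not modelled here.
    """
    moved = {}
    for path, record in catalogue.items():
        if path == src or path.startswith(src + "/"):
            path = dst + path[len(src):]
        if path in moved:
            # Refuse to silently overwrite an existing entry.
            raise ValueError(f"move would overwrite {path}")
        moved[path] = record
    return moved


catalogue = {
    "/Data/ETE/1/run1": "a",
    "/Data/ETE/2/run1": "b",
    "/MC/DC2/x": "c",
}
# Demote the ETE branch once it is no longer of top-level interest.
relocated = move_folder(catalogue, "/Data/ETE", "/Data/Archive/ETE")
```

Because the function returns a new mapping, the original layout stays available until the move has been checked.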