Design issues for Science Tools
I'm posting the message from the GSSC with regard to the GUI design discussed on Feb 22.
This is a follow-up to the presentation by Marco Frailis of a unifying GUI for the Science Tools proposed by the DataMind group. We at the GSSC find the approach interesting and promising, and we are appreciative of the efforts to date. We feel, though, that there are still a number of issues to be addressed.
The GSSC position is that the GUI presented may have great potential for simplifying the use of the science tools, but after the meeting on Thursday (Feb 22) there was not a clear implementation strategy. If this is to be a value-added product internal to the LAT collaboration, then the issues raised below are less relevant. If it is advocated that this GUI should be (or might be) distributed with the science tools, then it would be good for the GSSC to be involved in determining the approach.
The first point is that the GUI should be just that, a GUI. In the presentation, it sounded like the DataMind group intends the GUI to link together the Science Tools, but not to include any direct scientific functionality. If the pipeline components start to take on some of the aspects of the science tools, e.g. any additional processing, then it seems to us that this might lead to a split between the science tools released by the GSSC and those available to the collaboration. I think we all agree that this would not be a desirable outcome and that we should be careful to avoid it. We think that having the ballistic science tools as the software modules in the GUI would achieve this in a natural manner.
A corollary to this is that the Science Tools should drive the GUI development, and not the other way around. The team should not be required to devote substantial time and/or resources to add features to the ballistic tools in order to support features in the GUI.
Another issue to consider carefully is the use of third-party components, especially if the GUI may be distributed with the Science Tools. For example, the DataMind group wants to use wxWidgets. That may be OK, but is it really necessary, or could standard Python Tkinter widgets be used? Another example might be Java and CMT, both of which are used in the ProC software. The Science Tools will be distributed CMT-free, and currently do not require Java.
The point is not to forbid the use of any particular component, but rather that the GSSC (and even more so the HEASARC) will find it harder to support and distribute a system with dependencies on many components. It would be good for the collaboration to agree on a complete and well-articulated list of specific dependencies and requirements of the proposed GUI system before it is implemented.
Although Marco did not mention this in the presentation, there has been discussion of a connection between the proposed GUI and the build system. Again, ties between the new GUI and MRvcmt/CMT/ReleaseManager etc. would be a minus with respect to the GSSC distributing the GUI. The tools that will be released by the GSSC must build under the HEASARC build system and run under all the supported HEASARC platforms. The GSSC has made good progress on building the science tools in the HEASARC system using the current science tools environment. If there are any changes to the build system we should proceed carefully to preserve the work that has already occurred. In addition if this GUI system is to be distributed with the science tools it must also build under the HEASARC system.
We would also like to understand more completely how the concept of the "Pipeline" relates to the concept of an "Analysis thread" in the workbook. It was not clear from the meeting whether this GUI would be able to use Python scripts or would instead "encapsulate" the workbook and lead users through the analysis.
It has also been stated several times that this GUI would address several Windows usability problems. It would be helpful to know what the usability issues are so that we could evaluate how the GUI would help to alleviate these problems or whether we should address them outside the GUI environment.
During Checkout 3, a number of problems were encountered in simulating and analyzing data owing to inconsistent assumptions about how time is represented. I summarized the issues here during the VRVS meeting of the Science Tools Working Group on October 5, 2005.
I'd like to make some specific proposals here that, if adopted, will result in changes to the sources, tools, and data products. These are important enough that they should be implemented before DC2. We did not reach consensus during the VRVS meeting, but I hope that we can here.
1. MJDREF checking
The times that will come out of Level 1 processing for the events and the pointing/livetime history will be Mission Elapsed Time, seconds since a reference epoch. This reference time has been specified mission wide as midnight, January 1, 2001. In FITS files the conventional designation of a reference epoch is MJDREF, the Modified Julian Date for the reference. For GLAST this is 51910. It will not change.
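To make the arithmetic concrete, here is a minimal sketch of the MET-to-date conversion implied above. The function names are my own and are not part of any Science Tools package, and the sketch ignores leap seconds and the choice of time system, which a real implementation would have to settle.

```python
from datetime import datetime, timedelta

GLAST_MJDREF = 51910.0             # MJD of midnight, January 1, 2001
SECONDS_PER_DAY = 86400.0
MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17 00:00


def met_to_mjd(met_seconds, mjdref=GLAST_MJDREF):
    """Convert seconds since the reference epoch to a Modified Julian Date."""
    return mjdref + met_seconds / SECONDS_PER_DAY


def mjd_to_calendar(mjd):
    """Convert an MJD to a calendar date (no leap-second handling)."""
    return MJD_EPOCH + timedelta(days=mjd)
```

With these definitions, MET 0 maps to MJD 51910, which is the calendar date 2001-01-01 00:00.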
The problems in Checkout 3 arose in part because the tools did not check the value of MJDREF and simply assumed that all times were MET, i.e., with respect to MJD 51910. Unfortunately, this assumption broke when an FT2 file with a different MJDREF was used.
My proposal, toned down somewhat from discussions in the VRVS meeting, was that anything that reads absolute times from a FITS file should check the value of MJDREF and issue a warning if it is not 51910, i.e., if the times are not GLAST MET.
The majority opinion, however, seemed to be that the tools should read MJDREF and use it (whatever its value) to derive absolute times without complaint. This would make the tools potentially more multi-mission friendly and would also mean that the times in FT1 and FT2 files would not have to be GLAST MET.
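As a rough illustration of what "read MJDREF and use it" would mean in practice, a tool could re-express whatever times it reads on a common scale. The sketch below shifts a time given relative to an arbitrary MJDREF onto the GLAST MET scale; the function name is hypothetical, and leap seconds and time-system differences are ignored.

```python
SECONDS_PER_DAY = 86400.0
GLAST_MJDREF = 51910.0  # reference epoch for GLAST MET


def to_glast_met(time_seconds, mjdref):
    """Re-express a time measured from `mjdref` as seconds since MJD 51910."""
    return time_seconds + (mjdref - GLAST_MJDREF) * SECONDS_PER_DAY
```

A file whose MJDREF is already 51910 passes through unchanged, while a file referenced to MJD 51911 has one day (86400 s) added to each of its times.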
My argument against this position is that we know that the FT1 and FT2 times WILL be MET and that the reference date for MET will not change. I think that the codes that generate time-varying sources (like SpectralTransient for blazars, PulsarSpectrum for pulsars and GRBobsmanager for GRBs) internally use MET. I believe that the analysis tools do, too. If we allow MJDREF to be different from 51910, then the tools will need to work in terms of absolute times rather than MET.
Another opinion that was held by more than one person at the meeting was that in any case, the tools should not have hard coded in them what the 'correct' MJDREF value is. I don't think this argument is particularly compelling if one believes that MET is MET.
Still, there's no doubt that we'd be preserving the most flexibility and have a more robust analysis environment by having the tools work with absolute times rather than MET internally.
Unless converting the tools to use absolute times internally is easier than I think it is, I'd still propose that we have the tools and simulator check the value of MJDREF and issue a warning if it is not 51910.
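The check I have in mind is small. Below is a minimal sketch, assuming the header is available as a dictionary-like object keyed by keyword name; the function name, the tolerance, and the decision to warn on a missing keyword are my assumptions, not existing tool behavior.

```python
import warnings

GLAST_MJDREF = 51910.0  # the one reference epoch for GLAST MET


def check_mjdref(header, expected=GLAST_MJDREF, tol=1e-9):
    """Return True if MJDREF matches `expected`; otherwise warn and return False."""
    value = header.get("MJDREF")
    if value is None:
        # Assumption: a missing keyword deserves a warning rather than an error.
        warnings.warn("MJDREF keyword is missing; times may not be GLAST MET")
        return False
    if abs(float(value) - expected) > tol:
        warnings.warn(
            "MJDREF = %s differs from %s; times are not GLAST MET"
            % (value, expected)
        )
        return False
    return True
```

A tool would call this once for each FITS extension from which it reads absolute times, and carry on regardless of the result, so the check only warns and never blocks an analysis.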
2. Specification of a simulation starting time
In terms of defining sources for the observation simulator, a big inconvenience of having the sources use MET is that for times during the mission, MET will be 200,000,000+. We should have a way to specify a reference date for a simulation. This is not to be a substitute for the one true MJDREF but instead a MET value with respect to which the times for a given simulation are specified.
During our discussion at the VRVS meeting, it quickly became clear that we can't just have an additional XML tag that specifies a reference time for an entire simulation because complicated simulations can be constructed as the concatenations of several XML files. A less-appealing alternative is to have the reference MET specifiable as an optional parameter for any source that requires a time specification.
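To show how a per-source reference might behave when several XML files are concatenated, here is a sketch using Python's standard XML parser. The element name `source` and the attributes `name`, `ref_met`, and `start` are purely hypothetical and do not reflect the actual source-library schema.

```python
import xml.etree.ElementTree as ET


def source_start_met(source_elem):
    """Absolute start MET for one source: optional reference MET plus offset."""
    ref_met = float(source_elem.get("ref_met", 0.0))  # hypothetical attribute
    start = float(source_elem.get("start", 0.0))      # hypothetical attribute
    return ref_met + start


def collect_start_mets(roots):
    """Map source name to absolute start MET across several parsed XML files."""
    starts = {}
    for root in roots:
        for src in root.iter("source"):
            starts[src.get("name")] = source_start_met(src)
    return starts
```

Because the reference travels with each source rather than with the file, sources from different files can use different reference METs, and a source that omits the attribute simply falls back to plain MET.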
Toby mentioned during the meeting that for running Gleam he has implemented a reference time like this, something like the launch date of GLAST. It isn't clear to me that we want to have just one reference time, because a few years into the mission we'd be back to dealing with time offsets in the 100,000,000+ range, but I've asked Toby to describe how his implementation works, and I (or he) will post the description here.
3. Times in data products
The tools that write FT1 or FT2 files use template files that define the keywords and many of the values, the extensions, etc. The way things are arranged now, any package that needs one of these template files has its own copy. For FT1, this means that we have 3 copies (which are not currently all the same) in different packages (fitsGen, observationSim, and tip), none of which is necessarily related to the current definitions of the FT1 and FT2 files that Masa maintains on the Web. FT2 template files are in 2 packages (fitsGen and observationSim).
Jim proposed that we keep one copy of each of the template files in the same package and that all packages that need them look for them in this central location. Also, the definitions in the template files in the repository would be THE definitions, which presumably would be mirrored to the Web.
I think that this makes sense, but I'd appreciate it if someone could explain why each of those packages needs the FT1 and/or FT2 templates in the first place.
Also, other packages have template files: e.g., evtbin has LatEnergyBinDef.tpl and LatTimeBinDef.tpl and pulsarDb has PulsarEph.tpl. I don't think that any other package will want to use these templates.
As James pointed out, the downside of moving template files out of the packages where they currently live is that we would have to start versioning the templates (e.g., with a new header keyword) and have the tools check the versions of the template files that they use. (The implication is that if a template file is kept in the package that uses it, then the two are kept in sync: the code of the tool is updated as needed by the package maintainer when the template file is changed.)
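For concreteness, a version check of the kind James described might look like the sketch below. It assumes the template carries a version keyword on a line of the form KEYWORD = value / comment; the keyword name TPLVER is hypothetical, and the parsing is deliberately naive (it would mishandle a string value that contains a slash).

```python
def read_template_keyword(template_text, keyword):
    """Return the value of `keyword` from template text, or None if absent."""
    for line in template_text.splitlines():
        name, sep, rest = line.partition("=")
        if sep and name.strip().upper() == keyword.upper():
            value = rest.split("/", 1)[0].strip()  # drop the trailing comment
            return value.strip("'").strip()        # drop FITS string quotes
    return None


def check_template_version(template_text, expected, keyword="TPLVER"):
    """Raise ValueError unless the template declares the expected version."""
    found = read_template_keyword(template_text, keyword)
    if found != expected:
        raise ValueError(
            "template version %r does not match expected %r" % (found, expected)
        )
    return found
```

Each tool would carry the version string it was written against and call the check when it loads a template from the central package.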
So, if it turns out that we really weren't using all 3 copies of the FT1 template file and both copies of the FT2 template, the question becomes whether we should put all of the template files in one package to be centrally maintained, I hope by someone like Masa. What do you think?
Is there any convenient way to have the current versions of the template files linked to Masa's Web page of the FITS definitions of the Science Data Products? I think that the answer is only maybe. ViewCVS obviously could be used, if the revision number is known. For example, here is ft1.tpl from observationSim. Masa's Web pages have HTML versions of these templates, but with additional information that does not appear in the template file, including the formats of the values expected for each header keyword. So ViewCVS views of the template files could be expected to replace only part of Masa's pages.
The current template files do not all have analogs in the FITS formats described on Masa's page, although perhaps they should. The focus for Masa's Web page (and for the definition of science data products) has been on the products that will be exchanged between the LAT team and the GSSC. But we clearly have data formats, e.g., for binned event time series, that need to be maintained for the science tools.