Event Summary Data (FT1, LS-002)

This is a fundamental data product used by the science tools. The current definition (version of July 12, 2005) is posted on the Guidelines for Science Tools Design page maintained by Masa Hirayama.

The issues and proposed changes below primarily are with respect to the 'LAT event summary' extension in the version linked above. You may want to have that page open when you read this page. The initial concept for the contents dates at least to 2001, when we were not burdened with a detailed understanding of what would be available from reconstruction and classification and how the tools would work, and so should be updated in several respects.

The issues and proposed resolutions below, or any other aspect of the definition of the contents are open for comment (probably most effectively by editing this page rather than inserting Confluence comments).

1. Duplication of keywords

(Digel) TELESCOP, INSTRUME, EQUINOX, RADECSYS, DATE, DATE-OBS, DATE-END are duplicated from the primary header of the file. Is this necessary, or even a good idea? I propose putting TELESCOP, INSTRUME, DATE, DATE-OBS, and DATE-END in the primary header only, and EQUINOX and RADECSYS in the LAT event summary header.

(Foschini) Perhaps it is better to keep all the above keywords in the events and GTI extensions and leave the primary header blank. The latter is generally not used by common data analysis software.

(Stephens) I think they should stay in all the headers. The idea is that each extension can stand alone if necessary and have the relevant keywords. The only reason to leave them in the primary header is for informational purposes. i.e. you just have to read the primary header and you know what the file is about.

(Ballet) I agree that the extensions are more important than the primary, and that the best solution is to have the general keywords in both the extensions and the primary.

(Hirayama) The repetition of "major" keywords (such as TELESCOP) was suggested by the HEASARC FITS Working Group when we met them at NASA/GSFC on May 28th, 2003. See also the section titled "Answers to the questions" under topic 006 of the latest topics of FT1 and FT2 for more details. Because the FT1 definition has changed quite a bit since then, it could be useful if some of us could meet the HEASARC FITS Working Group members to discuss this topic.
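The "keywords in both the primary and the extensions" option discussed above can be sketched in Python as follows. This is only an illustration: headers are modelled as plain dicts rather than read from a real FITS file, the function name `propagate_keywords` is hypothetical, and the keyword values are made-up examples.

```python
# Keywords named in the discussion as duplicated between headers.
GENERAL_KEYWORDS = ("TELESCOP", "INSTRUME", "EQUINOX", "RADECSYS",
                    "DATE", "DATE-OBS", "DATE-END")


def propagate_keywords(primary, extensions, keywords=GENERAL_KEYWORDS):
    """Return copies of the extension headers with the general keywords
    copied in from the primary header, so each extension can stand alone.

    Headers are plain dicts here (an assumption for illustration only).
    The input dicts are not modified.
    """
    result = {}
    for name, header in extensions.items():
        merged = dict(header)
        for key in keywords:
            if key in primary:
                merged[key] = primary[key]
        result[name] = merged
    return result


# Illustrative values only.
primary = {"TELESCOP": "GLAST", "INSTRUME": "LAT",
           "EQUINOX": 2000.0, "RADECSYS": "FK5"}
extensions = {"EVENTS": {"EXTNAME": "EVENTS"}, "GTI": {"EXTNAME": "GTI"}}
filled = propagate_keywords(primary, extensions)
```

With this approach the primary header remains the single place where the values are set, and the extensions carry copies.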

2. HDUCLASS keywords

(Digel) I am not sure whether we adhere to the strict definition of a HDUCLASS1 = EVENTS file. The only column that we have in common with the specification of The Recommended Columns and Keywords for a FITS Event List is TIME.

Also, I think that with the HDUCLASS keywords we are supposed to supply a format-specifying document with the HDUDOC keyword, and a version in HDUVERS. We don't have such a document, and we already have (but do not yet use) a VERSION keyword in the primary header.

For these reasons, I'd recommend removing HDUCLASS keywords entirely.

(Hirayama) The HDUCLASS and HDUCLASn keywords were added based on the HEASARC FITS Working Group's suggestion made when we met them at NASA/GSFC on May 28th, 2003. See also topic 002 of the latest topics of FT1 and FT2 for more details. I guess they will not appreciate it very much if we drop those keywords, although I can't (and shouldn't) speak for them, of course. In any case, I suggest asking for their opinion before making a decision to drop those keywords.

3. TASSIGN keyword

(Digel) TASSIGN is supposed to be where the event times were assigned, and we currently have this set equal to 'SATELLITE'. This is not strictly true, as we will need some ground processing to turn ticks of the 20 MHz clock into time since the last 1 PPS signal from the spacecraft, and we'll also need to shift GPS time to TT.

Also, I don't see what good the keyword does us. So I recommend removing it.
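As an illustration of the ground processing mentioned above, here is a minimal Python sketch that turns a tick count into a TT-based time. It assumes the clock runs at exactly its nominal 20 MHz between PPS signals (a real clock drifts and would need calibration) and that the time of the last 1 PPS signal is available in GPS seconds; the function and argument names are hypothetical. The 51.184 s offset follows from TT = TAI + 32.184 s and TAI = GPS + 19 s.

```python
from fractions import Fraction

CLOCK_HZ = 20_000_000                   # nominal 20 MHz clock, from the text
TT_MINUS_GPS = Fraction(51184, 1000)    # 32.184 s + 19 s


def event_time_tt(pps_gps_seconds, ticks_since_pps):
    """Convert (GPS time of last 1 PPS, ticks since that PPS) to seconds
    on the TT scale, counted from the same epoch as the GPS seconds.

    Assumes an ideal clock at exactly CLOCK_HZ (illustrative only).
    """
    if ticks_since_pps < 0:
        raise ValueError("tick count must be non-negative")
    gps = Fraction(pps_gps_seconds) + Fraction(ticks_since_pps, CLOCK_HZ)
    return float(gps + TT_MINUS_GPS)


# Half a second after the PPS at GPS second 100.
example_tt = event_time_tt(100, 10_000_000)
```

Exact rational arithmetic is used for the intermediate sum so that the tick fraction is not rounded before the offset is applied.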

(Hirayama) For the usage (and the meaning) of the TASSIGN keyword, see the HEASARC's description. Based on the description, 'SATELLITE' is appropriate for our case, in my opinion.

It is a part of the standard time-related keywords listed under the section titled "B. Time Keywords" in "The Recommended Columns and Keywords for a FITS Event List" and that's why it was added, I think, although I am not sure whether it is absolutely necessary or not. Indeed, gtbary seems neither to read nor to write this keyword. Also, in the meeting with the HEASARC FITS Working Group at NASA/GSFC on May 28th, 2003, it was suggested (if I remember correctly) that not all of the listed time-related keywords are necessary (see also topic 004 of the latest topics of FT1 and FT2 for more details). On the other hand, (I believe) it is a commonly used keyword among other high-energy astrophysics missions and having this keyword in our FT1 file doesn't hurt us very much. So, I would recommend keeping it "just in case."

In any case (to drop it or to keep it), the HEASARC FITS Working Group might have a different opinion on this topic, especially from a point of view of multi-mission support.

4. OBS_ID and OBJECT keywords

(Digel) I don't think we need either of these. What OBS_ID was originally intended to represent is not clear, and a specific OBJECT is typically not relevant for the LAT.

(Foschini) OBS_ID can be used to identify the observation proposal of Guest Investigator programs, which should be activated during the second and following years of operations. The OBJECT would be the on-axis target. In this respect I also suggest adding 4 more keywords, namely RA_SCX, DEC_SCX, RA_SCZ, DEC_SCZ, for the equatorial coordinates (J2000) of the spacecraft X and Z axes during that particular pointing. These would go in the EVENTS HDU only. These keywords could also help during other observations to identify the centre of the field of view.

(Stephens) The GLAST data as planned to be distributed do not have pointings in the traditional sense and there will be a lot of data that do not correspond to a proposal from the GI program. The RA_SCX, DEC_SCX, RA_SCZ, DEC_SCZ keywords are mostly meaningless for the data as the spacecraft is constantly moving. The center of the "field of view" for the data extraction is already identified by the Data Subspace Selection (DSS) keywords. It would be possible and might be of use to duplicate this in the OBJECT keyword but I think the OBS_ID keyword would go away.

(McEnery) I think that the OBS_ID might not be very useful. It is ambiguous when an observation starts. The LAT does not go into a different mode or start a new run when we repoint, so it would not be obvious when to start tagging events with a new OBS_ID. Also, as Tom points out, we will likely spend most of our time in survey mode even after year 1. I think that it may be useful to have RA_SCX, DEC_SCX, RA_SCZ, DEC_SCZ because they define the orientation of the LAT for each event.

(Hirayama) I don't remember why those keywords were introduced, but there are some records in topic 009 and topic 010 of the latest topics of FT1 and FT2, which might be of interest to you.

(Foschini) The basic problem of these and other keywords is to define the boundaries of the data that are saved into a certain file. After a fruitful email exchange with Seth and Julie about the observing modes, I suggest that it can be useful to separate the data obtained from slews and those from pointings. Also in survey mode, the spacecraft is not continuously slewing, but rocking between two pointings, lasting one orbit on each pointing. The pointing direction can change, but what is important is that also in survey mode the spacecraft is pointing toward a certain direction for one orbit. On the other hand, independently of the reason for which the spacecraft is moving (slew, repoint, other), the data acquired in this mode require a different treatment with respect to the data from pointings. So, we have a sequence of pointings-slews-pointings-slews... both in survey and GO modes.

I think that it could be useful to use the change between pointing and slew as the boundary for the files: in this way, a lot of keywords, like RA_SCX, DEC_SCX, RA_SCZ, DEC_SCZ, OBJECT and even OBS_ID, are automatically defined (for slews these keywords could refer to the middle value), and likewise for the time keywords. This could also be useful to avoid an excessive load on the hardware for the science analysis: several small files are better than one or a few huge files.

Please note that the RA_SCX, DEC_SCX, RA_SCZ, DEC_SCZ keywords refer to the spacecraft (i.e. star tracker), but the boresight of the LAT should be calculated by applying a rototranslation from the star tracker position. The latter can be always improved, so that it is always better to have the starting point, that is the position from the star tracker.

5. MC_TRUTH keyword

(Digel) I think that the original intent was that this keyword indicate whether the data are entirely Monte Carlo truth values (i.e., actual directions, energies, etc.). We do not actually have files like this, and I propose removing this keyword. gtobssim does add a column MC_SRC_ID to the files it generates, but this is easy enough to check for without the MC_TRUTH keyword.

(Stephens) Actually I thought this was a flag like the old PSR_COLS keyword and indicated that there were additional MC columns in the file. As we don't use this I think it can go away.

(Hirayama) Adding a new column to an FT1 file for a special case may not be appreciated by the HEASARC people. I remember they don't like "variants" of an FT1 file, mainly for archival purposes. I think their discussion was like: "for the GLAST team members (who know the file contents very well), it is just one additional column, but for the HEASARC members (who do NOT want to deal with the file contents as much as possible), it is a different file format." Probably it is better to talk to them about it again, once we have our conclusion about this topic, rather than trusting my rusty memory.

6. EVENT_ID column

(Digel) This is specified as a 32-bit integer. Right now, most likely the event IDs will be assigned by the LAT as a time stamp. Most likely we will need more than 32 bits to represent them, and of course they should look more or less like some kind of integer representation of the value in TIME. Right now, I don't know what to recommend for EVENT_ID.

(Foschini) Yes, it appears that EVENT_ID is a duplication of TIME column. It can be removed.

(Hirayama) The topic is also outlined in topic 010 as well as other range problems that might interest you. Also, note that it may not be very easy to identify an event by TIME column after gtbary overwrites its contents for a barycentric correction.
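To put rough numbers on the 32-bit concern raised in this item, the following Python sketch computes how quickly a counter of a given width rolls over and how many bits a given time span needs. It assumes, purely for illustration, that the event ID is a count of 20 MHz clock ticks (the clock rate quoted in item 3) and uses a hypothetical 10-year span; neither assumption comes from the FT1 definition.

```python
CLOCK_HZ = 20_000_000                 # 20 MHz clock, as quoted in item 3
SECONDS_PER_YEAR = 365.25 * 86400


def rollover_seconds(bits, tick_hz=CLOCK_HZ):
    """Seconds until an unsigned counter of `bits` bits wraps around."""
    return 2 ** bits / tick_hz


def bits_needed(duration_seconds, tick_hz=CLOCK_HZ):
    """Smallest unsigned integer width that can count every tick
    in `duration_seconds` without wrapping."""
    ticks = int(duration_seconds * tick_hz)
    return max(1, ticks.bit_length())


wrap_32 = rollover_seconds(32)                      # a few minutes
bits_10yr = bits_needed(10 * SECONDS_PER_YEAR)      # hypothetical span
```

Under these assumptions a 32-bit tick counter wraps after about 215 seconds, while a 64-bit integer column would have ample room.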

7. RECON_VERSION column

(Digel) RECON_VERSION is defined as a 2-byte integer to define the version of reconstruction (and classification) algorithm applied for a given event. This is probably more general than we need. In principle, a given events file could have events processed with different versions of the software, but in practice, I do not expect that this will happen. I recommend that we make RECON_VERSION a header variable instead of a column.

Also, somebody needs to think of how we will translate versions of the geometry, calibration files, reconstruction, and classification, into some fairly compact representation. Certainly we will be closely keeping track of this information for Level 1 processing, and will have a database someplace that can tell us.

8. CALIB_VERSION column

(Digel) As for RECON_VERSION we do not need this as a column. Actually, I'd recommend combining it with RECON_VERSION.

9. IMGOODCALPROP, IMVERTEXPROB, IMCOREPROB, IMPSFERRPRED, CALENERGYSUM, CALTOTRLN, and IMGAMMAPROB columns

(Digel) These columns relate to the classifications of the events and date to DC1, in particular to implementing the Atwood cuts for background rejection. Many things came together at once in the runup to DC1 and we ended up incorporating each of these variables in the event summary file. The DC1 response functions were derived post-Atwood cuts.

For DC2, we will have a cleaner way to specify the results of the event classification, although I do not know what it is yet. We will have some kind of analog of IMGAMMAPROB. Ideally, we'll also have distilled other classifications into a few sets that map into response functions that we'll provide (e.g., front, back, and cal-only events, with good_pdf, good_energy, or dont_care).

What do you recommend here?

Speaking of classification, will we also have a flag like 'HEAVY_CR'?

(Burnett) The "IM" prefix was inserted by me when I absorbed Bill's values. Since it stands for Insightful Miner, and it is not clear that we will always use exactly Bill's trees for this, I would suggest that we drop it from the names. "COREPROB" is confusing; it really means goodpsfprob. And there is no longer anything to correspond to IMPSFERRPRED.

More practically, since there would actually be cuts on each of the IM variables to define "gammas" to get into the FT1 gamma file, those need to be clear. Should they be strict, so that the events are highly likely to be gammas, or loose, to allow a user to choose the level of contamination with respect to poorly measured energy, direction, or presence of background? Given this possibility, how many different Aeff and PSFs will we calculate?

In the case where there are multiple parallel trees for the classification analysis, the best cuts are probably dependent on which tree was used: this implies more variables, or fields in a variable, to describe which path the analysis took.

10. CONVERSION_POINT

(Digel) This was originally imagined as a way for end users to decide whether they wanted to believe that a particular event was not a charged particle and was well reconstructed. Eventually, we will have an event display server available that will make this much easier for users, who would otherwise have to find the geometry of the LAT some place.

Also, so far we have not invented a need for filtering the data on CONVERSION_POINT. In principle, we may some day have a solar flare mode, for which we will pay attention only to the inner towers and layers, but this is probably much better implemented as a kind of flag (like "INTERIOR").

I recommend removing CONVERSION_POINT.

(Burnett) I agree. The filtering implied by this is already incorporated into the classification tree variables.

11. PULSE_PHASE and ORBITAL_PHASE

(Digel) These are needed (and filled) by the pulsar timing analysis tools. I don't know whether they belong in the definition per se of the event file, because they can be added by the pulsar tools, but I don't feel particularly strongly if they stay. I do recommend that we make them floating point values instead of their current doubles.

(Stephens) They belong in the definition as they can be in the file, even if they are not there in the files delivered from the data server.

(Hirayama) Those columns became "permanent" to avoid multiple variants of the FT1 definition. In fact, it is not recommended (by the HEASARC FITS Working Group members) for a tool to change a file format by adding those columns. (See also my comment for "5. MC_TRUTH keyword.") Personally I like it better if pulsar tools add those columns when necessary, but I also see their point, especially thinking about archiving those files for future use. So, if we need/want to drop them, probably we should talk to them about our plan.

(Digel) I think that we should consider the FT1 format to define what the data servers deliver. The PULSE_PHASE and ORBITAL_PHASE columns are specific to an analysis, and would be wasted space in the server. These columns should be added to files by the pulsar tools that fill them, just as gtdiffresp adds a column for each diffuse source in the particular source model under consideration. None of these analysis-specific columns is fundamentally part of the FT1 data.

12. SKYX/Y issue

(Digel) The Guidelines for Science Tools Design page lists one outstanding issue for the definition of the events file, whether to add columns that give the coordinates of the events in some coordinate projection.

The justification given for this is that with projected values of the coordinates (presumably with respect to a sensibly chosen center), tools like ds9 can interpret and bin the events into maps, and then (possibly complicated) regions can be defined for generation of response matrices. This is an analysis path (multiple overlapping point sources in Xspec) that we are not intending to pursue.

Jim has pointed out that having ds9 able to make a binned map directly with real coordinates on it (as opposed to just sky pixels) is a big convenience. Otherwise, a map first has to be generated with gtbin.

I don't know what to recommend. Having coordinates available in some (which?) coordinate projection would be convenient, as would, say, having them available in Galactic coordinates, but I am not sure where it is sensible to draw the line.

(Foschini) Surely having ds9 able to make images directly from the event list is very useful, particularly for a quick look. In this case, it is necessary to have full WCS keywords in the EVENTS header. But, from what I see, the event list files are expected to be huge, so it will be necessary to have an executable able to make a selection of events centered on certain coordinates, with a certain radius of extraction, in a certain time region, within a certain energy band (that is already available). In that case, we can avoid putting the full WCS keywords in the EVENTS header and the executable extracting the selected photon list can add the necessary WCS keywords to the subset of data.
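The kind of selection described above (a cone on the sky plus time and energy ranges) can be sketched in Python as below. This is not the existing selection tool; events are modelled as dicts, and the column names RA, DEC, and ENERGY, as well as the function names, are assumptions made for the example. Angles are in degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def select_events(events, ra0, dec0, radius_deg, tmin, tmax, emin, emax):
    """Keep events inside the cone, the time range, and the energy band."""
    return [
        ev for ev in events
        if angular_separation_deg(ev["RA"], ev["DEC"], ra0, dec0) <= radius_deg
        and tmin <= ev["TIME"] <= tmax
        and emin <= ev["ENERGY"] <= emax
    ]


# Made-up events for illustration.
sample_events = [
    {"RA": 10.0, "DEC": 0.0, "TIME": 100.0, "ENERGY": 200.0},
    {"RA": 50.0, "DEC": 0.0, "TIME": 100.0, "ENERGY": 200.0},
    {"RA": 10.5, "DEC": 0.0, "TIME": 500.0, "ENERGY": 200.0},
]
selected = select_events(sample_events, 10.0, 0.0, 5.0, 0.0, 200.0, 100.0, 1000.0)
```

A tool of this kind knows the selection center, so it is a natural place to write WCS keywords for the extracted subset.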

13. Live time

(Digel) Way back when (until early 2003), the events file was defined to include a column called 'Deadtime' which was to be the 'Deadtime accumulated since the start of the mission.' It was removed, partially because Masa pointed out that information that relates specifically to the LAT belongs in the FT2 file. Also, live time turns out to be more convenient to work with.

I propose that we include accumulated livetime since the start of the mission as a column. It was always imagined that for very short time intervals (shorter than the update interval of the FT2 file) we would need to calculate accumulated livetimes between events in the event summary file. This also will be important for studies of solar flares and (if we are lucky) very bright GRBs, when we may be dead time limited for short periods of time.

(Foschini) Perhaps it can be useful to have a DEADC keyword with the average deadtime correction (i.e. the complement of the deadtime fraction) applied to the events in the file.

(McEnery) I agree that we should have livetime included as a column in the events file.
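If an accumulated-livetime column were added as proposed, the live time between any two events would simply be a difference of two column values. The Python sketch below illustrates this; the column name LIVETIME and the function names are hypothetical, events are modelled as dicts, and the numbers are made up.

```python
def livetime_between(events, i, j):
    """Live time accumulated between event i and event j (i < j),
    as the difference of the accumulated-livetime values."""
    return events[j]["LIVETIME"] - events[i]["LIVETIME"]


def live_fraction(events, i, j):
    """Fraction of the elapsed time between two events that was live."""
    elapsed = events[j]["TIME"] - events[i]["TIME"]
    if elapsed <= 0:
        raise ValueError("events must be in increasing time order")
    return livetime_between(events, i, j) / elapsed


# Made-up values: TIME and accumulated LIVETIME, both in seconds.
burst_events = [
    {"TIME": 0.0, "LIVETIME": 100.0},
    {"TIME": 1.0, "LIVETIME": 100.9},
    {"TIME": 2.0, "LIVETIME": 101.4},
]
fraction = live_fraction(burst_events, 0, 2)
```

The same fraction, averaged over a whole file, is the sort of number a DEADC-style keyword would hold.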

14. Event summary++

(Digel) LS-002 is an event summary, intended to have information for higher-level analyses. The GSSC expects (reasonably) to receive this summary for every event that is telemetered to the ground (and that is not discarded early in the process as obviously background to save disk space at the ISOC).

They would also like to receive an 'extended' event summary that includes all of the variables that are used by the classification trees. That sounds fine to me, too. Right now, the specification of variables actually used by the trees has not converged, and sending the whole Analysis Ntuple would be overkill. I think that we should be able to define this data product in good detail within the next month.

15. Time Issues

(Foschini) I suggest adopting days as the time unit, which is of immediate use to scientists. This means changing the TIMEUNIT value to 'd'.

I also suggest adding to the EVENTS and GTI headers two keywords indicating the on-board time value corresponding to the first and last event in the file. That is, the keywords TSTART,TSTOP should have corresponding keywords OBTSTART,OBTSTOP. This can be useful if the time correlation is missing or has errors.

I would add also a keyword TELAPSE, namely just TSTOP-TSTART, both in the EVENTS and GTI header, just to say the time extension of the whole observation. This is not the exposure or the sum of the GTI, but simply the whole uncorrected time of observation.

Two more keywords can be useful for timing accuracy: TIERRELA,TIERABSO see The Proposed Timing FITS File Format for High Energy Astrophysics Data.

(Hirayama) I don't think changing the TIMEUNIT value from 's' to 'd' is a good idea. Almost all times are expressed in units of seconds, with the exception of MJDREF (which is in days), in event files for various astrophysics missions as far as I know. Also, according to the Glossary of Keywords commonly used in OGIP FITS Files, TIMEUNIT governs the TSTART, TSTOP and TIMEZERO keywords, but not the TIME column (which is governed by the TUNITn keyword). That means TSTART cannot be directly compared with a TIME column value, for example, if TIMEUNIT='d' and TUNITn='s'. It is technically feasible, but rather confusing. TUNITn can be 'd' to avoid the confusion, but then TELAPSE (and some other keywords) are in units of seconds no matter what. Looking at definitions of those other keywords and columns, it seems to me that 's' is a more natural choice for TIMEUNIT than 'd'.
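The mixed-unit situation described above (TIMEUNIT='d' with TUNITn='s') can be made concrete with a short Python sketch. It assumes, for illustration only, that TIME is the first column (hence TUNIT1) and models the header as a dict; the helper names are hypothetical.

```python
SECONDS_PER_UNIT = {"s": 1.0, "d": 86400.0}


def to_seconds(value, unit):
    """Convert a time value in unit 's' or 'd' to seconds."""
    return value * SECONDS_PER_UNIT[unit]


def time_since_tstart(header, time_value):
    """Seconds between TSTART (in TIMEUNIT) and a TIME column value
    (in TUNIT1). Both must be converted before they can be compared."""
    tstart_s = to_seconds(header["TSTART"], header["TIMEUNIT"])
    time_s = to_seconds(time_value, header["TUNIT1"])
    return time_s - tstart_s


# Header keywords in days, column in seconds: the confusing mixed case.
mixed_header = {"TSTART": 2.0, "TIMEUNIT": "d", "TUNIT1": "s"}
offset = time_since_tstart(mixed_header, 172830.0)
```

When both units are 's' the conversion is a no-op, and TSTART can be compared directly with TIME.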

For TELAPSE, OBTSTART, and OBTSTOP keywords, see my comment on "17. GTI."

16. Name of tables

(Foschini) Presently there is only one keyword to identify the template of the EVENTS and GTI data structure. To take into account that templates can change (particularly during the first months after the launch), it is better to add a keyword EXTVERS or something similar with the version number of the template used. Moreover, the data structures of LAT and GBM can require different keywords, so I propose giving EXTNAME more complicated names to take into account the different uses: for example, for the LAT events there can be EXTNAME='GLAST-LAT_EVT' and the corresponding GTI can be EXTNAME='GLAST-LAT-GTI'.

(Ballet) I don't think it is a good idea to change the names. The INSTRUME and TELESCOP keywords are there to indicate which instruments we are dealing with. Better to keep standard names for the EVENTS and GTI extensions, as was done for previous missions such as Chandra or XMM-Newton. I agree with the EXTVERS suggestion.

17. GTI

(Foschini) I suggest adding a keyword GTI_NAME to the GTI header to explain the origin of the GTI: for example, there could be GTIs due to attitude, telemetry, or other causes. The final GTI can be a merging of all the types of GTI.

The keyword TELAPSE in GTI should indicate the elapsed time of the whole observation (see the notes at n. 15 Time Issues) and not the difference of GTI STOP-START.

I suggest also to add two more columns in GTI table, OBTSTART,OBTSTOP, with the onboard time corresponding to START,STOP columns.

Perhaps it could be useful also to add two more columns with START and STOP in UTC.

Note: the keyword HDUCLAS2 for GTI header (if kept, see n. 2 HDUCLASS keywords) I think should be 'STANDARD'.

(Hirayama) About the TELAPSE keyword, I remember somebody (most likely a member of the HEASARC FITS Working Group) told us the definition is "time between START of the first GTI and STOP of the last," which is the current FT1 definition. However, I couldn't find such a statement on the web. Instead, HFWG Recommendation R11 states "TELAPSE is the time interval (in seconds) obtained as difference between the start and stop times of an observation." It sounds like they assume a pointed observation, as usual, and that assumption is not quite appropriate for GLAST. My guess is that, when they explained the definition to us, they gave us a traditional explanation that works for pointed observations, without much consideration of the continuously scanned observation that GLAST will perform, although we should confirm it with the HEASARC people before we conclude so.

If my understanding is correct, I would agree that TELAPSE = TSTOP - TSTART and that it should appear in an EVENTS extension, too. In the current definition, it shows up only in a GTI extension because it is defined as a derived quantity from the contents of a GTI extension.
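The two candidate definitions of TELAPSE discussed in this item, and the sum of the GTIs that neither is meant to be, can give different numbers for the same file. A minimal Python sketch, with GTIs modelled as (START, STOP) tuples, made-up values, and hypothetical function names:

```python
def telapse_from_header(tstart, tstop):
    """TELAPSE as TSTOP - TSTART (the reading of HFWG R11 quoted above)."""
    return tstop - tstart


def telapse_from_gti(gtis):
    """TELAPSE as the time between START of the first GTI and
    STOP of the last (the current FT1 definition quoted above)."""
    first_start = min(start for start, _ in gtis)
    last_stop = max(stop for _, stop in gtis)
    return last_stop - first_start


def summed_gti(gtis):
    """Sum of the GTI lengths; assumes the intervals do not overlap."""
    return sum(stop - start for start, stop in gtis)


# Made-up example: file spans 0-500 s, with two good intervals inside.
example_gtis = [(100.0, 200.0), (300.0, 450.0)]
by_header = telapse_from_header(0.0, 500.0)   # 500.0
by_gti = telapse_from_gti(example_gtis)       # 350.0
good_time = summed_gti(example_gtis)          # 250.0
```

The two TELAPSE definitions agree only when the first GTI starts at TSTART and the last one ends at TSTOP.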

For GTI_NAME, if we need it at all, it should cooperate with the DSS keywords, I think. In order to compute and revise GTIs upon data subselection, not all DSS keywords will be taken into account. So, one solution could be (although I don't think it is pretty) to list the DSS keyword numbers that are used to compute GTIs in the GTI_NAME keyword value.

For OBTSTART and OBTSTOP, I don't think it is a good idea. Noting that GTIs will change upon data subselection, a data subselection tool must update the OBTSTART and OBTSTOP keyword values, too. The GLAST tools can be modified to do so, but external tools such as XSELECT will not. At the least, that necessitates special handling of a GLAST FT1 file, which is different from event files of other astrophysics missions.

I think we should adhere to a standard GTI format, unless a deviation from it will significantly improve users' benefits.

For START and STOP in UTC, I simply don't know how to do it. A moment in time in the UTC system cannot be expressed by a single number because of leap seconds, if I understand correctly. Also, for the same reason as for OBTSTART and OBTSTOP, I don't think inventing a new GTI extension format is a good idea, either.

(Foschini) The OBTSTART,OBTSTOP keywords are useful only in case of problems, rather than for a direct use. That is, if there are problems in time correlation or anything else, the only way to reconstruct on ground the events sequence is to have the on board time (that is the only direct measurement of time of an event) and restart again the conversion to user friendly time values. So it is just to have a backup option.

For UTC keywords, these are generally expressed as a string: e.g. "2005-11-22T11:43:00". In this case, it is just for the end user, to have something more friendly than JD or anything else. To make the conversion, a time correlation is used, that is, a function that linearly correlates the UTC to the onboard time.
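A linear time correlation of the kind described above might look like the following Python sketch. The reference pair (onboard time, UTC) and the tick rate are hypothetical inputs, and the function names are invented for the example. Python's `datetime` does not model leap seconds, so this is only valid over a span that contains none, which is the difficulty raised earlier in this item.

```python
from datetime import datetime, timedelta, timezone


def make_time_correlation(obt0, utc0, ticks_per_second):
    """Build a function mapping onboard time (in ticks) to a UTC string,
    assuming a linear relation anchored at the pair (obt0, utc0).

    Leap seconds are ignored (illustrative sketch only).
    """
    def obt_to_utc_string(obt):
        seconds = (obt - obt0) / ticks_per_second
        utc = utc0 + timedelta(seconds=seconds)
        return utc.strftime("%Y-%m-%dT%H:%M:%S")
    return obt_to_utc_string


# Hypothetical anchor: tick 0 corresponds to the example time in the text.
anchor = datetime(2005, 11, 22, 11, 43, 0, tzinfo=timezone.utc)
correlate = make_time_correlation(0, anchor, 20_000_000)
one_minute_later = correlate(60 * 20_000_000)
```

Keeping the onboard time alongside the string would let the correlation be redone later with improved coefficients.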
