Definition of the Contents of the LAT Source Catalog
Note: This page was written in March 2005 and extensively rearranged in October 2005.
The current draft definition of the contents of the source catalog is in the draft of the Science Data Products Interface Control Document, available in Word and (in a document prepared for the review of the plans for the science tools in 2002) in HTML.
The draft is in need of updating in several respects. Some of the issues were discussed during a splinter session of the LAT collaboration meeting in September 2004. This presentation also includes the high-level requirements for producing the catalog.
This Confluence page presents a summary of the issues and some suggested resolutions. Broadly speaking, the issues relate either to what should be included in the catalog or to how best to represent the included items in FITS. The issues and proposed resolutions, or any other aspect of the definition of the contents, are open for comment (probably most effectively by editing this page rather than inserting Confluence comments).
New issues (July 2007) are here
Text in red shows what was adopted for the revised draft presented further below.
(Ballet) I do not believe it is appropriate to define TLMIN/MAX for a string column (or is it?). In any case, we will most likely number the versions of our catalog as everybody does, so the prefixes will be 1GL, 2GL, ... Are those registered with the IAU already?
(Digel) I agree. The LAT catalog designation is not registered yet. I bet that the SSAC or SWG would like to have the responsibility of picking the letters to use, although there are not very many obvious choices. GL looks good to me. I imagine that GBM sources mostly will be given GRB ######-type names.
The 1GL J123456-012345 naming scheme has been assumed.
2. Conf_68_Region and Conf_95_Region
(Ballet) I think we had better split those fields into individual scalar columns. This is much easier to use in searches with standard tools.
The fields have been split.
(Strong) How does the ellipse allow for the cos(dec) dependence of the RA in terms of great-circle angles? (Consider what happens near the celestial poles.) Is the ellipse in (RA, dec) or in real angles?
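To make the cos(dec) point concrete, here is a minimal sketch in Python comparing a fixed offset in RA with the real (great-circle) angle it corresponds to at different declinations. The function name and the sample positions are illustrative assumptions, not part of the catalog definition.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, numerically stable for small separations
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

# The same 1-degree offset in RA is a very different real angle near the pole
for dec in (0.0, 60.0, 89.0):
    sep = angular_separation_deg(10.0, dec, 11.0, dec)
    print(f"dec = {dec:4.1f} deg: 1 deg in RA = {sep:.4f} deg on the sky")
```

An ellipse whose axes are quoted as real angles on the sky avoids this ambiguity; one quoted in (RA, dec) coordinate differences does not.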
(Ballet) We should add an uncertainty (1 sigma) to it.
(Digel) I agree.
4. Energy bands
(Ballet) I suggest we add the source count rates or fluxes (plus uncertainty) in several broad energy bands. This can be very useful when looking for unusual spectra (very different from a power law). I imagine the way to get that would be to run likelihood after event selection on energy in each energy band, then converting the prefactor value to a flux in that band for the spectral index of the source. Running likelihood directly on the image accumulated in that band would be another option, but this unfortunately requires an assumption on the spectral index as well (to build the PSF).
A simple choice of bands would be logarithmically spaced, with boundaries at log10(E/MeV) = 1.5, 2, 2.5, 3, 3.5 and maximum.
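For reference, a short Python sketch that converts these logarithmic boundaries to energies in MeV. It assumes base-10 logarithms of the energy in MeV and treats "maximum" as an open upper bound; the variable names are illustrative.

```python
# Boundaries quoted in the text, taken as log10(E / MeV)
log_edges = [1.5, 2.0, 2.5, 3.0, 3.5]
edges_mev = [10.0 ** x for x in log_edges]

# The last band is open-ended ("maximum"), represented here by infinity
bands = list(zip(edges_mev, edges_mev[1:] + [float("inf")]))

for e_lo, e_hi in bands:
    print(f"{e_lo:8.1f} - {e_hi:8.1f} MeV")
```

The boundaries come out at roughly 31.6, 100, 316, 1000 and 3162 MeV.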
(Digel) This sounds sensible to me, although I'd prefer to use integral fluxes.
I stuck with integral fluxes, for >100 MeV, >300 MeV, >1 GeV, and >3 GeV energy ranges, with uncertainties included for each flux. This will still be useful for finding unusual spectra.
(Ballet) The advantage I see in band-limited fluxes is that they are approximately independent.
In addition, your scheme introduces a confusion in the user's mind, because the main flux > 100 MeV that we provide (FLUX100) is NOT directly comparable to those integral fluxes. It is NOT obtained by selecting photons above 100 MeV, but is simply a representation of the source flux obtained by fitting the entire spectral range, adjusting the spectral index.
If we adopt your scheme, then we must go all the way, add a FLUX30 column which will be used for the full range, obtain FLUX100 from events > 100 MeV only, and adjust the spectral index (when it is possible) in all 'bands'. Then all the 'bands' will have the same meaning.
(Digel) Well, I actually did have in mind that the main flux >100 MeV could be compared with the fluxes for the other integral energy ranges, but I can see that the only use of presenting fluxes for different energy ranges is if they are fit separately. Otherwise a power-law fit contains all of the information.
The disadvantage of band-limited fluxes is that for many (or maybe most) sources they will also be quite statistics limited. I have not tried simulating spectra and making fits for these narrow energy ranges to see how limited we will be, but I guess that somebody should.
So, how about 30-100, 100-300, 300-1000, 1000-3000, and >3000 MeV? (I'm revealing some EGRET heritage in preferring these ranges to ranges of 0.5 in log10(E).) What about going down to 20 MeV?
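The remark that a power-law fit already contains all of the information can be illustrated with a short Python sketch: given only the integral flux above 100 MeV and the photon spectral index, the flux in any band follows analytically. This assumes a pure power law dN/dE proportional to E^(-index) with index > 1; the function names and the numerical values are illustrative, not catalog values.

```python
def integral_flux(flux100, index, e_min_mev):
    """Photon flux above e_min_mev for a power law dN/dE ~ E**(-index), index > 1."""
    return flux100 * (e_min_mev / 100.0) ** (1.0 - index)

def band_flux(flux100, index, e_lo_mev, e_hi_mev=None):
    """Photon flux between e_lo_mev and e_hi_mev (None means no upper bound)."""
    above_hi = 0.0 if e_hi_mev is None else integral_flux(flux100, index, e_hi_mev)
    return integral_flux(flux100, index, e_lo_mev) - above_hi

# Band boundaries proposed in the text, in MeV
BANDS_MEV = [(30, 100), (100, 300), (300, 1000), (1000, 3000), (3000, None)]

flux100, index = 1.0e-7, 2.0  # illustrative values, not from the catalog
for e_lo, e_hi in BANDS_MEV:
    label = f"{e_lo}-{e_hi} MeV" if e_hi is not None else f">{e_lo} MeV"
    print(f"{label:>14s}: {band_flux(flux100, index, e_lo, e_hi):.3e}")
```

Band fluxes derived this way add nothing beyond the two fitted parameters, which is why the bands are only informative if each is fit separately.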
(Ballet) I have nothing against splitting at 300 MeV rather than 316.2 MeV.
I don't think it is useful to extend down to 20 MeV. The events measured below 30 MeV are for a large part coming from above 30 MeV (the effective area increases steeply with energy there) and they have very bad spatial resolution.
On the other hand, it may be useful to add another band above 3 GeV (split 3 to 10 GeV and > 10 GeV). This will add another piece of information for bright hard sources and will not really change anything for faint sources (most photons above 3 GeV will be in the 3 to 10 GeV band).
It is true that for many sources the only useful bands will be 300 MeV to 1 GeV and 1 to 3 GeV, but I don't see that as a problem.
(Atwood, 31 Dec 2005) On the selection energy bands - one might consider that the background rejection analysis is done in quasi-logarithmic energy bands as well. However the definition of the bands is a little different with break points at < 100, 100-350, 350-1000, 1000-3500, 3500-10000, > 10000.
(Digel, 4 Jan 2006) No one else has weighed in on Bill's comment. I was going to say that the 30-100, 100-300, 300-1000, 1000-3000, and >3000 MeV ranges correspond to what was used for EGRET, but they don't really. (Above 1 GeV, the standard energy ranges used for EGRET analysis were 1-2, 2-4, and 4-10 GeV.) So I can't say that tips the balance one way or the other. I would hope that especially for point sources (as opposed to diffuse sources) we'll be essentially immune to any trouble (systematic errors, I guess) from residual background. But in case the character of the residual changes a lot between the bands used for the background rejection analysis, we may be better off using them to start with. Does anyone else have a strong opinion about this?
(Ballet) Has anybody studied what the best energy interval (in particular the lower boundary) is for looking for variability? Is 100 MeV indeed the best trade-off between the number of source photons and the number of background photons plus contamination by other sources?
(Digel) This has not been studied, to my knowledge. Certainly 100 MeV is not the best in terms of being able to distinguish sources. Quoting fluxes for the range >100 MeV, though, is common.
The variability index can apply to whatever energy interval we choose, presumably specified along with the definition of the index.
(Ballet) If I understood correctly the ideas developed in the September 2002 review of how to obtain a light curve, this will be done by running likelihood in each time bin (fixing the diffuse emission and the spectral indices). That document implied that this would be done in smaller time bins for brighter sources.
It would be much more homogeneous to do that in the same time intervals for all sources. It would ensure that the likelihood results for nearby sources are obtained consistently, and the results would also be easier to use for systematic studies.
I would like to propose that we do that in large time intervals (like one month) for all sources. So the time interval (not the number of bins) would be fixed. In addition, we could add specific files (one per source) for bright sources where much more detailed information can be obtained (including spectral variability for example, or going beyond 100 bins).
(Digel) I like the idea of homogeneous time intervals in the catalog, but I am not sure what is best for sources that are variable on other time ranges. We'll certainly have ancillary information for the catalog (like images of the confidence contours, I think) that probably won't go in the FITS file for the catalog. I have to be careful about going too far down this path because then it starts to sound like an additional data product that needs defining.
The updated version has fixed one-month time intervals and includes integral fluxes and flux uncertainties (>100 MeV). As written, a fixed size array (12 elements) is specified. We may want to convert these back to variable length arrays so that the specification does not need to change when the time range gets longer.
(Ballet) 1 Byte (8 binary flags) is not much. Let's use I (2 Bytes) instead. This is a negligible size increase anyway.
2-byte integers are now used.
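As an illustration of how binary flags fit in a 2-byte integer, here is a minimal Python sketch. The flag names and bit assignments are hypothetical (this page does not define them); the only facts assumed are that a FITS binary-table 'I' column holds a 16-bit signed integer stored big-endian.

```python
import struct

# Hypothetical flag definitions; actual bit assignments are not fixed by this page
FLAG_BITS = {"EXTENDED": 0, "CONFUSED": 1, "NEAR_BRIGHT_SOURCE": 2, "POOR_FIT": 3}

def set_flags(*names):
    """Combine the named flags into one integer."""
    value = 0
    for name in names:
        value |= 1 << FLAG_BITS[name]
    return value

def has_flag(value, name):
    """True if the named flag bit is set in value."""
    return bool(value & (1 << FLAG_BITS[name]))

flags = set_flags("EXTENDED", "POOR_FIT")
packed = struct.pack(">h", flags)  # big-endian 16-bit signed, as in a FITS 'I' column

print(flags, packed.hex(), has_flag(flags, "EXTENDED"), has_flag(flags, "CONFUSED"))
```

Keeping to bits 0 through 14 leaves the stored value non-negative, which makes the column easier to inspect with generic tools.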
8. Extended Sources
(Digel) Should we expect the catalog to include extended sources in addition to point sources? If so, we should define an 'extendedness' parameter and possibly also list an angular extent. This can be kind of a slippery slope, getting into semimajor and semiminor axes and position angles. I'd at least include a flag indicating whether a source is resolved.
The catalog does not include an extendedness parameter. If we decide we need one, we can use some of the flag bits defined above.
9. Source Identification
(Digel) For some sources, e.g., bright pulsars, identifications will be possible with high confidence. For most of the sources, though, the best that will be possible is a list of candidate sources. Is it realistic to assume that we will be able to assign confidence levels for associations with counterpart sources? How about a Sowards-Emmerd-type "figure of merit"? If not, should we at least include angular offsets of the prospective counterparts from the maximum likelihood position of the source?
The updated definition includes space for just one counterpart source and for a flag value that defines the degree of confidence (1 for Figure-of-Merit above some threshold, 2 for correlated variability).
10. HDUCLASS Keywords (26 October 2005)
(Digel) I added placeholders for these. We need to figure out whether we are conforming enough to the HEASARC 'SRCLIST' definition to be able to use it.
11. Peak vs. Average (26 October 2005)
(Digel) It has gone unspecified until now, but the fluxes and flux history values described above are assumed to apply for the entire time range of the catalog (or of the time interval for a flux history evaluation). The catalog also includes Flux_Peak, Unc_peak_Flux, Signif_Peak, Time_Peak, and Peak_Interval to allow for specification of properties of flaring sources that we see only once for a short period of time. Defining significances for a specific time interval may be problematic, as the time interval has been selected to be when the detection significance is greatest.
Draft header (Last update: 30 December 2005)
Here is the first extension header of the FITS version of the catalog, with the changes specified above and in the comments section below incorporated.
Comments on the draft of 26 October 2005
Keywords to add:
EXTREL: release number of the template for the FITS header, to allow for future developments and changes in the header.
CREATOR: the name and version of the executable that generated the FITS file.
CONFIGUR: name and version of the software system under which the executable runs (e.g. SAE v X.x).
DATE: date of the creation of the FITS file.
TIMEREF: time reference frame (LOCAL, SOLAR SYSTEM, etc...).
TIMEUNIT: I suggest changing to days (JD), so that MJDREF can be used as TZERO and huge numbers are avoided; TSTART and TSTOP should be updated accordingly.
VERSION: version of the catalog.
RADECSYS: FK5 default; stellar reference frame.
EQUINOX: 2000.0 default; coordinate system equinox.
I would also add a new column "NOTES" (character string) in which to place comments, for example other names of the sources (e.g. the corresponding name in the 3rd EGRET catalog).
(Seth Digel, 30 December 2005)
I have updated the draft header above to take into account Luigi's comments and to finally include the fluxes for several energy bands. I also reformatted it to make it more like an actual template FITS header.
The updated draft includes DATE, TIMEREF, RADECSYS, and EQUINOX.
I omitted EXTREL because I believe the same information would be conveyed by HDUVERS.
CREATOR and VERSION are assumed to be in the main header for the file, which is not shown. To the extent possible we will have a common format for the primary headers of all our FITS data products; I intend to post a template for comment. CONFIGUR is assumed to be in the main header as well, with the name SOFTWARE.
TIMEUNIT is changed to days, TIMESYS to 'MJD' and MJDREF is omitted. These changes, I think, permit the dates in the flux histories to be represented as MJD values. I think that we want these times in MJD rather than seconds of MET (as I had originally proposed) or as days with respect to January 1, 2001 (as Luigi proposed). Also I think that the description is correctly expressed in column 26 so that the duration of the interval used for the peak flux evaluation is in days. A detailed description of representing time in Chandra FITS files is available here (see section 2). I need to study it some more.
NOTES is omitted; the proposed use is good - especially for providing other names for identified sources - but how to make a NOTES column conveniently searchable or even figuring out how large a field to reserve is not clear. We'll have to revisit this.
Also, regarding flux histories, I am assuming that in columns 27 & 28, for intervals during which a source was not detected we'll have its flux entry as 0 and its flux uncertainty should be interpreted as a (2 sigma?) upper limit.
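A reader of the flux history would then have to treat the two cases differently. The following Python sketch shows one way to interpret the entries under the convention just described (flux of 0 means "not detected", with the uncertainty column holding the upper limit). The function name and the sample values are illustrative assumptions, and the confidence level of the limit is still an open question here.

```python
def describe_interval(flux, unc):
    """Interpret one flux-history entry under the 'flux = 0 means upper limit' convention."""
    if flux == 0.0:
        # Not detected: the uncertainty column carries the upper limit
        return ("upper_limit", unc)
    return ("detection", flux, unc)

# Illustrative monthly entries (flux, uncertainty), not real data
history = [(2.1e-7, 0.4e-7), (0.0, 1.5e-7), (3.0e-7, 0.5e-7)]

for month, (flux, unc) in enumerate(history, start=1):
    print(month, describe_interval(flux, unc))
```

One drawback of this convention is that a reader unaware of it would read the entry as a zero flux with a large error bar, so it would need to be stated clearly in the column descriptions.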
Regarding fluxes for multiple energy bands (see item 4 above), I have written the header so that each band has two columns, one for flux and one for its uncertainty. It could be written more compactly using arrays, so that only two columns total would be required to store the fluxes and flux uncertainties for all of the bands. But as Jean pointed out, having separate columns may be a convenience for searching. That said, the column names that I have used, e.g., FLUX30_100 and UNC_FLUX30_100 for the 30-100 MeV range, are fairly ugly and the UNC_FLUX* column names, although valid, are not unique within the first 8 characters as the HEASARC at least strongly prefers. Do you have any better ideas for naming the columns?
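The 8-character uniqueness concern is easy to check mechanically. Here is a small Python sketch that groups column names by their first 8 characters and reports collisions. FLUX30_100 and UNC_FLUX30_100 are the names quoted above; the 100-300 MeV names are hypothetical extrapolations of the same pattern.

```python
from collections import defaultdict

def prefix_collisions(names, length=8):
    """Return {prefix: [names]} for prefixes shared by more than one column name."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:length].upper()].append(name)
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}

columns = [
    "FLUX30_100", "UNC_FLUX30_100",    # names quoted in the text
    "FLUX100_300", "UNC_FLUX100_300",  # hypothetical, following the same pattern
]

for prefix, group in prefix_collisions(columns).items():
    print(f"{prefix}: {group}")
```

Any alternative naming scheme can be run through the same check before it is adopted.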
I've left the flux history and peak flux entries specified as corresponding to the range >100 MeV. I'd prefer to leave it that way for the purposes of the Catalog - the same energy range for every source and every time range.
(Luigi Foschini, 4 Jan 2006): I think that the keyword MJDREF is important, otherwise the time system has no reference. Indeed, the start date could equally be the launch date or any other useful date. I suggested Jan 1, 2001 just as an example, but there are no constraints on this.
(Seth Digel, 5 Jan 2006) My intent in specifying TIMEUNIT = 'days', TIMESYS = 'MJD', and omitting MJDREF was so that the time-related columns (like times of peak fluxes) would be specified in the table as absolute dates in MJD, so no reference time is needed. I guess that this might be equivalent to setting MJDREF = 0. Also, I see that according to HEASARC recommendations I should have written the unit for days as 'd'.
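The two options discussed here (times relative to a reference date versus absolute MJD values) differ only by an offset, as the following Python sketch shows. It assumes a reference epoch of January 1, 2001 (the example date mentioned above) and ignores leap seconds and the distinction between time scales; the function names are illustrative.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # calendar date of MJD 0
SECONDS_PER_DAY = 86400.0

def mjd_of_date(d):
    """Modified Julian Date at 0h of the given calendar date."""
    return (d - MJD_EPOCH).days

# Reference epoch assumed for illustration: January 1, 2001
MJDREF = mjd_of_date(date(2001, 1, 1))

def met_to_mjd(met_seconds, mjdref=MJDREF):
    """Convert elapsed seconds since the reference epoch to MJD (leap seconds ignored)."""
    return mjdref + met_seconds / SECONDS_PER_DAY

print(MJDREF)                               # reference epoch as an MJD
print(met_to_mjd(5 * 365 * SECONDS_PER_DAY))  # five 365-day years after the epoch
```

With absolute MJD values in the table, the same function with mjdref=0 and times already in days is the identity, which is the sense in which omitting MJDREF resembles setting it to 0.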
Additional issues (starting 31 Dec 2005)
12. Galactic coordinates
(Strong) Hesitating to add comments at this stage, but I see that there are no Galactic coordinates.
With INTEGRAL catalogs it's always a pain to plot the catalogs as a Galactic distribution with e.g. fv, since they also lack (l,b). Of course it can easily be done with e.g. an IDL program, but Galactic is so fundamental, why not include it?
The EGRET catalogs DO have it which is very convenient! (Maybe all this was already discussed before).
(Digel) I think in Galactic coordinates, too, and would like to second this nomination. We are already at 34 columns, so what is a couple more? Regarding the error ellipses, we should keep the position angles (Conf_68_PosAng and Conf_95_PosAng) as degrees East of North, i.e., with respect to celestial coordinates.
(Jurgen) I think also that galactic coordinates would be good.
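For users of a catalog that lacks (l,b), the conversion is short but easy to get wrong, which is part of the argument for including the columns. Here is a minimal Python sketch of the J2000 equatorial-to-Galactic transformation using the standard pole and node values; the function name is illustrative, and a production tool would normally rely on an established astronomy library instead.

```python
import math

# J2000 orientation of the Galactic frame (standard values)
RA_NGP = math.radians(192.85948)   # RA of the north Galactic pole
DEC_NGP = math.radians(27.12825)   # Dec of the north Galactic pole
L_NCP = math.radians(122.93192)    # Galactic longitude of the north celestial pole

def equatorial_to_galactic(ra_deg, dec_deg):
    """Convert J2000 (RA, Dec) in degrees to Galactic (l, b) in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    sin_b = (math.sin(dec) * math.sin(DEC_NGP)
             + math.cos(dec) * math.cos(DEC_NGP) * math.cos(ra - RA_NGP))
    b = math.asin(max(-1.0, min(1.0, sin_b)))
    y = math.cos(dec) * math.sin(ra - RA_NGP)
    x = (math.sin(dec) * math.cos(DEC_NGP)
         - math.cos(dec) * math.sin(DEC_NGP) * math.cos(ra - RA_NGP))
    lon = (L_NCP - math.atan2(y, x)) % (2.0 * math.pi)
    return math.degrees(lon), math.degrees(b)

# Direction of the Galactic centre in J2000 coordinates
print(equatorial_to_galactic(266.40499, -28.93617))
```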
(Jurgen) One thing that is missing at this stage are UCDs (Uniform Content Descriptors). They are normally used to uniformly identify the content of catalogue columns (see for example the CDS Vizier website). I think it would be a good idea to add such UCDs.
14. Column names
(Jurgen) It could be reasonable to change the column names to more generic names (i.e. names that are also used in many other catalogues). For example, instead of RA and DEC we may use RAJ2000 and DECJ2000 (for galactic coordinates it would be GLON and GLAT). Uncertainties are often named "e_XXXX", so instead of "Unc_Flux100" we may use "e_Flux100".
Maybe we should submit our column naming proposal to CDS or HEASARC to get their opinion about our choices. This may help to create more conventional naming conventions ...
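The renaming proposed here is mechanical, as this short Python sketch suggests. The mapping covers only the examples given above (RA, DEC and the "Unc_" prefix); the function name is illustrative, and any other column would need its own decision.

```python
# Explicit renames taken from the examples in the text
EXPLICIT = {"RA": "RAJ2000", "DEC": "DECJ2000"}

def conventional_name(name):
    """Map a draft column name to the more conventional form suggested above."""
    if name in EXPLICIT:
        return EXPLICIT[name]
    if name.startswith("Unc_"):
        # Uncertainty columns take the 'e_' prefix instead of 'Unc_'
        return "e_" + name[len("Unc_"):]
    return name

for old in ("RA", "DEC", "Flux100", "Unc_Flux100"):
    print(f"{old:12s} -> {conventional_name(old)}")
```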
Additional issues (starting 25 July 2007)
15. Source names (25 July 2007)
Is the 1GL J123456-012345 naming scheme what we want to use? (1GL indicating first version of GLAST LAT catalog)
- Is fewer digits of precision acceptable? Probably, I think. So, a name like 1GL J123456-0123 should be enough precision
- Isabelle has suggested that we might be able to use decimal degrees in the names. These would be RA, Dec, but I'm not sure of the format, maybe something like 1GL J305.3-10.6, or perhaps just decimal minutes of time, e.g., 1GL J1234.5-0123. This would allow at worst 0.025 deg precision in the naming of a source. We don't expect to have sources closer together than that.
Answers to Seth by Isabelle:
- the name precision is guided by the source localization precision rather than by source separation to allow comparison with other catalogues.
- given the IAU guidelines and the brightest LAT sources localization down to a fraction of an arcminute, the only possible naming scheme in R.A and Dec is 1GL JHHMMSS.s-DDMMSS. IAU does not want decimals before the arcsecond and time second levels, nor decimal degrees in R.A. and Dec (no 1GL JDDD.ddd-DDD.ddd) even though it is much easier for your brain to grasp and memorize.
- Practice from INTEGRAL and HESS sources shows that 1GL JHHMMSS.s-DDMMSS names are extremely difficult to memorize and use efficiently. One gets lost after only a few sources, especially if they are close together. For instance, try to memorize NGC4151 = J121032.6+392420 for a few days!
- One can use decimal degrees in Galactic coordinates, which would lead to 1GL GLLL.lll-BB.bbb and your brain will round the number up to help your memory. The COS-B catalogue was in Galactic coordinates. Yet, quasars (that will dominate the LAT catalogue) are usually in R.A. and Dec. So, what do we choose ?
- We can use the 1GL, 1GLA, or 1LAT acronyms, as long as the acronym has at least 3 letters.
Comments from Jürgen:
- do we really need so many digits? Compared with HESS (which uses HESS JHHMM+DD.d) we could also use 1GL JHHMM+DD.D. Although GLAST can localise more precisely than this, it would not be able to distinguish two sources at this precision. So the names should be unique and also easy to remember. Another advantage: since the names would not include too many digits there is a fair chance that the names will not change from one catalogue to another (this was somewhat annoying for the EGRET catalogues). So a source detected in the 1GL would (probably) keep its name throughout all succeeding catalogues.
- concerning galactic versus RA/Dec: personally I also think in galactic coordinates, but so many catalogues relevant for GLAST (radio, X-rays, gamma-rays) are in RA/Dec that I would prefer also using RA/Dec for GLAST (otherwise one would also have to convert from one system into the other to compare sources ...)
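To compare how the candidate formats look for the same position, here is a Python sketch that builds two of the name styles discussed above from RA and Dec in degrees. Coordinates are truncated rather than rounded, following the usual IAU practice for designations. The function names, the "1GL" default prefix, and the sample position are assumptions for illustration only.

```python
import math

def _truncate(value):
    """Truncate to an integer, with a small guard against floating-point noise."""
    return int(math.floor(value + 1e-9))

def name_hhmmss(ra_deg, dec_deg, prefix="1GL"):
    """Name of the form 'prefix JHHMMSS+DDMMSS'."""
    ra_sec = _truncate(ra_deg / 15.0 * 3600.0)   # RA in seconds of time
    dec_sec = _truncate(abs(dec_deg) * 3600.0)   # |Dec| in arcseconds
    sign = "-" if dec_deg < 0 else "+"
    hh, rem = divmod(ra_sec, 3600)
    mm, ss = divmod(rem, 60)
    dd, rem = divmod(dec_sec, 3600)
    dm, ds = divmod(rem, 60)
    return f"{prefix} J{hh:02d}{mm:02d}{ss:02d}{sign}{dd:02d}{dm:02d}{ds:02d}"

def name_hhmm_tenths(ra_deg, dec_deg, prefix="1GL"):
    """Name of the form 'prefix JHHMM.m+DDMM' (tenths of a minute of time in RA)."""
    ra_tenths = _truncate(ra_deg / 15.0 * 600.0)  # RA in tenths of a minute of time
    dec_min = _truncate(abs(dec_deg) * 60.0)      # |Dec| in arcminutes
    sign = "-" if dec_deg < 0 else "+"
    hh, rem = divmod(ra_tenths, 600)
    mm, tenth = divmod(rem, 10)
    dd, dm = divmod(dec_min, 60)
    return f"{prefix} J{hh:02d}{mm:02d}.{tenth}{sign}{dd:02d}{dm:02d}"

# Sample position: RA = 12h34m56s, Dec = -01d23m45s
ra = (12 + 34 / 60 + 56 / 3600) * 15.0
dec = -(1 + 23 / 60 + 45 / 3600)
print(name_hhmmss(ra, dec))
print(name_hhmm_tenths(ra, dec))
```

Because of the truncation, a shorter name is always a prefix-consistent coarsening of the longer one, which helps when a source position is refined between catalogue versions.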
16. Galactic coordinates (25 July 2007)
Isabelle asks if we should include Galactic coordinates in the catalog. Sounds reasonable, right?
To compare source positions and lock different frames in ds9 to compare counterpart positions, we need Galactic coordinates as well as R.A. and Dec !
Comments from Jürgen:
- I agree completely, we should also have galactic coordinates!
Comments from Jean: Yes, and galactic coordinates have always been included in the test catalogs that we have produced.
17. Technical issues in generating the catalog (25 July 2007)
These are from Isabelle:
- are we able to give ellipses for the 1st release?
- are we able to deliver a 30-100 MeV flux for all sources?
- we cannot fill the monthly flux history for sources that are only just significant over one year. So we need to define what TS threshold allows monthly lightcurves above 100 MeV to be measured. The vast majority of the sources, being too faint, won't have lightcurves.
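A rough feel for the last point can be had from a simple scaling argument, sketched here in Python. It assumes, as a crude approximation only, that the test statistic (TS) of a steady source grows linearly with exposure time; the threshold values used are hypothetical and are not proposed catalog thresholds.

```python
def expected_interval_ts(ts_total, n_intervals=12):
    """Expected TS per interval for a steady source, assuming TS scales linearly with time."""
    return ts_total / n_intervals

def total_ts_needed(ts_per_interval, n_intervals=12):
    """Total TS a steady source needs to reach ts_per_interval in each interval."""
    return ts_per_interval * n_intervals

# Hypothetical thresholds, for illustration only
ts_one_year = 25.0
print(f"TS per month for a source with TS = {ts_one_year} over a year: "
      f"{expected_interval_ts(ts_one_year):.1f}")
print(f"Total TS needed for TS = 9 in each month: {total_ts_needed(9.0):.0f}")
```

Under this approximation a steady source near the detection threshold contributes very little in any single month, although a variable source can still exceed the monthly threshold during a flare.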
Questions from Jürgen:
- Should the catalogue format always be the same over the succeeding catalogue versions? If we have no error ellipses in the first edition but do in later ones, should we let the catalogue format evolve?
- As we discussed during the past LAT meeting, we should make provision for more than a single counterpart association (to use the new term). For how many should we make provision? For each counterpart association we would have a counterpart name and a counterpart probability.
Comments from Jean:
- I see no reason why we should not deliver a 30-100 MeV flux for all sources (unless we think we do not master the IRF well enough), as well as a monthly flux history. A faint but very soft source can have a significant flux below 100 MeV, and faint sources can also be variable enough that they will show up in a few time bins. The real issue here is to decide how to handle upper limits.
18. Catalogue entry names (26 July 2007)
I (Jürgen) would like to reiterate point 14: we should have more standard column names. By standard I mean names similar to those found in other catalogues. The coordinates would be RAJ2000, DECJ2000, GLON and GLAT, uncertainties would be preceded by "e_..." etc.
19. Prefactor missing (20 November 2007)
I just noticed that the prefactor of the power-law fit is missing from the above catalogue format description. Jean's DC2 & SC2 catalogues had this prefactor but no error on it. I strongly recommend officially adding 'Prefactor' and 'e_Prefactor' to the catalogue (and renaming error quantities as suggested above, using the standard 'e_<quantity>' scheme implemented by CDS).