Testing event queries

To get a better understanding of the cost of event data queries, a series of tests has been carried out.

Different databases, different ways of specifying the query, and various indexing methods have been tried.

September - October

The spatial extensions are no longer used, since their performance is disappointing, especially when indexes need to be rebuilt.

Selection of events located within a cone is now implemented more accurately.
For each event, Cartesian coordinates (x, y, z) are computed from the celestial coordinates (ra, dec) and stored in the event table.
A bounding box of the intersection of the sphere with the cone is computed. If a pole lies inside the cone, the whole polar cap is kept.
Events located inside this box are selected. For each event in this subset, the dot product between its direction and the axis of the cone is computed.
A query looks like:

select event_number, run_number, time, energy, quality, ra, decl from events
where (1198 <= ra and ra <= 2041 and 1980 <= decl and decl <= 2340) and
(x * 0.35355338 + y * -0.35355338 + z * 0.8660254 > 0.9961946980917455)
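The selection steps described above can be sketched as follows (a minimal illustration in Python; the actual tests used a Perl script and SQL). The direction constants in the query above correspond to a cone axis at ra = 315°, dec = 60°, and the threshold 0.9961946980917455 equals cos(5°); the bounding-box formula is an assumption about one reasonable way to compute it.

```python
import math

def radec_to_xyz(ra_deg, dec_deg):
    """Unit vector from celestial coordinates in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def cone_bounding_box(ra0, dec0, radius):
    """RA/Dec bounding box of a cone (all angles in degrees).
    If a pole falls inside the cone, the whole polar cap is kept
    (RA spans the full circle)."""
    dec_min, dec_max = dec0 - radius, dec0 + radius
    if dec_max >= 90.0:          # north pole inside the cone
        return 0.0, 360.0, dec_min, 90.0
    if dec_min <= -90.0:         # south pole inside the cone
        return 0.0, 360.0, -90.0, dec_max
    # the RA half-width of the cone grows toward the poles
    dra = math.degrees(math.asin(math.sin(math.radians(radius)) /
                                 math.cos(math.radians(dec0))))
    return ra0 - dra, ra0 + dra, dec_min, dec_max

def in_cone(event_xyz, axis_xyz, radius):
    """Precise test on the box subset: the dot product between the
    event direction and the cone axis must exceed cos(radius)."""
    dot = sum(e * a for e, a in zip(event_xyz, axis_xyz))
    return dot > math.cos(math.radians(radius))

# Axis matching the constants in the query above:
# (0.35355338, -0.35355338, 0.8660254) is ra = 315 deg, dec = 60 deg.
axis = radec_to_xyz(315.0, 60.0)
print(in_cone(radec_to_xyz(315.0, 62.0), axis, 5.0))  # 2 deg off axis: True
print(in_cone(radec_to_xyz(315.0, 66.0), axis, 5.0))  # 6 deg off axis: False
```

The box query is cheap because it can use ordinary column indexes on ra and decl; the dot-product test then removes the corner events of the box that fall outside the circle.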

A typical result with MySQL is shown here.

January - June

A first evaluation has been made with a fairly small number of events, using the geographical modules available in the database products.

Sample data are stored in a relational database; event data are kept in a table.
A Perl script is run to retrieve events located in a region defined by ra_max, dec_max, ra_min, dec_min (typically 32x32 degrees).
The time spent by the query, as well as the number of events found, is recorded.
The query is launched 100 times and statistics are output.
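The timing procedure can be sketched as follows (in Python with an in-memory SQLite table purely for illustration; the actual tests used a Perl script against Oracle, MySQL, and PostgreSQL, and the table and column names below follow the query shown elsewhere on this page):

```python
import sqlite3
import statistics
import time

def time_box_query(conn, ra_min, ra_max, dec_min, dec_max, runs=100):
    """Run the same RA/Dec box query `runs` times; return the number of
    events found, the mean query time (ms), and the worst time (ms)."""
    sql = ("select event_number, ra, decl from events "
           "where ? <= ra and ra <= ? and ? <= decl and decl <= ?")
    timings_ms = []
    n_events = 0
    cur = conn.cursor()
    for _ in range(runs):
        start = time.perf_counter()
        rows = cur.execute(sql, (ra_min, ra_max, dec_min, dec_max)).fetchall()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
        n_events = len(rows)
    return n_events, statistics.mean(timings_ms), max(timings_ms)

# Synthetic events spread over the sky, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_number int, ra real, decl real)")
conn.executemany("insert into events values (?, ?, ?)",
                 [(i, (i * 7) % 360, (i * 11) % 180 - 90) for i in range(1000)])
count, mean_ms, worst_ms = time_box_query(conn, 100, 132, -16, 16)
print(count, round(mean_ms, 3), round(worst_ms, 3))
```

Repeating the query smooths out caching effects; reporting the worst time alongside the mean makes cold-cache outliers visible.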

Radius   events   oracle (ms)   mysql (ms)   postgresql (ms)
  1         135        90             4            30
  2         360       110             9            40
  8        4300       280           100           260
 16       15000       900           380          2200

On the X axis is the dimension: a value of 16 defines a square of 32x32 degrees.
The Y axis indicates the time in milliseconds.
The crosshairs show the number of events found, divided by 10.
These results are for queries using the spatial functionality implemented by the RDBMS.
An R-tree index has been created to improve performance.

Details are available on this page.


5 Comments

  1. When you are testing remember that when you retrieve data out for the science tools you need to extract all 20+ FT1 columns not just the four columns in the sample data you used. What impact does that have on the performance?

    1. Our plan is to have the meta-data database contain only the four columns we currently have, plus run number and event index, so that after doing the meta-data search we can pull the events out of any of the tuples or the full root trees. So the meta-data table will expand slightly, but it is not currently our plan to have it contain all 20 FT1 columns.

      We still need to do tests on the performance of pulling individual events randomly out of the tuples to establish the viability of this approach.

    2. Unknown User (lefevre)

      The purpose of this test is primarily to analyze the cost of finding events by coordinates and to see if spatial indexing is efficient.
      Later, it would be interesting to see if the weight of a row matters.

  2. Do any of the RDBMS have a function to search a circular region instead of an RA/DEC box? What is the performance there?

How do they handle searches near the poles of the coordinate system? Remember that a box in RA/DEC near the pole is not the same size as one near the equator. Also, a typical region will be on the order of 30x30 degrees, not 2x2.

    1. Unknown User (lefevre)

      The cost of performing trigonometric computations is likely quite high.
      A possible approach would be to select first a subset of events located in a "rectangle" then to carry out precise selections on this subset.
      Another one would be to call some specialized geometric functions provided by the OpenGIS implementations.
These tests are in progress...
My script used to execute the tests takes the region size as a parameter; I have results for a 32x32 patch.