Problem Statement:
In geolocation algorithms, correct estimation of a target node's location depends on correctly mapping delay to distance. Once we have RTT values from the landmarks to the target node, the next important step is to map each RTT value to a distance that correctly represents the radius of the circle drawn around the corresponding landmark. Since the overlapping area of the circles is used to estimate the location (latitude, longitude) of the target node, the accuracy of the estimate depends directly on the radii of the circles drawn.
Thus we need to find a suitable correlation between the delay measurements and the distance values. Various factors affect the RTT values, including propagation delays, router forwarding and queuing delays, the unavailability of a great-circle path, and the presence of satellite connections. These make it impossible, or at least difficult, to arrive at a single common factor that could be used in the delay-to-distance mapping.
We know that digital information travels in fiber at roughly 0.6 to 0.67 times the speed of light in vacuum, that is, about 180 to 200 km per ms one way. Since the RTT covers the path twice, 1 ms of RTT corresponds to roughly 100 km of distance at most. However, the delaying factors mentioned above add to the RTT without adding any distance, so using this 100 km/ms value for alpha results in a large overestimation of the distance.
For this reason, many geolocation techniques use a much smaller value of 40-60 km/ms for alpha.
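The arithmetic behind the 100 km/ms figure can be sketched as below. The function name and the two velocity factors are illustrative assumptions, not values taken from any measurement; the division by two reflects that the RTT covers the path in both directions.

```python
# Speed of light in vacuum, expressed in km per millisecond.
C_KM_PER_MS = 299_792.458 / 1000.0  # about 299.79 km/ms

def alpha_upper_bound(velocity_factor: float) -> float:
    """Km of one-way distance per ms of RTT for a signal moving at velocity_factor * c.

    This is an upper bound: it assumes a straight great-circle fiber path
    and no queuing or forwarding delay at all.
    """
    one_way_speed_km_per_ms = velocity_factor * C_KM_PER_MS
    return one_way_speed_km_per_ms / 2.0  # RTT covers the distance twice

# Assumed velocity factors for light in fiber (illustrative values).
for factor in (0.6, 2.0 / 3.0):
    print(f"velocity factor {factor:.2f}: alpha <= {alpha_upper_bound(factor):.1f} km/ms")
```

Any alpha used in practice should come out below this bound, which is consistent with the smaller values used by geolocation techniques.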
Our goal is to find values of alpha that can more accurately map RTT values to geographical distance.
Analysis Data Sources
Using PingER data
The PingER database includes RTT data measured from over 50 monitoring sites around the world to over 700 remote sites. For all of these sites, PingER has latitude and longitude values.
Using the TULIP reflector
We have a large number of landmarks across the globe with known locations, so we can use these landmark locations to study alpha and arrive at a reasonable characterization of it. The idea is that, out of the large number of available landmarks, we take one landmark (with a known location) as the target and ping it from all the other landmarks. This is repeated with each of the other landmarks taken in turn as the target and pinged from the rest. As a result, we get a table of exact distances and measured RTTs from each landmark to every other landmark.
We can use the TULIP reflector CGI script to make the above RTT measurements. The reflector uses the landmark information from the sites.xml file. This is enabled by a script written by Fida.
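The landmark-to-landmark table described above could be assembled roughly as follows. This is a minimal sketch: the landmark names, coordinates, and the `fake_rtt_ms` stand-in are hypothetical, and the real measurements would come from the reflector rather than from a stub. Distances use the haversine great-circle formula with a mean Earth radius of 6371 km.

```python
import math
from itertools import permutations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs are in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def build_pair_table(landmarks, measure_rtt_ms):
    """Return one row per ordered (source, target) pair of landmarks.

    landmarks: dict mapping name -> (latitude, longitude) in degrees.
    measure_rtt_ms: callable (source, target) -> RTT in ms, or None on failure.
    """
    rows = []
    for src, dst in permutations(landmarks, 2):
        rtt = measure_rtt_ms(src, dst)
        if rtt is None:
            continue  # skip pairs with no usable measurement
        dist = great_circle_km(*landmarks[src], *landmarks[dst])
        rows.append({"source": src, "target": dst, "distance_km": dist, "rtt_ms": rtt})
    return rows

# Hypothetical landmarks and a stub in place of real measurements.
landmarks = {"lm_a": (0.0, 0.0), "lm_b": (0.0, 90.0), "lm_c": (90.0, 0.0)}

def fake_rtt_ms(src, dst):
    if (src, dst) == ("lm_a", "lm_c"):
        return None  # pretend this ping failed
    return 250.0     # placeholder value, not a real measurement

table = build_pair_table(landmarks, fake_rtt_ms)
```

Ordered pairs are kept separately because the RTT measured from one landmark to another need not equal the RTT measured in the reverse direction.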
Simple Frequency Histograms of Alpha
Using the obtained data to reach a suitable alpha value:
Once we have a large data set of RTTs from landmarks with known locations to targets with known locations, we can apply statistical and mathematical operations to find a correlation between the RTT values and the known distances. In this step, to arrive at a better alpha value, we can analyze the RTT values separately region by region and see how similar the behavior is in different regions. This can help us find out whether the same alpha value works for all regions or whether it should be selected per region. It makes little sense, when locating a target, to consider a landmark that is very far from the target and is not expected to be among the final landmarks used; yet for analysis purposes, and to reach an accurate alpha value, we can consider all landmarks in the analysis.
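A region-by-region frequency histogram of alpha could be computed along these lines. The region labels, the 10 km/ms bin width, and the sample rows are assumptions made up for illustration; they are not PingER or TULIP data.

```python
from collections import Counter, defaultdict

def alpha_histograms(rows, bin_width_km_per_ms=10.0):
    """Count alpha = distance / RTT in fixed-width bins, separately per region.

    rows: iterable of (region, distance_km, rtt_ms) tuples.
    Returns a dict mapping region -> Counter {bin lower edge: count}.
    """
    hists = defaultdict(Counter)
    for region, distance_km, rtt_ms in rows:
        if rtt_ms <= 0:
            continue  # ignore invalid measurements
        alpha = distance_km / rtt_ms
        bin_start = int(alpha // bin_width_km_per_ms) * bin_width_km_per_ms
        hists[region][bin_start] += 1
    return dict(hists)

# Made-up sample rows: (region, distance in km, RTT in ms).
sample_rows = [
    ("region_1", 500.0, 10.0),   # alpha = 50
    ("region_1", 1100.0, 20.0),  # alpha = 55
    ("region_1", 300.0, 10.0),   # alpha = 30
    ("region_2", 4000.0, 50.0),  # alpha = 80
]
hists = alpha_histograms(sample_rows)
for region, counts in sorted(hists.items()):
    for bin_start in sorted(counts):
        print(f"{region}: [{bin_start:.0f}, {bin_start + 10:.0f}) km/ms -> {counts[bin_start]}")
```

If the histograms for different regions peak in clearly different bins, that would suggest selecting alpha per region rather than globally.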
As a simple example, we can take the ratio of actual distance to RTT for each landmark and then average the resulting values. This average may serve as a better alpha value: multiplying it by any RTT gives a distance that should be closer to the exact distance.
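The averaging described above can be written in a few lines. The function names and the sample pairs are hypothetical and serve only to show the calculation.

```python
from statistics import mean

def estimate_alpha(pairs):
    """Average of distance / RTT over all (distance_km, rtt_ms) pairs, in km/ms."""
    ratios = [distance_km / rtt_ms for distance_km, rtt_ms in pairs if rtt_ms > 0]
    if not ratios:
        raise ValueError("no valid (distance, RTT) pairs")
    return mean(ratios)

def rtt_to_distance_km(rtt_ms, alpha):
    """Map an RTT to an estimated distance (circle radius) using alpha."""
    return alpha * rtt_ms

# Made-up (distance in km, RTT in ms) pairs giving ratios 40, 50 and 60 km/ms.
sample_pairs = [(400.0, 10.0), (1000.0, 20.0), (1800.0, 30.0)]
alpha = estimate_alpha(sample_pairs)
radius_km = rtt_to_distance_km(12.0, alpha)
```

A plain mean is sensitive to outliers such as satellite links; a median or a trimmed mean might be worth comparing, though that is a design choice rather than something the data here establishes.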
If we do not want to perform needless delay-to-distance mapping for landmarks that are unlikely to be among the final few, we can neglect landmarks with very high RTTs (a high RTT is probably due to a long physical distance between the landmark and the target). But sometimes the target lies in a geographical location with no active landmark nearby. In that case some distant landmarks would probably be candidates for the final selection, so we cannot fix in advance an RTT value beyond which landmarks should be neglected. A better choice may be to perform delay-to-distance mapping for a fixed number of landmarks per target, chosen from the sorted RTTs, and to leave out all the others.
The above method means that, after identifying a tier1 landmark for any target, we should do delay-to-distance mapping only for the tier0 landmarks of that particular region.
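Selecting a fixed number of landmarks from the sorted RTTs, rather than applying an RTT cutoff, might look like this. The function name, the default of five landmarks, and the sample RTTs are assumptions for illustration.

```python
import heapq

def nearest_landmarks_by_rtt(rtts_ms, k=5):
    """Return the k landmarks with the lowest RTT as (name, rtt_ms), ascending.

    rtts_ms: dict mapping landmark name -> RTT in ms, or None if unreachable.
    No absolute RTT threshold is applied, so a target with no nearby
    landmark still gets k landmarks.
    """
    reachable = [(rtt, name) for name, rtt in rtts_ms.items() if rtt is not None]
    return [(name, rtt) for rtt, name in heapq.nsmallest(k, reachable)]

# Made-up RTTs from five hypothetical landmarks to one target.
sample_rtts = {"lm_a": 80.0, "lm_b": 12.5, "lm_c": None, "lm_d": 45.0, "lm_e": 230.0}
selected = nearest_landmarks_by_rtt(sample_rtts, k=3)
```

Only the selected landmarks would then go through delay-to-distance mapping for that target.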