By: Les Cottrell SLAC and Qasim Lone NIIT/SLAC

Introduction

On January 31st, 2008, the NY Times, BBC, The Guardian, CNN and many others reported undersea cable cuts in the Mediterranean. A Frequently Asked Questions (FAQ) is also available. One cable was damaged near Alexandria, Egypt, and another in the waters off Marseille, France. The two cables were damaged within hours of each other on the morning of Wednesday, January 30th, 2008. Operators believe the damage was caused by ships' anchors during a heavy storm at sea. One of the cables, Sea Me We 4, is owned by 16 telecommunications companies along its route. The second cable, known as the Flag (for Fiber-optic Link Around the Globe) System, runs from Britain to Japan. The cables are separately managed and operated. The outages mainly affected the Middle East and Asia. Most disrupted communications were quickly rerouted through the remaining SEAMEWE3 cable or through fibres taking the other way around the globe. The cables involved are shown in the Telegeography map below. There are also world maps from Telegeography and Alcatel.

Investigation

We decided to look at the impact on Internet connectivity as seen by the PingER project measurements made from SLAC (near San Francisco in California). These are very simple ping echo measurements of Round Trip Time (RTT), loss, jitter etc. The variations for a given path are typically caused by congestion. The measurements provide a sample of the Internet connectivity to over 150 countries of the world, countries that between them contain over 95% of the world's population and 99% of the world's Internet connected population.

Also see Effects of Mediterranean Fibre Cuts December 2008.

We looked at the hourly ping losses seen from SLAC for January 30th 2008 (there are ~20 pings in an hour, so a loss of 1 ping is 5% loss), searching for large increases in loss which persisted to the end of the day. Requiring persistence avoids mistaking the regular diurnal change for an effect of the outage, but unfortunately misses cases where the effect was removed by the end of the day, e.g. by re-routing. The main effects seen are shown in Table 1. In Table 1, "Loss before" is the average loss before the outage and "Loss after" is the loss after the outage started. "Sites affected" is the number of sites monitored in the country that observed an effect, and "total" is the total number of sites monitored in the country. It is interesting that in many cases not all hosts were affected. This may be due to the use of different carriers. Losses of this size can make many applications unusable.

Country | Loss before | Loss after | Sites affected / total
Sudan | <4.5% | >15% | 3/3
Bahrain | 0% | >10% | 2/2
UAE | <4.5% | >20% | 1/1
Jordan | 0% | >15% | 4/4
Oman | 0% | >15% | 1/1
Qatar | 0% | >4.5% | 1/1
Saudi Arabia | 0% | >4.5% | 2/3
India | 0% | >50% | 2/8

Table 1: Hourly Ping Losses
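
The selection described above (hourly loss percentages, and a rise that persists to the end of the day) can be sketched in a few lines of Python. This is only an illustration, not the PingER analysis code: the function names, the 5 percentage-point `jump` threshold and the sample values are assumptions made for the example.

```python
from typing import Optional, Sequence


def loss_percent(sent: int, received: int) -> float:
    """Packet loss as a percentage of the pings sent in one hour."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def persistent_increase_start(hourly_loss: Sequence[float],
                              jump: float = 5.0) -> Optional[int]:
    """Return the first hour h such that every hour from h to the end of
    the day has a loss at least `jump` points above the average loss of
    the hours before h. Return None if no such hour exists.

    The threshold `jump` is an illustrative assumption.
    """
    for h in range(1, len(hourly_loss)):
        baseline = sum(hourly_loss[:h]) / h
        if all(x >= baseline + jump for x in hourly_loss[h:]):
            return h
    return None


# With ~20 pings per hour, losing a single ping is a 5% loss.
print(loss_percent(20, 19))

# Made-up day: no loss until 06:00, then 15% loss for the rest of the day.
print(persistent_increase_start([0.0] * 6 + [15.0] * 18))
```

A day on which the loss returns to its earlier level before midnight yields `None`, which is the case the text notes is missed by this approach.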

Since the effect may have been transitory while the data was re-routed, we also looked for increases in the daily losses on January 30th. The effect was seen in about 15 of the over 150 countries monitored by PingER. This is shown in Table 2 below.

Country | Loss before | Loss after | Sites affected / total
Egypt | <1% | >7.5% | 3/3
Sudan | <5% | >30% | 3/3
Hong Kong | <0.75% | >11% | 1/1
UAE | <4% | >18% | 1/1
Bahrain | <1.5% | >7% | 2/2
Jordan | <3% | >7% | 3/4
Oman | <8% | >13% | 1/1
Saudi Arabia | <1.2% | >7% | 2/3
Syria | <3% | >7% | 1/1
Indonesia | <2% | >8% | 1/7
Thailand | <0.2% | >8% | 1/6
Bangladesh | <5% | >7% | 2/2
India | <3% | >40% | 2/8
Sri Lanka | <3% | >6% | 2/5
Maldives | <1% | >12% | 1/3

Table 2: Daily Ping Losses

The differences between the daily and hourly tables may reflect the ability of the sites and countries to switch to alternate routes.

An example of the effect is seen below for a host in Bangalore, India. It can be seen that the Round Trip Time (RTT) went from about 360 ms to about 500 ms and the losses increased dramatically from close to 0 to 40-60%. Closer inspection shows that for this connection the event occurred between 6:47 and 7:16 am GMT on January 30th, 2008. Although the RTTs were restored to normal on January 31st, there were still significant losses for this site at the end of February 1st, and the losses remained significant through February 3rd, 2008.

The Figure below, for a host in the United Arab Emirates, shows the impact of the cut and the partial recovery 15-20 hours later, presumably as the traffic was re-routed, with full recovery after about 3 days.

Looking further into the PingER data we can estimate the approximate start time of the event. The exact time is harder to pin down, but it is generally between 5 AM and 7 AM GMT on January 30th 2008 for most of the countries in Table 1, the only exception being Bahrain, where it started at 9 AM. The differences might be due to the measurement sampling rate or to the lack of load to cause congestion until people came to work later.

PingER calculates the throughput to different nodes from the Round Trip Time (RTT) and loss using the Mathis formula. It is interesting to see the effect on throughput for the regions that suffered from the fibre outage. The table below gives insight into the throughput of the countries affected by this outage.

Country | Throughput before (kbits/s) | Throughput after (kbits/s) | Sites affected / total
UAE | 1200 | 21 | 1/1
Bahrain | 800 | 23 | 2/2
Jordan | 500 | 30 | 3/4
Oman | 125 | 18 | 1/1
Saudi Arabia | 800 | 30 | 2/3
Bangladesh | 400 | 35 | 2/2
India | 800 | 38 | 2/8

The above table shows the average throughput before and after the start of the fibre outage. The results show reductions of about an order of magnitude or more, ranging from roughly 7-fold for Oman to over 50-fold for the UAE.
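
To show how RTT and loss combine into a throughput estimate, here is a minimal sketch of the simplified Mathis relation, throughput ≈ MSS / (RTT × sqrt(loss)). The 1460-byte segment size and a multiplicative constant of 1 are assumptions for this sketch; the exact constants PingER uses are not stated in this report. The function name and the sample inputs are likewise illustrative only.

```python
import math


def mathis_throughput_kbps(rtt_ms: float, loss_fraction: float,
                           mss_bytes: int = 1460) -> float:
    """Estimate TCP throughput in kbits/s from RTT and packet loss.

    Simplified Mathis relation with the constant taken as 1.
    mss_bytes=1460 is an assumed segment size, not taken from the report.
    The formula is undefined for zero loss, so loss must be > 0.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss_fraction must be in (0, 1]")
    rtt_s = rtt_ms / 1000.0
    bits_per_s = (mss_bytes * 8) / (rtt_s * math.sqrt(loss_fraction))
    return bits_per_s / 1000.0


# Illustrative inputs only: a 360 ms path with 1% loss, and the same
# path at 500 ms with 50% loss.
before = mathis_throughput_kbps(360, 0.01)
after = mathis_throughput_kbps(500, 0.50)
print(f"before: {before:.0f} kbits/s, after: {after:.0f} kbits/s")
```

Because the estimate depends on 1/sqrt(loss), a jump from about 1% loss to tens of percent cuts the estimated throughput several-fold even before the increase in RTT is taken into account.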

Another view of the changes is seen in the map below showing the throughputs before (Jan-Sep 2007) and after (Jan 30 2008). Bear in mind that both the fact that the outage did not start until around 6:00 am and any re-routing of traffic before the end of the day will dilute the effect. Also the effects were not uniform on all hosts in a country.

It is interesting that some countries such as Pakistan were mainly unaffected, despite the impact on neighboring countries such as India. This contrasts dramatically with the situation in June - July 2005, when due to a fibre cut of SEAMEWE3 off Karachi, Pakistan lost all terrestrial Internet connectivity, which resulted, in many cases, in a complete 12-day outage of services. This is a tribute to the increased redundancy of international fibre connectivity installed for Pakistan in the last few years.

One can also see the effect by looking at the RTTs for affected hosts as a function of time. This is seen below where we plot the daily RTT seen from SLAC versus the date and the Top Level Domain (TLD) of the country the host is located in. It is seen that there is a big spike going from around 300 ms to over 1000 ms for January 30th, followed by recovery in several cases. The main countries affected are the United Arab Emirates (AE), Bahrain (BH), Bangladesh (BD), Bhutan (BT), Djibouti (DJ), India (IN), Indonesia (ID), Jordan (JO), Sri Lanka (LK), the Maldives (MV), Oman (OM), Qatar (QA), Palestine (PS), Saudi Arabia (SA), Sudan (SD), and Thailand (TH). This is less than 5% of the total hosts monitored worldwide.

To better illustrate the recovery for the 30 hosts in 16 countries, in the Figure below we show the median RTT and loss seen from SLAC to these hosts for the last week in January. The sharp increase in RTT and loss on January 30th is immediately obvious. The RTT partially recovers on January 31st to about 550 ms and continues to improve for the next few days. The error bars show the variability of the results (as the Inter Quartile Range), which is seen to increase dramatically after January 29th.
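
The median and Inter Quartile Range used for the points and error bars above can be computed as in the following sketch. It uses the Python standard library's default quartile method, which is an assumption and may differ slightly from whatever method produced the Figure. The function name and the RTT values are made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def median_and_iqr(values: Sequence[float]) -> Tuple[float, float]:
    """Return (median, inter-quartile range) for a set of per-host values.

    statistics.quantiles with n=4 returns the three quartile cut points
    (Q1, median, Q3). It needs at least two data points.
    """
    q1, med, q3 = statistics.quantiles(values, n=4)
    return med, q3 - q1


# Made-up per-host RTTs (ms) for a quiet day and for a disrupted day.
quiet_day = [290, 295, 300, 305, 310, 315, 320]
disrupted_day = [300, 400, 550, 700, 900, 1200, 2000]

print(median_and_iqr(quiet_day))
print(median_and_iqr(disrupted_day))
```

The median is used rather than the mean so that one or two hosts with very long response times do not dominate the summary, while the widening Inter Quartile Range shows how unevenly the hosts were affected.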

To illustrate that the effects are also seen from places other than SLAC in California, in the Figure below we show time series of the average RTT seen from the International Center for Theoretical Physics (ICTP) in Trieste, Italy, to a few of the hosts in some of the affected countries. It is not clear whether the Pakistani host's increase in RTT was an effect of the outage or just normal variation (see Jan-18 and 19 for this host), so it is not included in the median shown. It also appears that Sudan recovered after 3 days while India was still recovering after 6 days, and there is no evidence that the Bangladesh host was recovering at all. The close tracking of the curves for both Indian hosts suggests they were using similar providers and routing.

It also needs to be stressed that not all hosts in all countries were impacted, e.g. India: 2 of 8; Sri Lanka: 3 of 5; Maldives: 3 of 5; Indonesia: 1 of 7; Turkey: 1 of 3; Thailand: 1 of 6. On the other hand all monitored hosts were impacted in: UAE (1), Bangladesh (2), Bahrain (2), Djibouti (1), Jordan (4), Oman (1), Qatar (1), Saudi Arabia (2). The numbers in parentheses are the number of hosts monitored. The list of countries in this region that have PingER monitored hosts, but where none of the monitored hosts were noticeably impacted, is also interesting. It includes: Egypt, Ethiopia, Iran, Israel, Lebanon, Pakistan, Palestine, Sudan, Central Asia, the Caucasus and East Asia. Some of these were lucky in that they did not make major use of either of the two cables (e.g. Palestine is mainly served by SEAMEWE3), others had totally different routes from SLAC (e.g. via the Pacific for East Asia), others such as Israel had alternate paths, and others used satellite (e.g. Central Asia). We also did not observe complete loss of connectivity, associated with the outage, for any host.

Looking in more detail at the impact hour by hour on January 30th we get the map in the Figure below. It shows the hourly average RTTs (z axis) by time of day (y axis) on January 30th from SLAC to 30 impacted hosts (x axis) in 17 countries identified by their Top Level Domain (TLD). In this graph the vertical axis (average RTT) is truncated at 2 seconds, though some hosts took up to almost 10 seconds to respond at times. It can be seen that the impact (a sudden increase in RTT) is very abrupt. The time of the impact varies by 2-3 hours (between 4 am and 7 am). Most hosts continued to respond apart from 3 in Sudan and 1 in Bahrain, each of which did not respond for up to an hour. The magnitude of the impact also varies by more than an order of magnitude from country to country. By looking at the data for other days, in particular Jan 29th (see the second map below), we verified that the sudden increase was not caused by the normal diurnal variations of people coming to and leaving work etc.

We also looked at the losses as a function of time, see the Figure below. The numbers in each cell are the rough losses (rounded off to [...]). It can be seen that not all sites in a country were equally affected. No measurement data was [...]. For Bahrain one of two sites was affected, for India 2 out of 4 and for Jordan 3 out of 4; for Sri Lanka and the Maldives none were affected. Sites in some countries such as Jordan and Bangladesh took more than 5 days to recover; for India 2 sites took 4 days to recover.