
Introduction

In response to a performance problem between SSRL and ANL reported by Brian Kobilka's group, Phil Reese of Stanford campus looked at the perfSONAR bwctl (iperf) plots shown below.

Stanford to ESnet at Sunnyvale

Denver/ESnet to Stanford

Chicago/ESnet to Stanford

ANL/ESnet to Stanford





As the distance from source to destination increases, the asymmetry between the two directions also increases. This could be due to packet loss on one of the paths (the one toward Stanford). Any loss can dramatically degrade performance. See for example:
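Why distance amplifies the effect of loss can be illustrated with the Mathis et al. model of steady-state TCP throughput, which bounds a single Reno-style stream at (MSS / RTT) × (C / √p). This model is our choice for illustration, not something used in the original report, and it is only an approximation for modern congestion control. The MSS, loss rate and RTT values below are hypothetical. The point is that, for the same loss rate, the achievable rate falls in proportion to 1/RTT, so a slightly lossy path can look fine to a nearby host and poor from across the country.

```python
import math

# Mathis et al. (1997) steady-state TCP Reno estimate:
#   throughput <= (MSS / RTT) * (C / sqrt(p)),  with C = sqrt(3/2) ~= 1.22
MATHIS_C = math.sqrt(1.5)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput, in bits per second."""
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("rtt_s must be > 0 and 0 < loss_rate < 1")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460   # bytes; typical for a 1500-byte MTU (assumption)
    loss = 1e-4  # hypothetical: 1 packet in 10,000 lost
    # Hypothetical RTTs standing in for increasingly distant test points.
    for label, rtt_ms in [("nearby", 2), ("regional", 25), ("cross-country", 50)]:
        mbps = mathis_throughput_bps(mss, rtt_ms / 1000, loss) / 1e6
        print(f"{label:>14}: RTT {rtt_ms:>3} ms -> at most ~{mbps:,.0f} Mbit/s")
```

With the loss rate held fixed, each step up in RTT lowers the bound proportionally, which matches the pattern of asymmetry growing with distance.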

The problem was reported to trouble@es.net and entered as trouble ticket ESNET-20130528-009.

Seen from ANL

From Phil Reese:

Poking at the other end, I found another perfSONAR host at ANL collecting throughput measurements back toward the west coast. See anlborder-ps.it.anl.gov and look at the throughput graphs available there.

From those, it seems there is symmetry at least as far as Kansas, but then it degrades as the route moves farther west.

On the Stanford side, we are primarily an I2 (Internet2) site, but it looks like once we get into the CENIC world there is a blending of routes between I2 and ESnet. I'm not sure whether that is a problem.

More information from ESnet (Michael Sinatra):

I have been looking at various perfsonar nodes in an effort to track down the issues that SSRL is experiencing with throughput to ANL.

You're correct to note that the routing between Stanford Campus and ANL is asymmetric. CENIC prefers to hand traffic bound for ANL off to ESnet at the 100G peering at Sunnyvale, while ANL prefers the path through MREN directly to Internet2. In other words, the ANL-->Stanford Campus path never touches ESnet. I can also see from the pS toolkit web interface on the ANL host that there are similar issues between anlborder-ps and CENIC pS machines. This suggests to me that there is a more general issue between anlborder-ps and the rest of the world (lack of queue depth on the immediate upstream switch or router is one possibility).
That same issue could be affecting the ANL-->SSRL throughput.
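To make the queue-depth hypothesis concrete: a common rule of thumb is that a single long-lived TCP flow needs roughly one bandwidth-delay product (link rate × RTT) of buffering at the bottleneck to keep the link fully utilized, so a shallow-buffered switch next to a test host tends to hurt long-RTT destinations most. The sketch below only computes the BDP; the 10 Gbit/s rate and 50 ms RTT are hypothetical, not measured values for this path.

```python
def bandwidth_delay_product_bytes(link_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to keep a link of link_bps busy over one RTT."""
    if link_bps <= 0 or rtt_s <= 0:
        raise ValueError("link_bps and rtt_s must be positive")
    return link_bps * rtt_s / 8


if __name__ == "__main__":
    # Hypothetical values: a 10 Gbit/s path with a ~50 ms cross-country RTT.
    bdp = bandwidth_delay_product_bytes(10e9, 0.050)
    print(f"BDP ~= {bdp / 2**20:.1f} MiB")
```

Comparing this figure with the per-port buffer of the switch or router upstream of anlborder-ps would be one way to check whether queue depth is plausibly the limit.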

It's a bit harder to see things from the SLAC end. Throughput tests between ESnet's pS boxes at the SLAC ESnet router (slac-pt1) and at the ANL ESnet router (anl-pt1) look really good. You can see the overall picture here:

https://my.es.net/network/performance/bwctl

Things degrade somewhat when we take one step inside the ANL border. I see significantly worse performance (but not horrible on an absolute scale) between slac-pt1.es.net and anlborder-ps.it.anl.gov than between slac-pt1.es.net and anl-pt1.es.net. I also see better performance in the direction toward anlborder-ps than in the opposite direction, but I see really good performance in BOTH directions between slac-pt1 and anl-pt1.
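The per-pair, per-direction comparison described above can be scripted once the numbers are collected. This is a minimal sketch assuming you already have one throughput figure per direction for each host pair (for example, read off the bwctl results on each toolkit); the `DirectionalResult` class, the `flag_asymmetric` function and the 2.0 ratio threshold are hypothetical choices for illustration, not part of perfSONAR.

```python
from dataclasses import dataclass


@dataclass
class DirectionalResult:
    """Throughput measured in each direction between two hosts (Mbit/s)."""
    host_a: str
    host_b: str
    a_to_b_mbps: float
    b_to_a_mbps: float

    def asymmetry(self) -> float:
        """Ratio of the faster direction to the slower one (1.0 = symmetric)."""
        fast = max(self.a_to_b_mbps, self.b_to_a_mbps)
        slow = min(self.a_to_b_mbps, self.b_to_a_mbps)
        if slow <= 0:
            # A direction with no measured throughput counts as maximally asymmetric.
            return float("inf")
        return fast / slow

    def slower_direction(self) -> str:
        """Human-readable label for the direction with lower throughput."""
        if self.a_to_b_mbps < self.b_to_a_mbps:
            return f"{self.host_a} -> {self.host_b}"
        return f"{self.host_b} -> {self.host_a}"


def flag_asymmetric(results, threshold=2.0):
    """Return (pair, ratio, slower direction) for pairs whose ratio exceeds threshold."""
    return [
        (f"{r.host_a} <-> {r.host_b}", r.asymmetry(), r.slower_direction())
        for r in results
        if r.asymmetry() > threshold
    ]
```

Feeding in the slac-pt1/anl-pt1 and slac-pt1/anlborder-ps results would show the first pair as symmetric and flag whichever direction of the second pair is slower.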

It's harder to see things on the SLAC side because the only perfsonar host is on the Stanford campus and the outbound routing is different between campus and SLAC. Also, I am noticing the same outbound issues between the Stanford pS host and CENIC pS hosts that we are seeing between Stanford and ESnet hosts. It looks like there may be an outbound issue with the Stanford pS host as well.

So I think we need to take two steps here: One is to try to figure out why there seems to be some outbound throughput issue at ANL (at least with their perfsonar box); the other is to get a perfsonar box (even a temporary toolkit box that we can test with) deployed within SLAC, as close to SSRL as possible. That will give us a chance to test different parts of the (almost) end-to-end path.
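Testing "different parts of the (almost) end-to-end path" amounts to a test matrix: for an ordered list of measurement points, test every pair in both directions. This is a sketch under stated assumptions: the host ordering is our reading of the path described here, and `ssrl-ps.slac.example` is a hypothetical placeholder for the proposed SLAC-internal box. Because routing is asymmetric, the reverse direction of a pair may not traverse the same intermediate points, so each direction should be read on its own.

```python
from itertools import combinations


def segment_test_plan(path):
    """Directed (sender, receiver) pairs covering every sub-path of `path`.

    `path` is an ordered list of measurement hosts from one end to the other.
    Testing each pair in both directions shows where along the path
    throughput first degrades and whether it degrades in one direction only.
    """
    plan = []
    for a, b in combinations(path, 2):
        plan.append((a, b))
        plan.append((b, a))
    return plan


if __name__ == "__main__":
    path = [
        "ssrl-ps.slac.example",  # hypothetical: proposed pS box inside SLAC
        "slac-pt1.es.net",
        "anl-pt1.es.net",
        "anlborder-ps.it.anl.gov",
    ]
    for sender, receiver in segment_test_plan(path):
        print(f"{sender} -> {receiver}")
```

Each printed pair would then get a bwctl test (or a scheduled toolkit test), and the first segment whose throughput drops points at where to look next.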
