Overview
The simplest way to get at the data is not to use the archive (since the archived data needs cleansing) but to use the form at https://www.slac.stanford.edu/cgi-bin/pingtable.pl, which lets you select the metric, packet size, source (the Measurement Agent, MA), target group, time aggregation, etc. There is also a filter that can be used to remove bad data from the monthly, and hence yearly, data. Bad data typically comes from a host that at first appears to be in a certain region (e.g. China) but whose RTT as seen from SLAC is impossibly small.
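The bad-data filter rests on a physical argument: a packet cannot make a round trip faster than light in optical fibre allows, so an RTT below that bound means the host is not where it claims to be. The Python sketch below illustrates this kind of plausibility check; it is not the filter that pingtable.pl actually applies. The SLAC coordinates, the fibre speed of roughly 200 km per millisecond, and the function names are assumptions chosen for illustration.

```python
import math

# Approximate coordinates of SLAC (Menlo Park, CA); an assumption for illustration.
SLAC_LAT, SLAC_LON = 37.42, -122.20

EARTH_RADIUS_KM = 6371.0
# Assumed signal speed in optical fibre: about 2/3 of c, i.e. roughly 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(lat, lon, src_lat=SLAC_LAT, src_lon=SLAC_LON):
    """Lower bound on the RTT: the great-circle path there and back at fibre speed."""
    return 2 * great_circle_km(src_lat, src_lon, lat, lon) / FIBRE_KM_PER_MS


def rtt_is_implausible(rtt_ms, lat, lon, src_lat=SLAC_LAT, src_lon=SLAC_LON):
    """True if a measured RTT is below the physical minimum for the claimed host location."""
    return rtt_ms < min_rtt_ms(lat, lon, src_lat, src_lon)


if __name__ == "__main__":
    # Hypothetical host claimed to be in Beijing (about 39.9 N, 116.4 E).
    print(f"minimum RTT: {min_rtt_ms(39.9, 116.4):.1f} ms")
    print("5 ms implausible:", rtt_is_implausible(5.0, 39.9, 116.4))
    print("180 ms implausible:", rtt_is_implausible(180.0, 39.9, 116.4))
```

Under these assumptions a host really located in Beijing, about 9,500 km from SLAC, cannot show an RTT much below about 95 ms, so a 5 ms measurement would be flagged. Real paths are longer than the great circle, so the true floor is higher still; the bound is deliberately conservative.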
...
An example of the output is below:
allyearly,?,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
EDU.SLAC.STANFORD.N3-to-Afghanistan,.,.,.,.,.,.,.,.,.,.,12.504,4.481,9.257,5.284
EDU.SLAC.STANFORD.N3-to-Albania,.,.,.,.,.,.,.,.,.,.,.,4.949,7.133,4.590
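The output is comma-separated: a header row of years, then one row per source-to-target pair, with "." apparently marking a year that has no data. The Python sketch below assumes exactly that layout and parses it into a nested dictionary. The helper names parse_pingtable and split_pair are hypothetical, and since the meaning of the "?" column is not documented here, it is simply kept as a column named "?".

```python
import csv
import io

# The sample rows shown above, copied verbatim.
SAMPLE = """allyearly,?,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
EDU.SLAC.STANFORD.N3-to-Afghanistan,.,.,.,.,.,.,.,.,.,.,12.504,4.481,9.257,5.284
EDU.SLAC.STANFORD.N3-to-Albania,.,.,.,.,.,.,.,.,.,.,.,4.949,7.133,4.590
"""


def parse_pingtable(text):
    """Parse pingtable CSV text into {row_label: {column: float or None}}."""
    reader = csv.reader(io.StringIO(text.strip()))
    header = next(reader)
    columns = header[1:]  # drop the leading aggregation label (e.g. "allyearly")
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, cells = row[0], row[1:]
        if len(cells) != len(columns):
            raise ValueError(f"{label}: expected {len(columns)} values, got {len(cells)}")
        # Assumption: "." means "no data for this period".
        table[label] = {col: None if cell == "." else float(cell)
                        for col, cell in zip(columns, cells)}
    return table


def split_pair(label):
    """Split a row label like 'SOURCE-to-TARGET' into (source, target)."""
    source, _, target = label.partition("-to-")
    return source, target


if __name__ == "__main__":
    for label, values in parse_pingtable(SAMPLE).items():
        source, target = split_pair(label)
        years = {year: v for year, v in values.items() if v is not None}
        print(source, target, years)
```

Keeping missing years as None rather than 0 avoids confusing "no measurement" with a genuine zero value when averaging or plotting.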
Later, it is atomically copied to files in the permanent directory.
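The line above mentions data being atomically copied into the permanent directory. A common way to do this is to write to a temporary file in the destination directory and then rename it into place, so readers only ever see the old file or the complete new one. The Python sketch below shows that generic pattern; it is not SLAC's actual code, and atomic_write and atomic_copy are hypothetical helper names.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temporary file
        raise


def atomic_copy(src, dst):
    """Copy src to dst using atomic_write, so dst is never seen half-written."""
    with open(src, "rb") as f:
        atomic_write(dst, f.read())
```

Reading the whole file into memory keeps the sketch short; for very large data files a chunked copy into the temporary file would be preferable.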
How to retrieve data from the SLAC anonymous FTP archive:
Should you decide to go ahead, I am happy to help (and hold your hand) and to improve the documentation.
Data Flow
The mechanisms for gathering and archiving the data, the cron jobs, etc. are described at https://confluence.slac.stanford.edu/display/IEPM/PingER+data+flow+at+SLAC, which also describes the raw and analyzed data.
...