As reported by email from the PingER cron job, we ran out of space on /nfs/slac/g/net/pinger/ (example). This was confirmed as seen below; however, there was space available in pingerdata.unite in the parent directory.
207cottrell@pinger:~$df /nfs/slac/g/net/pinger
Filesystem               1K-blocks      Used Available Use% Mounted on
netfs03:/u2/g.net.pinger 845881344 837422080         0 100% /nfs/slac/g/net/pinger
206cottrell@pinger:~$df /nfs/slac/g/net/
Filesystem                      1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  17001308 5586492  10551380  35% /
~$ls -l /nfs/slac/g/net/
total 1
drwxr-xr-x  2 root root    0 Dec  2 10:40 iepm-bw/
drwxrwsr-x 28 1049 iepm 1024 Jan 23  2014 pinger/
drwxr-xr-x  2 root root    0 Dec  2 10:40 pingerdata.unite/
Looking in more detail at:
$ls -l /nfs/slac/g/net/pinger/* > dir
ls: cannot open directory /nfs/slac/g/net/pinger/lost+found: Permission denied
Exit 2
Looking in dir, it is apparent that apart from the directories:
pingerlod/, pingerreports/, pingerdata/, pinger_mon_data/, pinger2/, tulip/
most of this data is old and from earlier projects. Rather than lose it we decided to move this other data from
/nfs/slac/g/net/pinger to /nfs/slac/g/net/pingerdata.unite/
hence preserving a copy if we run into problems.
We copied and deleted:
/u2/g.net.pinger/hep#
/u2/g.net.pinger/hep--size
/u2/g.net.pinger/iepm-bw
/u2/g.net.pinger/shahryar
/u2/g.net.pinger/slac-gateways
/u2/g.net.pinger/traceroute
This saved about 15 GBytes. Then we deleted:
bandwidth-tests data iepm-bw.slac.stanford.edu monalisa nan-backup nettest2scratch node2.slac.stanford.edu pinger_mysql sc2002 sc2003 sc2004
223cottrell@pinger:~$df /nfs/slac/g/net/pinger
Filesystem               1K-blocks      Used Available Use% Mounted on
netfs03:/u2/g.net.pinger 845881344 816465920  20956160  98% /nfs/slac/g/net/pinger
i.e. we saved ~44 GBytes total.
More saving 3/11/2016
Once again we are receiving:
/bin/mv: closing `/nfs/slac/g/net/pinger/pingerreports/hep/minimum_rtt/minimum_rtt-1000-by-site-2016-03.txt.gz': No space left on device
[cottrell@pinger ~]$ df /nfs/slac/g/net/pinger/pingerreports/hep/
Filesystem               1K-blocks      Used Available Use% Mounted on
netfs03:/u2/g.net.pinger 845881344 837422080      1024 100% /nfs/slac/g/net/pinger
[cottrell@pinger ~]$ df /nfs/slac/g/net/pinger
Filesystem               1K-blocks      Used Available Use% Mounted on
netfs03:/u2/g.net.pinger 845881344 837422080      1024 100% /nfs/slac/g/net/pinger
It appears there are old files in
[cottrell@pinger ~]$ ls -l /nfs/slac/g/net/pinger/pingerreports/
total 19
drwxrwsr-x 15 cottrell iepm   512 Oct 24  2009 --by/
drwxrwsr-x  2 cottrell iepm   512 Dec  5  2009 --date/
drwxrwsr-x 20 iepm     iepm 11264 Dec  4 17:21 hep/
drwxrwsr-x  2 pinger   iepm   512 Oct 25  2009 hep#/
drwxrwsr-x  2 pinger   iepm   512 Jul  7  2012 hep--size/
drwxr-sr-x 15 iepm     iepm   512 Jun 14  2005 hep-rest/
drwxr-sr-x  3 cottrell iepm   512 May 18  2006 hepc/
drwxr-sr-x 15 cottrell iepm   512 May 17  2006 heps/
drwxr-sr-x 16 pinger   iepm   512 Mar  8  2012 new/
We believe that all apart from hep/ are not needed. However, before we delete them we want to make a copy somewhere else just in case. So we need to mkdir /nfs/slac/g/net/pingerdata.unite/pingerreports/ and cp --by/, --date/, hep--size/, hep-rest/, hepc/, heps/ and new/ from /nfs/slac/g/net/pinger/pingerreports/ to /nfs/slac/g/net/pingerdata.unite/pingerreports/. Then we need to:
Use cp -r -p -v to preserve the mode, ownership and timestamps, recursively copy directories and explain what is being done.
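Since this will take a long time (a day or so), you may want to run cp -r -p -v <from> <to> >! log& in the background (>! is the csh/tcsh redirect that overwrites an existing file even when noclobber is set) and use top and tail log to watch progress.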
Use rm -r -v to remove files that have been copied from /nfs/slac/g/net/pinger/pingerreports/new/
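The copy-then-delete steps above could be combined so that the source is only removed once the copy has been checked. The following is a minimal sketch in POSIX sh, not the procedure actually used on this page: archive_dir is a hypothetical helper name, and the use of diff -r as the verification step is an assumption (it reads both trees in full, so it adds considerably to the run time for large directories).

```shell
#!/bin/sh
# archive_dir SRC DST_PARENT  (hypothetical helper, for illustration only)
# Copies the directory SRC into the existing directory DST_PARENT,
# verifies the copy, and removes SRC only if the two trees are identical.
archive_dir() {
    src=$1
    dst_parent=$2
    name=$(basename "$src")

    # Refuse to copy on top of (or nest inside) an earlier copy.
    if [ -e "$dst_parent/$name" ]; then
        echo "destination $dst_parent/$name already exists" >&2
        return 1
    fi

    # -r: recurse into directories, -p: preserve mode, ownership, timestamps.
    cp -r -p "$src" "$dst_parent/" || return 1

    # diff -r exits non-zero if any file differs or is missing on either side.
    diff -r "$src" "$dst_parent/$name" >/dev/null || return 1

    # Only reached when the copy compared equal.
    rm -r "$src"
}
```

For example, archive_dir /nfs/slac/g/net/pinger/pingerreports/new /nfs/slac/g/net/pingerdata.unite/pingerreports would leave the source in place if either the copy or the comparison failed.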
More saving 8/31/2016
We are receiving
Your "cron" job /afs/slac/package/pinger/tulip/vtrace0chk.pl produced the following output:
can't close tmp file=/nfs/slac/g/net/pinger/tulip/cachetr/cache_tmp.txt: No space left on device at /afs/slac/package/pinger/tulip/vtrace0chk.pl line 168.
TRSrun@pinger: Command exited with value 28
Looking at the space used, we see
311cottrell@pinger:~$df -h /nfs/slac/g/net/pinger
Filesystem               Size Used Avail Use% Mounted on
netfs03:/u2/g.net.pinger 807G 799G     0 100% /nfs/slac/g/net/pinger
To find the space in each subdirectory of /nfs/slac/g/net/pinger, we use
307cottrell@pinger:/nfs/slac/g/net/pinger$du -sh *
du: cannot read directory `lost+found': Permission denied
1.0K lost+found
2.7G pinger2
3.0K pinger_mon_data
607G pingerdata
29G  pingerlod
53G  pingerreports
63M  tulip
Exit 1
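When there are many subdirectories, it can help to have the same per-directory totals sorted by size. The sketch below is one way to do that; largest_dirs is a hypothetical helper name. It uses du -sk rather than du -sh so that sort -n compares plain kilobyte counts, and it globs ./* rather than * so that names beginning with a dash (such as --by and --date under pingerreports) are not mistaken for options.

```shell
#!/bin/sh
# largest_dirs DIR  (hypothetical helper, for illustration only)
# Prints one line per entry of DIR as "<size in KB> ./<name>", largest first.
largest_dirs() {
    # The subshell keeps the cd from affecting the caller.
    # Errors such as an unreadable lost+found still go to stderr.
    ( cd "$1" && du -sk ./* | sort -rn )
}
```

For example, largest_dirs /nfs/slac/g/net/pinger | head -n 5 would list the five largest entries.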
pingerlod is no longer required. When/if it is restored, we will move it to a new place.
We used the following to copy the files and watch progress
270cottrell@pinger:~$mkdir /nfs/slac/g/net/pingerdata.unite/pingerlod
274cottrell@pinger:~$cp -r -p -v /nfs/slac/g/net/pinger/pingerlod /nfs/slac/g/net/pingerdata.unite/pingerlod >! log&
[1] 32068
278cottrell@pinger:~$tail log
`/nfs/slac/g/net/pinger/pingerlod/Aduna_Data/openrdf-sesame/logs/main-2013-08-26.log' -> `/nfs/slac/g/net/pingerdata.unite/pingerlod/pingerlod/Aduna_Data/openrdf-sesame/logs/main-2013-08-26.log'
279cottrell@pinger:~$wc log
105 316 20480
300cottrell@pinger:~$ps -efl | grep 32068
0 S cottrell  4837 17630 0 80 0 - 1107 - 15:48 pts/3 00:00:00 grep 32068
0 D cottrell 32068 17630 1 80 0 - 1389 - 15:33 pts/3 00:00:14 cp -r -p -v /nfs/slac/g/net/pinger/pingerlod /nfs/slac/g/net/pingerdata.unite/pingerlod
...
337cottrell@pinger:~$du -sh /nfs/slac/g/net/pinger/pingerlod /nfs/slac/g/net/pingerdata.unite/pingerlod
29G /nfs/slac/g/net/pinger/pingerlod
29G /nfs/slac/g/net/pingerdata.unite/pingerlod
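Note that the tail output shows destination paths of the form .../pingerdata.unite/pingerlod/pingerlod/.... This is consistent with the usual behaviour of cp -r when the destination directory already exists: the source directory is placed inside it. The small demonstration below uses throwaway directories under a temporary location (all names are made up for illustration) to show both cases.

```shell
#!/bin/sh
# Demonstration in a scratch area; nothing here touches real data.
demo=$(mktemp -d)
mkdir -p "$demo/src/pingerlod/logs" "$demo/dst"
echo x > "$demo/src/pingerlod/logs/main.log"

# Case 1: the destination directory is created first, as in the transcript.
# cp then copies the source directory INTO it, giving an extra level.
mkdir "$demo/dst/pingerlod"
cp -r -p "$demo/src/pingerlod" "$demo/dst/pingerlod"
ls "$demo/dst/pingerlod/pingerlod/logs"

# Case 2: the destination does not exist yet.
# cp creates it as a copy of the source, with no extra level.
cp -r -p "$demo/src/pingerlod" "$demo/dst/flat"
ls "$demo/dst/flat/logs"
```

Either layout preserves the data; the point is only to know which one to expect when comparing or restoring later.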
We then used rm -r -v to remove the copied files from /nfs/slac/g/net/pinger/pingerlod. First, however, the ownership had to be changed from renan to pinger, which required submitting a ticket to unixadmin.
netfs03 # cd /u2/g.net.pinger/pingerlod
netfs03 # find . -user renan -exec chown -h pinger {} \;
pinger@pinger $ rm -r -v /nfs/slac/g/net/pinger/pingerlod/
removed `/nfs/slac/g/net/pinger/pingerlod/Aduna_Data/openrdf-sesame/logs/main-2013-08-14.log'
removed `/nfs/slac/g/net/pinger/pingerlod/Aduna_Data/openrdf-sesame/logs/main-2013-08-15.log'
...
pinger@pinger $ du -sh /nfs/slac/g/net/pinger/pingerlod/
du: cannot access `/nfs/slac/g/net/pinger/pingerlod/': No such file or directory
We also commented out all the pingerlod cronjobs for pinger@pinger.slac.stanford.
Now we have
338cottrell@pinger:~$df -h /nfs/slac/g/net/pinger
Filesystem               Size Used Avail Use% Mounted on
netfs03:/u2/g.net.pinger 807G 770G   29G  97% /nfs/slac/g/net/pinger
339cottrell@pinger:~$du -sh /nfs/slac/g/net/pinger
du: cannot read directory `/nfs/slac/g/net/pinger/lost+found': Permission denied
662G /nfs/slac/g/net/pinger
Exit 1
341cottrell@pinger:/nfs/slac/g/net/pinger$du -sh *
du: cannot read directory `lost+found': Permission denied
1.0K lost+found
2.7G pinger2
3.0K pinger_mon_data
607G pingerdata
53G  pingerreports
63M  tulip
Exit 1
Note (see PingER data flow at SLAC): pingerdata holds the raw data gathered from the MAs; pingerreports holds the analyzed data, in particular the hourly data.
Next steps
The big elephant is pingerdata. It consists of
351cottrell@pinger:/nfs/slac/g/net/pinger/pingerdata$du -sh *
22M  1997
406M 1998
815M 1999
2.3G 2000
3.0G 2001
3.7G 2002
3.9G 2003
589G hep
2.0K new
3.9G oldftp
We have the following space in /nfs/slac/g/net/pingerdata.unite
54cottrell@pinger:~$df -h /nfs/slac/g/net/pingerdata.unite
Filesystem                Size Used Avail Use% Mounted on
netfs03:/pingerdata.unite 610G 188G  417G  32% /nfs/slac/g/net/pingerdata.unite
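From the figures above, hep (589G) would not fit in the 417G available in pingerdata.unite, whereas the smaller yearly directories and oldftp would. A check of this kind can be scripted before starting a long copy. The sketch below is illustrative only: need_kb, avail_kb, fits and can_copy are hypothetical helper names, and the comparison ignores any space the destination filesystem reserves or that other jobs consume during the copy.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# need_kb DIR: total size of DIR in 1K blocks.
need_kb()  { du -sk "$1" | awk '{print $1}'; }

# avail_kb PATH: available 1K blocks on the filesystem holding PATH.
# df -P forces one POSIX-format line per filesystem; column 4 is "Available".
avail_kb() { df -Pk "$1" | awk 'NR==2 {print $4}'; }

# fits NEED AVAIL: succeeds if NEED is strictly less than AVAIL.
fits()     { [ "$1" -lt "$2" ]; }

# can_copy SRC DST_DIR: succeeds if SRC would fit in DST_DIR's free space.
can_copy() { fits "$(need_kb "$1")" "$(avail_kb "$2")"; }
```

For example, can_copy /nfs/slac/g/net/pinger/pingerdata/hep /nfs/slac/g/net/pingerdata.unite || echo "not enough space" would flag the hep case before any data is moved.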
Problem 10/19/2016
Getting message:
/afs/slac/package/pinger/tulip/vtracefromchk.pl produced the following output:
Can't close list file: No space left on device at /afs/slac/package/pinger/tulip/vtracefromchk.pl line 150.
AND
/afs/slac/package/pinger/unite-monthly.pl produced the following output:
28: write error
AND
/afs/slac/package/pinger/tulip/vtrace0chk.pl produced the following output:
can't close tmp file=/nfs/slac/g/net/pinger/tulip/cachetr/cache_tmp_26404.txt: No space left on device at /afs/slac/package/pinger/tulip/vtrace0chk.pl line 213.
323cottrell@pinger:~$cp -r -p -v /nfs/slac/g/net/pinger/pingerdata/oldftp /nfs/slac/g/net/pingerdata.unite/oldftp
`/nfs/slac/g/net/pinger/pingerdata/oldftp' -> `/nfs/slac/g/net/pingerdata.unite/oldftp'
`/nfs/slac/g/net/pinger/pingerdata/oldftp/ping' -> `/nfs/slac/g/net/pingerdata.unite/oldftp/ping'
`/nfs/slac/g/net/pinger/pingerdata/oldftp/ping/2001' -> `/nfs/slac/g/net/pingerdata.unite/oldftp/ping/2001'
`/nfs/slac/g/net/pinger/pingerdata/oldftp/ping/2001/data-2001-01.tar.gz' -> `/nfs/slac/g/net/pingerdata.unite/oldftp/ping/2001/data-2001-01.tar.gz'
...
326cottrell@pinger:~$ls -l /nfs/slac/g/net/pingerdata.unite/oldftp/*/*
-rw-r--r-- 1 cottrell iepm      51 Jan 25  2005 /nfs/slac/g/net/pingerdata.unite/oldftp/ping/README
-rwxr-xr-x 1 cottrell iepm     603 Jan 25  2005 /nfs/slac/g/net/pingerdata.unite/oldftp/ping/ral.pl*
-rw-r--r-- 1 cottrell iepm 8127136 Jan 25  2005 /nfs/slac/g/net/pingerdata.unite/oldftp/ping/ral.txt
/nfs/slac/g/net/pingerdata.unite/oldftp/ping/2001:
total 3324088
-rw-r--r-- 1 cottrell iepm 211453443 Jan 25  2005 data-2001-01.tar.gz
-rw-r--r-- 1 cottrell iepm 191452967 Jan 25  2005 data-2001-02.tar.gz
...
Via ServiceNow ticket INC0117383 I requested that the files in /nfs/slac/g/net/pinger/pingerdata have their owner and group changed to pinger and iepm. I tried using sudo with chown, but that does not work. Once this is done I can delete /nfs/slac/g/net/pinger/pingerdata/oldftp, hence freeing up space for pinger.
pinger@pinger $ ls -l /nfs/slac/g/net/pinger/pingerdata/oldftp
total 2
-rw-r--r-- 1 pinger iepm   0 Oct 20 13:11 junk
drwxr-sr-x 4 pinger iepm 512 Jan 25  2005 ping/
drwxr-sr-x 5 pinger iepm 512 Jan 25  2005 traceroute/
pinger@pinger $ ls -l /nfs/slac/g/net/pingerdata.unite/oldftp
total 2
-rw-r--r-- 1 cottrell iepm   0 Oct 20 13:11 junk
drwxr-xr-x 4 cottrell iepm 512 Jan 25  2005 ping/
drwxr-xr-x 5 cottrell iepm 512 Jan 25  2005 traceroute/
pinger@pinger $ rm -r -v /nfs/slac/g/net/pinger/pingerdata/oldftp
removed `/nfs/slac/g/net/pinger/pingerdata/oldftp/ping/2001/data-2001-01.tar.gz'
removed `/nfs/slac/g/net/pinger/pingerdata/oldftp/ping/2001/data-2001-02.tar.gz'
...
The copies and deletes were repeated for:
351cottrell@pinger:/nfs/slac/g/net/pinger/pingerdata$du -sh *
22M  1997
406M 1998
815M 1999
2.3G 2000
3.0G 2001
3.7G 2002
3.9G 2003
e.g.
309cottrell@pinger:~$cp -r -p -v /nfs/slac/g/net/pinger/pingerdata/1997 /nfs/slac
`/nfs/slac/g/net/pinger/pingerdata/1997' -> `/nfs/slac/g/net/pingerdata.unite/1997'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-05.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-05.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-06.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-06.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-07.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-07.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-08.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-08.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-09.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-09.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-10.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-10.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-11.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-11.txt.gz'
`/nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-12.txt.gz' -> `/nfs/slac/g/net/pingerdata.unite/1997/ping-1997-12.txt.gz'
Since all this did not save much space, and the next thing to copy, hep/, requires 0.5TB, I requested 1TB of archive space via INC0117666 as /nfs/slac/staas/fs1/g/pinger.