From owner-freebsd-current Fri Mar 28 03:09:42 1997
Return-Path:
Received: (from root@localhost)
        by freefall.freebsd.org (8.8.5/8.8.5) id DAA04503
        for current-outgoing; Fri, 28 Mar 1997 03:09:42 -0800 (PST)
Received: from oxmail4.ox.ac.uk (oxmail4.ox.ac.uk [163.1.2.33])
        by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id DAA04495
        for ; Fri, 28 Mar 1997 03:09:37 -0800 (PST)
Received: from njl2.materials.ox.ac.uk by oxmail4 with SMTP (PP);
        Fri, 28 Mar 1997 11:09:20 +0000
Received: by njl2.materials.ox.ac.uk (950413.SGI.8.6.12/940406.SGI)
        id LAA08977; Fri, 28 Mar 1997 11:09:17 GMT
Date: Fri, 28 Mar 1997 11:09:17 GMT
From: neil.long@materials.oxford.ac.uk (Neil J Long)
Message-Id: <9703281109.ZM8975@njl2.materials.ox.ac.uk>
In-Reply-To: Bill Fenner "Re: Possible routed problem 2.2" (Mar 22, 10:00am)
References: <97Mar22.100042pst.177486@crevenia.parc.xerox.com>
X-Mailer: Z-Mail-SGI (3.2S.3 08feb96 MediaMail)
To: Bill Fenner, freebsd-current@freebsd.org
Subject: 2.2.1 network and amanda dump problem (again)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

This is a repetition of the previous problem, which I thought was a
routed bug. The system is at 2.2.1, cvsup'd, with a world and kernel
built as of March 26th.

I don't know how to debug this problem, as logging in remotely changes
the network-related allocations, and when it happens I cannot log in
any more. It seems to occur during level 0 dumps, where the quantity of
data is higher and so the dump takes longer. I will run netstat -m in a
loop and log what I can, but this looks like being obscure and probably
limited to the amanda sendbackup program (which runs suid root).

Original postings follow and then some more info after that.

On Mar 22, 10:00am, Bill Fenner wrote:
> Subject: Re: Possible routed problem 2.2
> >The machine was not contactable via the net but was still running when I
> >went in to check.
> >These are the errors logged in messages
> >
> >Mar 22 01:15:00 njl sendbackup[446]: error [dump returned 3, compress got signal 13]
> >Mar 22 01:15:22 njl routed[58]: punt RTM_LOSING without gateway
> >Mar 22 01:15:46 njl routed[58]: punt RTM_LOSING without gateway
>
> These mean that TCP to a machine on the local network was having
> trouble. (Probably just another symptom of what you show below). It's
> a bug in routed that it logs this fact at such a high level; the TCP
> trouble has nothing to do with routed.
>
> >ping: sendto: No buffer space available
>
> Looks like your machine ran out of mbuf's. If you can recreate this, try
> "netstat -m" and see if any category has a much higher number than the others.
>
>   Bill
>-- End of excerpt from Bill Fenner

This happened again last night - the PC was idle when the amanda dumps
started (/usr/ports/misc/amanda).

Logging in on the console, netstat -m shows

144 mbufs in use
        86 mbufs allocated to data
        49 mbufs allocated to packet headers
        7 mbufs allocated to protocol control blocks
        2 mbufs allocated to socket names and addresses
10/26 mbuf clusters in use
70 Kbytes allocated to network (54% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

On the amanda server the dumps were recorded as

FAILURE AND STRANGE DUMP SUMMARY:
  njl /usr lev 0 FAILED [data timeout]
  njl /home lev 0 FAILED [data timeout]
  njl /var lev 0 FAILED [could not connect to njl]
  njl / lev 1 FAILED [could not connect to njl]
...

FAILED AND STRANGE DUMP DETAILS:

/-- njl /usr lev 0 FAILED [data timeout]
sendbackup: start [njl:/usr level 0 datestamp 19970328]
| DUMP: Date of this level 0 dump: Fri Mar 28 00:56:23 1997
| DUMP: Date of last level 0 dump: the epoch
| DUMP: Dumping /dev/rsd0s1e (/usr) to standard output
| DUMP: mapping (Pass I) [regular files]
| DUMP: mapping (Pass II) [directories]
| DUMP: estimated 276523 tape blocks.
| DUMP: dumping (Pass III) [directories]
| DUMP: dumping (Pass IV) [regular files]
| DUMP: 8.31% done, finished in 0:55
| DUMP: 15.83% done, finished in 0:53
| DUMP: 22.28% done, finished in 0:52
| DUMP: 28.84% done, finished in 0:49
| DUMP: 36.26% done, finished in 0:43
| DUMP: 44.74% done, finished in 0:37
| DUMP: 51.38% done, finished in 0:33
| DUMP: 57.95% done, finished in 0:29
\--------

/-- njl /home lev 0 FAILED [data timeout]
sendbackup: start [njl:/home level 0 datestamp 19970328]
| DUMP: Date of this level 0 dump: Fri Mar 28 00:56:12 1997
| DUMP: Date of last level 0 dump: the epoch
| DUMP: Dumping /dev/rsd0s4e (/home) to standard output
| DUMP: mapping (Pass I) [regular files]
| DUMP: mapping (Pass II) [directories]
| DUMP: estimated 272941 tape blocks.
| DUMP: dumping (Pass III) [directories]
| DUMP: dumping (Pass IV) [regular files]
| DUMP: 5.83% done, finished in 1:20
| DUMP: 9.84% done, finished in 1:31
| DUMP: 17.51% done, finished in 1:10
| DUMP: 26.87% done, finished in 0:54
| DUMP: 34.17% done, finished in 0:48
| DUMP: 43.31% done, finished in 0:39
| DUMP: 52.10% done, finished in 0:32
| DUMP: 60.08% done, finished in 0:26
\--------

The PC has a 3Com 3C589 combo - ep0.

routed reported the same error before sendbackup timed out.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Neil J Long, Department of Materials, University of Oxford
* Parks Road, Oxford, OX1 3PH, UK
* EMail: Neil.Long@materials.oxford.ac.uk
* Tel: +44 (0)1865-273678   Fax: +44 (0)1865-273789
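
P.S. For the "netstat -m in a loop" logging mentioned above, something along these lines might do. It is only a rough sketch: the function names, the log format and the choice of Python are illustrative assumptions rather than anything from amanda or FreeBSD, and the parser assumes each "N mbufs allocated to <category>" entry sits on its own line of the report. Each sample is synced to disk straight away, so earlier samples should still be in the log if the machine later stops responding.

```python
import os
import re
import subprocess
import time

# Matches report lines such as "86 mbufs allocated to data".
_CATEGORY_RE = re.compile(r"^\s*(\d+) mbufs allocated to (.+?)\s*$", re.MULTILINE)


def parse_mbuf_categories(text):
    """Return a {category: count} dict from one `netstat -m` style report."""
    return {name: int(count) for count, name in _CATEGORY_RE.findall(text)}


def log_samples(cmd, log_path, count, interval):
    """Run `cmd` `count` times, appending each timestamped report to `log_path`.

    `cmd` is an argument list (hypothetically ["netstat", "-m"]); `interval`
    is the pause in seconds between samples.
    """
    for i in range(count):
        result = subprocess.run(cmd, capture_output=True, text=True)
        report = result.stdout
        if not report.endswith("\n"):
            report += "\n"
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # Reopen, flush and fsync on every sample so that each report is
        # on disk before the next one is taken.
        with open(log_path, "a") as log:
            log.write(f"--- {stamp} ---\n{report}")
            log.flush()
            os.fsync(log.fileno())
        if i + 1 < count:
            time.sleep(interval)
```

Calling log_samples(["netstat", "-m"], "/var/tmp/mbuf.log", count=720, interval=5.0) before the dumps start would cover roughly an hour (the path and numbers are placeholders). Feeding a logged report to parse_mbuf_categories() afterwards gives the per-category counts, which makes it easier to see whether one category has grown much larger than the others.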