From owner-freebsd-current@FreeBSD.ORG Tue Nov 3 15:13:55 2009
From: Gavin Atkinson
To: Weldon S Godfrey 3
Cc: freebsd-current@FreeBSD.org
Subject: Re: FreeBSD 8.0 - network stack crashes?
Date: Tue, 03 Nov 2009 15:13:34 +0000
Message-Id: <1257261214.98619.92.camel@buffy.york.ac.uk>
References: <1257185816.44755.29.camel@buffy.york.ac.uk>
List-Id: Discussions about the use of FreeBSD-current

On Tue, 2009-11-03 at 08:32 -0500, Weldon S Godfrey 3 wrote:
>
> If memory serves me right, sometime around Yesterday, Gavin Atkinson told me:
>
> Gavin, thank you A LOT for helping us with this, I have answered as much
> as I can from the most recent crash below. We did hit max mbufs. It is
> at 25Kclusters, which is the default. I have upped it to 32K because a
> rather old article mentioned that as the top end and I need to get into
> work so I am not trying to do this with a remote console to go higher. I
> have already set it to reboot next with 64K clusters. I already have kmem
> maxed to what is bootable (or at least at one time) in 8.0, 4GB, how high
> can I safely go? This is a NFS server running ZFS with sustained 5 min
> averages of 120-200Mb/s running as a store for a mail system.
>
> > Some things that would be useful:
> >
> > - Does "arp -da" fix things?
>
> no, it hangs like ssh, route add, etc
>
> > - What's the output of "netstat -m" while the networking is broken?
> Tue Nov 3 07:02:11 CST 2009
> 36971/2033/39004 mbufs in use (current/cache/total)
> 24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
> 24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 58980K/2110K/61091K bytes allocated to network (current/cache/total)
> 0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines

OK, at least we've figured out what is going wrong then. As a
workaround to get the machine to stay up longer, you should be able to
set kern.ipc.nmbclusters=256000 in /boot/loader.conf, but hopefully we
can resolve this soon.

Firstly, what kernel was the above output from? And what network card
are you using? In your initial post you mentioned testing both bce(4)
and em(4) cards. Be aware that em(4) had a bug that would cause exactly
this problem, which was fixed with a commit on September 11th
(r197093). Make sure your kernel is from after that date if you are
using em(4). I guess it is also possible that bce(4) has the same
issue; I'm not aware of any recent fixes to it.

So, from here, I think the best thing would be to just use the em(4)
NIC and an up-to-date kernel, and see if you can reproduce the issue.
How important is this machine? If em(4) works, are you able to help
debug the issues with the bce(4) driver?

Thanks,

Gavin
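
The exhaustion visible in the quoted "netstat -m" output (cluster total
equal to the max, plus a non-zero count of denied cluster requests)
could be checked for mechanically. What follows is only a rough sketch
in Python, not an existing tool: the names parse_netstat_m,
clusters_exhausted and SAMPLE are made up for illustration, and the
regular expressions assume the exact line format shown in the output
above.

```python
import re

# Two lines copied from the quoted "netstat -m" output, used as sample input.
SAMPLE = """\
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""


def parse_netstat_m(text):
    """Pull cluster usage and denied-request counters out of netstat -m text.

    Hypothetical helper: assumes the line layout seen in the sample above.
    """
    stats = {}
    m = re.search(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", text)
    if m:
        current, cache, total, maximum = map(int, m.groups())
        stats["clusters"] = {
            "current": current,
            "cache": cache,
            "total": total,
            "max": maximum,
        }
    m = re.search(r"(\d+)/(\d+)/(\d+) requests for mbufs denied", text)
    if m:
        mbufs, clusters, both = map(int, m.groups())
        stats["denied"] = {
            "mbufs": mbufs,
            "clusters": clusters,
            "mbuf+clusters": both,
        }
    return stats


def clusters_exhausted(stats):
    """True when the cluster zone is at its limit and requests were denied."""
    clusters = stats.get("clusters")
    if clusters is None:
        return False
    denied = stats.get("denied", {}).get("clusters", 0)
    return clusters["total"] >= clusters["max"] and denied > 0


if __name__ == "__main__":
    print(clusters_exhausted(parse_netstat_m(SAMPLE)))
```

Fed the real output of "netstat -m" instead of SAMPLE, the same check
could be run periodically to flag the condition before the network
stack stops responding.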