From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 03:18:12 2010
Date: Thu, 10 Jun 2010 20:18:09 -0700
From: Jeremy Chadwick
To: Rick Macklem
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy, Anders Nordby, PYUN Yong-Hyeon
Subject: Re: Odd network issues on ZFS based NFS server
Message-ID: <20100611031809.GA93666@icarus.home.lan>
References: <20100608083649.GA77452@fupp.net>
 <20100609122517.GA16231@fupp.net>
 <20100610081710.GA64350@server.vk2pj.dyndns.org>
 <20100610110609.GA87243@fupp.net>
 <20100610114831.GB71432@icarus.home.lan>
 <20100610130307.GA33285@fupp.net>
 <20100610133859.GA74094@icarus.home.lan>

On Thu, Jun 10, 2010 at 07:48:49PM -0400, Rick Macklem wrote:
> On Thu, 10 Jun 2010, Jeremy Chadwick wrote:
> >The interrupt rate for bge1 (irq26) is very high during the problem,
> >while otherwise is only ~6/sec. Shot in the dark, but this is probably
> >the cause of the packet loss you see. Oddly, your uhci2 interface (used
> >for USB) is also firing at a very high rate. I don't know if this is
> >the sign of a NIC problem, driver problem, or interrupt (think APIC?)
> >routing problem.
> >
> >Debugging this is beyond my capability, but folks like John Baldwin may
> >have some ideas on where to go from here.
> >
> >Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
> >"tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
> >happening? The reason I ask is to determine if there's any chance this
> >box starts seeing problems due to DoS attacks or excessive LAN traffic
> >which is unexpected. Basically, be sure that all the network I/O going
> >on across bge1 is expected.
> >
> Yes, I think Jeremy is on the right track. I'd second the recommendation
> to look at traffic when it is happening. I might choose:
> tcpdump -s 0 -w -i bge1
> and then load "" into wireshark, since wireshark is much better at
> making sense of NFS traffic. (Since the nfsd is at the top of the process
> list, it hints that there may be heavy nfs traffic being received by
> bge1.)
>
> If you do this tcpdump for a short period of time and then email ""
> to me as an attachment, I can take a look at it. (If the traffic isn't
> NFS, then there's not much point in doing this.) We might have a case
> where a client is retrying the same RPC (or RPC sequence) over and over
> and over again, my friend (sorry I couldn't resist:-).
>
> Given that you stated FreeBSD8.1-Prerelease I think you should have the
> patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is
> at least r206406.
>
> Let me know how it goes, rick

Also for Anders -- With regards to possible bge(4) issues, Yong-Hyeon
works on this driver fairly often.
If it turns out to be a driver issue of some sort, he can probably help.
Relevant commits are here (to give you some idea of activity):

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c

One commit caught my eye (rev 1.226.2.15), but that seems to be more
focused on mbuf issues (your system doesn't appear to be having any,
given your netstat -m output).

CC'ing Yong-Hyeon, as he might know of some edge case where bge(4)
could go crazy with interrupts. :-)

Yong-Hyeon, the entire thread is here:

http://lists.freebsd.org/pipermail/freebsd-fs/2010-June/008654.html

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
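
P.S. On the interrupt rate question: the "rate" column of "vmstat -i" is
an average since boot, so a burst is easier to see by taking two samples
a known number of seconds apart and diffing the totals. Below is a rough
sketch of that arithmetic, not a tested tool. It assumes the usual
"name: device  total  rate" column layout; the function names and the
sample numbers are made up for illustration and are not from the
affected box.

```python
import re


def parse_vmstat_i(text):
    """Return {interrupt name: total count} from `vmstat -i`-style text.

    Assumes rows shaped like "irq26: bge1    123456    6"
    (name, total since boot, average rate). Header and "Total"
    rows have no "name:" prefix and are skipped.
    """
    totals = {}
    for line in text.splitlines():
        m = re.match(r"^(\S+:.*?)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            totals[m.group(1).strip()] = int(m.group(2))
    return totals


def interrupt_rates(before, after, seconds):
    """Per-second interrupt rate between two samples taken `seconds` apart."""
    a = parse_vmstat_i(before)
    b = parse_vmstat_i(after)
    # Only report sources present in both samples.
    return {name: (b[name] - a[name]) / seconds for name in b if name in a}


# Illustrative samples only (hypothetical numbers).
SAMPLE_BEFORE = """interrupt                          total       rate
irq16: uhci2                        5000          3
irq26: bge1                        12000          6
Total                              17000          9
"""

SAMPLE_AFTER = """interrupt                          total       rate
irq16: uhci2                       45000          3
irq26: bge1                       132000          6
Total                             177000          9
"""

if __name__ == "__main__":
    for name, rate in sorted(interrupt_rates(SAMPLE_BEFORE, SAMPLE_AFTER, 10).items()):
        print(f"{name}: {rate:.0f}/sec")
```

In practice the two text samples would come from running "vmstat -i"
twice while the problem is happening; comparing the result against the
idle figure (~6/sec for bge1) shows whether both irq26 and uhci2 spike
together.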