Date: Thu, 10 Jun 2010 20:18:09 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy <peter@vk2pj.dyndns.org>, Anders Nordby <anders@FreeBSD.org>, PYUN Yong-Hyeon <pyunyh@gmail.com>
Subject: Re: Odd network issues on ZFS based NFS server
Message-ID: <20100611031809.GA93666@icarus.home.lan>
In-Reply-To: <Pine.GSO.4.63.1006101936100.6000@muncher.cs.uoguelph.ca>
References: <20100608083649.GA77452@fupp.net> <Pine.GSO.4.63.1006081946040.8742@muncher.cs.uoguelph.ca> <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> <20100610110609.GA87243@fupp.net> <20100610114831.GB71432@icarus.home.lan> <20100610130307.GA33285@fupp.net> <20100610133859.GA74094@icarus.home.lan> <Pine.GSO.4.63.1006101936100.6000@muncher.cs.uoguelph.ca>

On Thu, Jun 10, 2010 at 07:48:49PM -0400, Rick Macklem wrote:
> On Thu, 10 Jun 2010, Jeremy Chadwick wrote:
> >The interrupt rate for bge1 (irq26) is very high during the problem,
> >while otherwise is only ~6/sec.  Shot in the dark, but this is probably
> >the cause of the packet loss you see.  Oddly, your uhci2 interface (used
> >for USB) is also firing at a very high rate.  I don't know if this is
> >the sign of a NIC problem, driver problem, or interrupt (think APIC?)
> >routing problem.
> >
> >Debugging this is beyond my capability, but folks like John Baldwin may
> >have some ideas on where to go from here.
> >
> >Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
> >"tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
> >happening?  The reason I ask is to determine if there's any chance this
> >box starts seeing problems due to DoS attacks or excessive LAN traffic
> >which is unexpected.  Basically, be sure that all the network I/O going
> >on across bge1 is expected.
> >
> Yes, I think Jeremy is on the right track. I'd second the recommendation
> to look at traffic when it is happening. I might choose:
> tcpdump -s 0 -w <file> -i bge1
> and then load "<file>" into wireshark, since wireshark is much better at
> making sense of NFS traffic. (Since the nfsd is at the top of the process
> list, it hints that there may be heavy nfs traffic being received by
> bge1.)
>
> If you do this tcpdump for a short period of time and then email "<file>"
> to me as an attachment, I can take a look at it. (If the traffic isn't
> NFS, then there's not much point in doing this.) We might have a case
> where a client is retrying the same RPC (or RPC sequence) over and over
> and over again, my friend (sorry I couldn't resist:-).
>
> Given that you stated FreeBSD8.1-Prerelease I think you should have the
> patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is
> at least r206406.
>
> Let me know how it goes, rick

Also for Anders --

With regards to possible bge(4) issues, Yong-Hyeon works on this driver
fairly often.  If it turns out to be a driver issue of some sort, he can
probably help.  Relevant commits are here (to give you some idea of
activity):

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c

One commit caught my eye (rev 1.226.2.15), but that seems to be more
focused on mbuf issues (your system doesn't appear to be having any,
given your netstat -m output).

CC'ing Yong-Hyeon, as he might know of some edge case where bge(4) could
go crazy with interrupts.  :-)

Yong-Hyeon, the entire thread is here:

http://lists.freebsd.org/pipermail/freebsd-fs/2010-June/008654.html

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |