Date: Thu, 10 Jun 2010 06:38:59 -0700
From: Jeremy Chadwick
To: Anders Nordby
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy
Subject: Re: Odd network issues on ZFS based NFS server
Message-ID: <20100610133859.GA74094@icarus.home.lan>
In-Reply-To: <20100610130307.GA33285@fupp.net>

On Thu, Jun 10, 2010 at 03:03:07PM +0200, Anders Nordby wrote:
> On Thu, Jun 10, 2010 at 04:48:32AM -0700, Jeremy Chadwick wrote:
> > Can you also
> > provide "vmstat -i" output, both when the issue is
> > happening and after the machine has been rebooted (but been up for 5-10
> > minutes)?  Thanks.
>
> While having issues:
>
> root@unixfile:~# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq14: ata0                            1          0
> irq18: uhci2                    78164874        953
> irq19: uhci1                      643047          7
> irq26: bge1                     73830825        900
> irq51: ciss0                      642774          7
> cpu0: timer                    163861455       1998
> cpu1: timer                    163853438       1998
> cpu3: timer                    163906515       1999
> cpu2: timer                    163906515       1999
> Total
>
> 5 minutes after a reboot:
>
> root@unixfile:~# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq14: ata0                            1          0
> irq18: uhci2                        5813         19
> irq19: uhci1                        2503          8
> irq26: bge1                         1997          6
> irq51: ciss0                        2503          8
> cpu0: timer                       592619       1995
> cpu1: timer                       584601       1968
> cpu2: timer                       584605       1968
> cpu3: timer                       584606       1968
> Total                            2359254       7943

The interrupt rate for bge1 (irq26) is very high during the problem
(~900/sec), while otherwise it is only ~6/sec.  Shot in the dark, but
this is probably the cause of the packet loss you see.

Oddly, uhci2 (irq18, a USB controller) is also firing at a very high
rate: ~953/sec during the problem versus ~19/sec after the reboot.

I don't know if this is a sign of a NIC problem, a driver problem, or
an interrupt (think APIC?) routing problem.  Debugging this is beyond
my capability, but folks like John Baldwin may have some ideas on where
to go from here.

Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
"tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
happening?  The reason I ask is to determine whether the box starts
seeing problems because of a DoS attack or unexpected, excessive LAN
traffic.  Basically, be sure that all the network I/O going on across
bge1 is expected.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
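
The comparison of the two "vmstat -i" samples above can be done mechanically. What follows is only a rough sketch, not part of any FreeBSD tool: the helper names parse_vmstat_i and rate_jumps and the 10x threshold are made up for illustration, and the sample text is copied from the output quoted in this thread. As far as I know, the "rate" column is an average since boot rather than an instantaneous rate, so a long uptime can dilute a recent spike.

```python
import re

# Matches lines such as "irq26: bge1   73830825   900" or "cpu0: timer 592619 1995".
LINE_RE = re.compile(r"^(?P<name>\S+:\s+\S+)\s+(?P<total>\d+)\s+(?P<rate>\d+)$")


def parse_vmstat_i(text):
    """Return {source: (total, rate)} for each interrupt line (hypothetical helper)."""
    rates = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # the header, "Total" and incomplete lines do not match and are skipped
            name = " ".join(m.group("name").split())
            rates[name] = (int(m.group("total")), int(m.group("rate")))
    return rates


def rate_jumps(bad, good, factor=10):
    """List (source, good_rate, bad_rate) where the rate grew at least `factor` times.

    The factor of 10 is an arbitrary assumption, not a documented threshold.
    """
    jumps = []
    for name, (_, bad_rate) in bad.items():
        good_rate = good.get(name, (0, 0))[1]
        if bad_rate >= factor * max(good_rate, 1):
            jumps.append((name, good_rate, bad_rate))
    return jumps


# Sample taken while the problem was occurring (from the quoted output).
DURING = """\
interrupt total rate
irq1: atkbd0 6 0
irq14: ata0 1 0
irq18: uhci2 78164874 953
irq19: uhci1 643047 7
irq26: bge1 73830825 900
irq51: ciss0 642774 7
cpu0: timer 163861455 1998
cpu1: timer 163853438 1998
cpu3: timer 163906515 1999
cpu2: timer 163906515 1999
Total
"""

# Sample taken 5 minutes after a reboot (from the quoted output).
AFTER = """\
interrupt total rate
irq1: atkbd0 6 0
irq14: ata0 1 0
irq18: uhci2 5813 19
irq19: uhci1 2503 8
irq26: bge1 1997 6
irq51: ciss0 2503 8
cpu0: timer 592619 1995
cpu1: timer 584601 1968
cpu2: timer 584605 1968
cpu3: timer 584606 1968
Total 2359254 7943
"""

if __name__ == "__main__":
    for name, good_rate, bad_rate in rate_jumps(parse_vmstat_i(DURING), parse_vmstat_i(AFTER)):
        print(f"{name}: {good_rate}/sec -> {bad_rate}/sec")
```

On these two samples only uhci2 and bge1 cross the threshold; the timer and ciss0 rates are essentially unchanged.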