From owner-freebsd-fs@FreeBSD.ORG  Thu Jun 10 13:39:01 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 056281065679
	for <freebsd-fs@freebsd.org>; Thu, 10 Jun 2010 13:39:01 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.emeryville.ca.mail.comcast.net
	(qmta04.emeryville.ca.mail.comcast.net [76.96.30.40])
	by mx1.freebsd.org (Postfix) with ESMTP id DDC6F8FC20
	for <freebsd-fs@freebsd.org>; Thu, 10 Jun 2010 13:39:00 +0000 (UTC)
Received: from omta19.emeryville.ca.mail.comcast.net ([76.96.30.76])
	by qmta04.emeryville.ca.mail.comcast.net with comcast
	id UC9f1e0031eYJf8A4Df0yN; Thu, 10 Jun 2010 13:39:00 +0000
Received: from koitsu.dyndns.org ([98.248.46.159])
	by omta19.emeryville.ca.mail.comcast.net with comcast
	id UDez1e0033S48mS01DezCg; Thu, 10 Jun 2010 13:39:00 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 18E089B418; Thu, 10 Jun 2010 06:38:59 -0700 (PDT)
Date: Thu, 10 Jun 2010 06:38:59 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Anders Nordby <anders@FreeBSD.org>
Message-ID: <20100610133859.GA74094@icarus.home.lan>
References: <20100608083649.GA77452@fupp.net>
	<Pine.GSO.4.63.1006081946040.8742@muncher.cs.uoguelph.ca>
	<20100609122517.GA16231@fupp.net>
	<20100610081710.GA64350@server.vk2pj.dyndns.org>
	<20100610110609.GA87243@fupp.net>
	<20100610114831.GB71432@icarus.home.lan>
	<20100610130307.GA33285@fupp.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100610130307.GA33285@fupp.net>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy <peter@vk2pj.dyndns.org>
Subject: Re: Odd network issues on ZFS based NFS server
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Jun 2010 13:39:01 -0000

On Thu, Jun 10, 2010 at 03:03:07PM +0200, Anders Nordby wrote:
> On Thu, Jun 10, 2010 at 04:48:32AM -0700, Jeremy Chadwick wrote:
> > Can you also provide "vmstat -i" output, both when the issue is
> > happening and after the machine has been rebooted (but been up for 5-10
> > minutes)?  Thanks.
> 
> While having issues:
> 
> root@unixfile:~# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq14: ata0                            1          0
> irq18: uhci2                    78164874        953
> irq19: uhci1                      643047          7
> irq26: bge1                     73830825        900
> irq51: ciss0                      642774          7
> cpu0: timer                    163861455       1998
> cpu1: timer                    163853438       1998
> cpu3: timer                    163906515       1999
> cpu2: timer                    163906515       1999
> Total      
> 
> 5 minutes after a reboot:
> 
> root@unixfile:~# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq14: ata0                            1          0
> irq18: uhci2                        5813         19
> irq19: uhci1                        2503          8
> irq26: bge1                         1997          6
> irq51: ciss0                        2503          8
> cpu0: timer                       592619       1995
> cpu1: timer                       584601       1968
> cpu2: timer                       584605       1968
> cpu3: timer                       584606       1968
> Total                            2359254       7943

The interrupt rate for bge1 (irq26) is very high while the problem is
happening (~900/sec), versus only ~6/sec after the reboot.  Shot in the
dark, but this is probably the cause of the packet loss you see.  Oddly,
your uhci2 controller (used for USB) is also firing at a very high rate
(~950/sec).  I don't know if this is a sign of a NIC problem, driver
problem, or interrupt (think APIC?) routing problem.
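
One caveat: the "rate" column in "vmstat -i" is an average since boot, so
it can't tell you how fast an interrupt is firing right now.  If you want
the current rate, a rough sketch like the one below might help.  It diffs
two saved "vmstat -i" outputs taken a known number of seconds apart.  The
function name irq_delta and the file names are made up for illustration,
and I have not run this on your box.

```shell
#!/bin/sh
# irq_delta FILE1 FILE2 SECONDS   (hypothetical helper, not a system tool)
# Prints the per-second interrupt rate between two saved "vmstat -i"
# outputs that were taken SECONDS apart.
irq_delta() {
    awk -v secs="$3" '
        # Skip the header line and anything too short to carry counters.
        NF < 3 || $1 == "interrupt" { next }
        {
            total = $(NF - 1)          # second-to-last field is the total
            name = $0                  # name is the line minus total and rate
            sub(/[ \t]+[0-9]+[ \t]+[0-9]+[ \t]*$/, "", name)
            if (NR == FNR) { first[name] = total; next }   # first snapshot
            if (name in first)
                printf "%-20s %10.1f/sec\n", name, (total - first[name]) / secs
        }
    ' "$1" "$2"
}

# Intended use on the affected machine (assumed file names):
#   vmstat -i > /tmp/irq.a; sleep 10; vmstat -i > /tmp/irq.b
#   irq_delta /tmp/irq.a /tmp/irq.b 10
```

If bge1 and uhci2 both show high numbers over a short window like that,
the storm is going on right now and isn't just left over from earlier in
the uptime.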

Debugging this is beyond my capability, but folks like John Baldwin may
have some ideas on where to go from here.

Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
"tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
happening?  The reason I ask is to rule out the box being hit by a DoS
attack or by unexpected, excessive LAN traffic.  Basically, be sure that
all the network I/O going across bge1 is expected.
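
For the byte counters, something along these lines could turn two netstat
samples into a bytes/sec figure.  This is only a sketch: link_counter is a
name I made up, and it assumes the header row contains column names such as
"Ibytes" and "Obytes" and that the link-level row has "<Link#N>" in the
third column.  Please check that against the netstat output on your
release before trusting the numbers.

```shell
#!/bin/sh
# link_counter COLUMN   (hypothetical helper)
# Reads "netstat -ibn -I <interface>" output on stdin and prints the value
# of COLUMN (e.g. Ibytes) from the link-level row.
link_counter() {
    awk -v col="$1" '
        # Find the requested column by its header name.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        # Assumption: the link-level row has <Link#N> in the third field.
        idx && $3 ~ /^<Link/ { print $idx; exit }
    '
}

# Intended use on the affected machine:
#   a=$(netstat -ibn -I bge1 | link_counter Ibytes); sleep 10
#   b=$(netstat -ibn -I bge1 | link_counter Ibytes)
#   echo "$(( (b - a) / 10 )) bytes/sec in"
```

Comparing that figure between a good period and a bad one should show
whether the interrupt load matches real traffic on the wire.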

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |