From owner-freebsd-stable@FreeBSD.ORG Sat Sep 13 12:08:37 2014
Date: Sat, 13 Sep 2014 08:08:28 -0400 (EDT)
From: Rick Macklem
To: Mike Tancsa
Message-ID: <1472615261.35797393.1410610108085.JavaMail.root@uoguelph.ca>
In-Reply-To: <54139566.7050202@sentex.net>
Subject: Re: svn commit: r267935 - head/sys/dev/e1000 (with work around?)
Cc: Glen Barber, freebsd-stable, Jack Vogel
List-Id: Production branch of FreeBSD source code

Mike Tancsa wrote:
> On 9/12/2014 7:33 PM, Rick Macklem wrote:
> > I wrote:
> >> The patches are in 10.1. I thought his report said 10.0 in the
> >> message.
> >>
> >> If Mike is running a recent stable/10 or releng/10.1, then it has
> >> been patched for this and NFS should work with TSO enabled. If it
> >> doesn't, then something else is broken.
> > Oops, I looked and I see Mike was testing r270560 (which would have
> > both the patches). I don't have an explanation for why TSO with 64K
> > rsize, wsize would cause a hang, but it does appear the problem will
> > exist in 10.1 unless it gets resolved.
> >
> > Mike, one difference is that, even with the patches, the driver will
> > be copying the transmit mbuf list via m_defrag() to 32 MCLBYTE
> > clusters when using 64K rsize, wsize.
> > If you can reproduce the hang, you might want to look at how many
> > mbuf clusters are allocated. If you've hit the limit, then I think
> > that would explain it.
>
> I have been running the test for a few hrs now and no lockups of the
> nic, so doing the nfs mount with -orsize=32768,wsize=32768 certainly
> seems to work around the lockup. How do I check the mbuf clusters?
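[For readers following along, the workaround Mike describes can be spelled out as an explicit mount invocation. This is a sketch based on the thread; "server:/export" and "/mnt" are placeholder names, not paths from the thread.]

```shell
# Workaround sketch from the thread: cap the NFS transfer size at 32K so
# the em(4) TSO path never has to handle a 64K I/O.
# "server:/export" and "/mnt" are placeholders, not from the thread.
mount -t nfs -o rsize=32768,wsize=32768 server:/export /mnt

# Equivalent /etc/fstab entry:
# server:/export  /mnt  nfs  rw,rsize=32768,wsize=32768  0  0

# Alternatively, since the hang is only seen with TSO enabled, TSO can be
# turned off on the interface itself:
# ifconfig em1 -tso
```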
> root@backup3:/usr/home/mdtancsa # vmstat -z | grep -i clu
> mbuf_cluster:   2048, 760054,    4444,     370,  3088708,   0,   0
> root@backup3:/usr/home/mdtancsa #
> root@backup3:/usr/home/mdtancsa # netstat -m
> 3322/4028/7350 mbufs in use (current/cache/total)
> 2826/1988/4814/760054 mbuf clusters in use (current/cache/total/max)

This was all I was thinking of. It certainly doesn't look like a problem.
If the 64K rsize, wsize test is about the same, I'd say you aren't running
out of mbuf clusters.

> 2430/1618 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/4/4/380026 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/112600 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/63337 16k jumbo clusters in use (current/cache/total/max)
> 6482K/4999K/11481K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> root@backup3:/usr/home/mdtancsa #
>
> Interface is RUNNING and ACTIVE
> em1: hw tdh = 343, hw tdt = 838
> em1: hw rdh = 512, hw rdt = 511
> em1: Tx Queue Status = 1
> em1: TX descriptors avail = 516
> em1: Tx Descriptors avail failure = 1

I don't know anything about the hardware, but this looks suspicious to me?
Hopefully someone familiar with the hardware can help, rick

> em1: RX discarded packets = 0
> em1: RX Next to Check = 512
> em1: RX Next to Refresh = 511
>
> I just tested on the other em nic and I can wedge it as well, so it's
> not limited to one particular type of em nic.
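[For readers wondering what "hitting the limit" would look like in these numbers: the current count can be compared against the max from the "mbuf clusters in use" line. The script below is an illustrative helper, not something from the thread; the sample line is copied from Mike's output, and on a live system you would feed it `netstat -m | grep 'mbuf clusters'` instead.]

```shell
#!/bin/sh
# Illustrative helper (not from the thread): compare current mbuf cluster
# usage against the configured max, using the "clusters in use" line that
# netstat -m prints. The sample line below is Mike's actual output.
line="2826/1988/4814/760054 mbuf clusters in use (current/cache/total/max)"

# The leading token is slash-separated: current/cache/total/max.
current=$(printf '%s' "$line" | awk -F'[/ ]' '{print $1}')
max=$(printf '%s' "$line" | awk -F'[/ ]' '{print $4}')
pct=$((current * 100 / max))
echo "mbuf clusters: $current of $max in use (${pct}%)"
# With Mike's numbers this prints:
# mbuf clusters: 2826 of 760054 in use (0%)
```

At under half a percent of the limit, this matches Rick's reading that cluster exhaustion is not the cause here.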
> em0: Watchdog timeout -- resetting
> em0: Queue(0) tdh = 349, hw tdt = 176
> em0: TX(0) desc avail = 173, Next TX to Clean = 349
> em0: link state changed to DOWN
> em0: link state changed to UP
>
> so it does not seem limited to just certain em nics
>
> em0@pci0:0:25:0: class=0x020000 card=0x34ec8086 chip=0x10ef8086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82578DM Gigabit Network Connection'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 32, base 0xb1a00000, size 131072, enabled
>     bar   [14] = type Memory, range 32, base 0xb1a25000, size 4096, enabled
>     bar   [18] = type I/O Port, range 32, base 0x2040, size 32, enabled
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 13[e0] = PCI Advanced Features: FLR TP
>
> I can lock things up fairly quickly by running these 2 scripts across
> an nfs mount.
>
> #!/bin/sh
>
> while true
> do
>         dd if=/dev/urandom ibs=64k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
>         dd if=/dev/urandom ibs=63k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
>         dd if=/dev/urandom ibs=66k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
> done
>
> root@backup3:/usr/home/mdtancsa # cat i3
> #!/bin/sh
>
> while true
> do
>         dd if=/dev/zero of=/mnt/test2 bs=128k count=2000
>         sleep 10
> done
>
> ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"