From: Mike Tancsa <mike@sentex.net>
Organization: Sentex Communications
Date: Fri, 12 Sep 2014 20:52:54 -0400
To: Rick Macklem, Glen Barber
Cc: freebsd-stable, Jack Vogel
Subject: Re: svn commit: r267935 - head/sys/dev/e1000 (with work around?)
Message-ID: <54139566.7050202@sentex.net>
In-Reply-To: <1109209778.35732953.1410564825048.JavaMail.root@uoguelph.ca>
List-Id: Production branch of FreeBSD source code

On 9/12/2014 7:33 PM, Rick Macklem wrote:
> I wrote:
>> The patches are in 10.1. I thought his report said 10.0 in the message.
>>
>> If Mike is running a recent stable/10 or releng/10.1, then it has been
>> patched for this and NFS should work with TSO enabled. If it doesn't,
>> then something else is broken.
> Oops, I looked and I see Mike was testing r270560 (which would have both
> the patches). I don't have an explanation why TSO and 64K rsize, wsize
> would cause a hang, but it does appear it will exist in 10.1 unless it
> gets resolved.
>
> Mike, one difference is that, even with the patches, the driver will be
> copying the transmit mbuf list via m_defrag() to 32 MCLBYTE clusters
> when using 64K rsize, wsize.
> If you can reproduce the hang, you might want to look at how many mbuf
> clusters are allocated. If you've hit the limit, then I think that
> would explain it.

I have been running the test for a few hours now with no lockups of the
NIC, so doing the NFS mount with -orsize=32768,wsize=32768 certainly
seems to work around the lockup. How do I check the mbuf clusters?

root@backup3:/usr/home/mdtancsa # vmstat -z | grep -i clu
mbuf_cluster:   2048, 760054, 4444, 370, 3088708, 0, 0
root@backup3:/usr/home/mdtancsa #
root@backup3:/usr/home/mdtancsa # netstat -m
3322/4028/7350 mbufs in use (current/cache/total)
2826/1988/4814/760054 mbuf clusters in use (current/cache/total/max)
2430/1618 mbuf+clusters out of packet secondary zone in use (current/cache)
0/4/4/380026 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/112600 9k jumbo clusters in use (current/cache/total/max)
0/0/0/63337 16k jumbo clusters in use (current/cache/total/max)
6482K/4999K/11481K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
root@backup3:/usr/home/mdtancsa #

Interface is RUNNING and ACTIVE
em1: hw tdh = 343, hw tdt = 838
em1: hw rdh = 512, hw rdt = 511
em1: Tx Queue Status = 1
em1: TX descriptors avail = 516
em1: Tx Descriptors avail failure = 1
em1: RX discarded packets = 0
em1: RX Next to Check = 512
em1: RX Next to Refresh = 511

I just tested on the other em NIC and I can wedge it as well, so it's
not limited to one particular type of em NIC.

em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 349, hw tdt = 176
em0: TX(0) desc avail = 173, Next TX to Clean = 349
em0: link state changed to DOWN
em0: link state changed to UP

em0@pci0:0:25:0: class=0x020000 card=0x34ec8086 chip=0x10ef8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82578DM Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xb1a00000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xb1a25000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0x2040, size 32, enabled
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

I can lock things up fairly quickly by running these 2 scripts across an
NFS mount.

#!/bin/sh
while true
do
 dd if=/dev/urandom ibs=64k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
 dd if=/dev/urandom ibs=63k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
 dd if=/dev/urandom ibs=66k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
done

root@backup3:/usr/home/mdtancsa # cat i3
#!/bin/sh
while true
do
 dd if=/dev/zero of=/mnt/test2 bs=128k count=2000
 sleep 10
done

        ---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
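[Editor's note] Rick's figure of 32 MCLBYTE clusters for 64K rsize/wsize follows from simple division by the cluster size; a quick illustrative calculation, assuming the FreeBSD default MCLBYTES of 2048 bytes:

```shell
#!/bin/sh
# Illustrative arithmetic only: how many 2K (MCLBYTES) mbuf clusters one
# NFS read/write of a given rsize/wsize occupies once m_defrag() has
# copied the transmit chain into regular clusters.
MCLBYTES=2048   # assumed default mbuf cluster size
for iosize in 32768 65536; do
    clusters=$(( (iosize + MCLBYTES - 1) / MCLBYTES ))
    echo "${iosize} -> ${clusters} clusters"
    # prints: 32768 -> 16 clusters, then 65536 -> 32 clusters
done
```

This matches the 32 clusters Rick mentions for 64K I/O and shows why dropping to -orsize=32768,wsize=32768 halves the per-request cluster demand.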
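[Editor's note] To answer "how do I check the mbuf clusters" continuously while the repro scripts run, one could poll netstat -m in a loop; a minimal sketch, where the awk field positions assume the "current/cache/total/max" line format shown in the netstat -m output above:

```shell
#!/bin/sh
# Sketch: report current vs max mbuf cluster usage every 5 seconds.
# The "mbuf clusters in use" line begins with a slash-separated field
# (current/cache/total/max), so f[1] is in use and f[4] is the zone max.
while true; do
    netstat -m | awk '/mbuf clusters in use/ {
        split($1, f, "/")
        printf "clusters: %s in use, max %s\n", f[1], f[4]
    }'
    sleep 5
done
```

If the in-use count approaches the max during the hang, that would support Rick's cluster-exhaustion theory; the denied/delayed counters in netstat -m are worth watching at the same time.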