Date: Fri, 12 Sep 2014 17:00:27 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Mike Tancsa <mike@sentex.net>
Cc: stable@freebsd.org, Jack Vogel <jfvogel@gmail.com>
Subject: Re: svn commit: r267935 - head/sys/dev/e1000 (with work around?)
Message-ID: <932734062.35704765.1410555627658.JavaMail.root@uoguelph.ca>
In-Reply-To: <541319F5.1020502@sentex.net>

Mike Tancsa wrote:
> On 9/12/2014 10:09 AM, Mike Tancsa wrote:
> >
> > FYI, I just ran into this bug on another box, with an onboard em NIC,
> > so I don't think it's a one-off hardware issue. AMD64, FreeBSD
> > 10.0-STABLE #4 r270560:
> > This is on an Intel MB S1200BTL
> > (S1200BT.86B.02.00.0035.030220120927)
> >
> > Unfortunately, this is also a production box, so it's difficult to
> > test. I am going to see if I can find a similar MB to test against.
>
> I found another board I can test with. It takes a bit of random traffic
> to wedge, but I can lock up the NIC to the point where I have to down
> and up it.
>
> When the NIC is wedged, running sysctl -w em.1.debug=1 shows
>
> Sep 12 11:05:05 backup3 kernel: Interface is RUNNING and ACTIVE
> Sep 12 11:05:05 backup3 kernel: em1: hw tdh = 414, hw tdt = 980
> Sep 12 11:05:05 backup3 kernel: em1: hw rdh = 768, hw rdt = 767
> Sep 12 11:05:05 backup3 kernel: em1: Tx Queue Status = 1
> Sep 12 11:05:05 backup3 kernel: em1: TX descriptors avail = 449
> Sep 12 11:05:05 backup3 kernel: em1: Tx Descriptors avail failure = 3
> Sep 12 11:05:05 backup3 kernel: em1: RX discarded packets = 0
> Sep 12 11:05:05 backup3 kernel: em1: RX Next to Check = 768
> Sep 12 11:05:05 backup3 kernel: em1: RX Next to Refresh = 767
>
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
>         ether 00:15:17:ed:68:a4
>         inet 1.1.1.2 netmask 0xffffff00 broadcast 1.1.1.255
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
>
> The network traffic involves sending a lot of traffic via NFS. I found
> that if I disable TSO on the NIC, it seems to fix the problem, or at
> least makes it hard to reproduce. With TSO enabled, it took perhaps
> 30-120 seconds for the problem to manifest. Both on my test and
> production box, I have not run into the problem in the past 45min.
>
To get NFS (which generates transmit lists of 35 mbufs) to work with TSO
enabled, you need both r264630 (which reduces the default size of
if_hw_tsomax slightly) and r268726 (which fixes the retry: case for
if_em.c).

Neither of these patches is in 10.0 (the revision numbers above are for
head). So, either run with TSO disabled or reduce the rsize and wsize of
all NFS mounts to 32768 (which reduces the number of mbufs in a transmit
list to 19).

rick
ps: For 10.0, 9.2 and earlier, you need to disable TSO for any network
    interface that supports fewer than 35 transmit fragments. (Transmit
    fragments are variously called *TXSEG*, or *SCAT* or similar in the
    .h file for the device driver.)

> ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"
>
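
The two mbuf counts quoted in the reply above (35 per transmit list, dropping to 19 with rsize/wsize at 32768) can be reproduced with simple arithmetic. The sketch below is only an illustration and rests on assumptions that are not stated in the message: that the larger figure corresponds to a 65536-byte NFS I/O size, that data is carried in 2048-byte mbuf clusters, and that 3 further mbufs hold protocol headers. The function names and the segment limit of 32 are hypothetical; a real limit would come from the *TXSEG* or *SCAT* constant in the driver's .h file.

```python
# Rough model of how many mbufs end up in one NFS transmit list.
# ASSUMPTIONS (not from the original message): 2048-byte mbuf clusters
# and 3 extra mbufs for headers. These values were picked because they
# reproduce the 35 and 19 figures quoted in the thread.
CLUSTER_BYTES = 2048
HEADER_MBUFS = 3


def mbufs_in_transmit_list(io_size, cluster_bytes=CLUSTER_BYTES,
                           header_mbufs=HEADER_MBUFS):
    """Estimate the mbuf chain length for one NFS write of io_size bytes."""
    data_mbufs = -(-io_size // cluster_bytes)  # ceiling division
    return data_mbufs + header_mbufs


def fits_driver_limit(io_size, max_tx_segments):
    """True if the estimated chain fits the driver's transmit fragment limit."""
    return mbufs_in_transmit_list(io_size) <= max_tx_segments


if __name__ == "__main__":
    hypothetical_limit = 32  # placeholder, not a value read from any driver
    for size in (65536, 32768):
        print(size, mbufs_in_transmit_list(size),
              fits_driver_limit(size, hypothetical_limit))
```

Under these assumptions a 65536-byte transfer needs 35 mbufs and a 32768-byte transfer needs 19, which is why halving rsize and wsize brings the list under a limit that the full-size transfer exceeds. The other workaround named in the reply, running with TSO disabled, is normally applied per interface on FreeBSD with `ifconfig em1 -tso`.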