From: Rick Macklem
To: Markus Gebert
Cc: FreeBSD Net, Garrett Wollman, Jack Vogel, Christopher Forgeron
Date: Tue, 25 Mar 2014 17:46:00 -0400 (EDT)
Subject: Re: 9.2 ixgbe tx queue hang

Markus Gebert wrote:
>
> On 25.03.2014, at 02:18, Rick Macklem wrote:
>
> > Christopher Forgeron wrote:
> >>
> >> This is regarding the TSO patch that Rick suggested earlier. (With
> >> many thanks for his time and suggestion.)
> >>
> >> As I mentioned earlier, it did not fix the issue on a 10.0 system.
> >> It did make it less of a problem on 9.2, but either way, I think it's
> >> not needed, and shouldn't be considered as a patch for
> >> testing/etc.
> >>
> >> Patching TSO to anything other than a max value (and by default the
> >> code gives it IP_MAXPACKET) is confusing the matter, as the packet
> >> length ultimately needs to be adjusted for many things on the fly,
> >> like TCP options. Using static header sizes won't be a good
> >> idea.
> >>
> > If you look at tcp_output(), you'll notice that it doesn't do TSO if
> > there are any options. That way it knows that the TCP/IP header is
> > just hdrlen.
> >
> > If you don't limit the TSO packet (including TCP/IP and ethernet
> > headers) to 64K, then the "ix" driver can't send it, which is the
> > problem you guys are seeing.
> >
> > There are other ways to fix this problem, but they all may introduce
> > issues that reducing if_hw_tsomax by a small amount does not.
> > For example, m_defrag() could be modified to use 4K pagesize
> > clusters, but this might introduce memory fragmentation problems.
> > (I observed what I think are memory fragmentation problems when I
> > switched NFS to use 4K pagesize clusters for large I/O messages.)
> >
> > If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
> > error replies), then that is the size that if_hw_tsomax can be set
> > to. (IP_MAXPACKET itself can't be changed, since it is used for
> > other things. It just happens that IP_MAXPACKET is what if_hw_tsomax
> > defaults to; it has no other effect w.r.t. TSO.)
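A rough sketch of the arithmetic behind the 65518 figure. It assumes (this is not stated in the thread above) that the driver can map at most 32 DMA segments per frame and that m_defrag() packs the chain into standard 2K clusters, so the whole frame, Ethernet header included, must fit in 32 × 2048 = 65536 bytes. The 18-byte L2 allowance (14-byte Ethernet header plus a 4-byte VLAN tag) is also an assumption, and all names below are hypothetical.

```c
#include <stdbool.h>

/* Assumed limits (hypothetical names), for illustration only. */
#define HYP_MAX_SEGS     32u     /* assumed max DMA segments per frame */
#define HYP_CLUSTER_SIZE 2048u   /* 2K cluster size assumed for m_defrag() */
#define HYP_L2_HDR_LEN   18u     /* 14-byte Ethernet header + 4-byte VLAN tag */
#define HYP_IP_MAXPACKET 65535u  /* value of IP_MAXPACKET */

/* Clusters needed to hold `bytes` once the chain is fully defragmented. */
static unsigned clusters_needed(unsigned bytes)
{
	return (bytes + HYP_CLUSTER_SIZE - 1) / HYP_CLUSTER_SIZE;
}

/* True if a TSO packet of tso_len bytes (TCP/IP headers + payload)
 * still fits in the assumed segment budget once the L2 header is added. */
static bool tso_frame_fits(unsigned tso_len)
{
	return clusters_needed(tso_len + HYP_L2_HDR_LEN) <= HYP_MAX_SEGS;
}

/* Largest tsomax for which tso_frame_fits() holds: 32 * 2048 - 18. */
static unsigned max_safe_tsomax(void)
{
	return HYP_MAX_SEGS * HYP_CLUSTER_SIZE - HYP_L2_HDR_LEN;
}
```

Under these assumed limits, max_safe_tsomax() is 65518, tso_frame_fits(65518) holds, and tso_frame_fits(65535) does not, which is consistent with the value suggested above.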
> >
> >>
> >> Additionally, it seems that setting the NIC's TSO limit will/may be
> >> ignored by code like this in sys/netinet/tcp_output.c:
> >>
>
> Is this confirmed or still an ‘it seems’? Have you actually seen a
> tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was
> this just speculation because the values are stored in different
> places? (Sorry if you already stated this in another email; it’s
> currently hard to keep track of all the information.)
>
> Anyway, this dtrace one-liner should be a good test of whether other
> values appear in tp->t_tsomax:
>
> # dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>
> Remember to adjust the value in the condition to whatever you’re
> currently expecting. The value seems to be 0 for new connections,
> probably when tcp_mss() has not been called yet. So that seems
> normal, and I have excluded that case too. This will also print a
> kernel stack trace in case it sees an unexpected value.
>
> > Yes, but I don't know why.
> > The only conjecture I can come up with is that another net driver
> > is stacked above "ix" and the setting for if_hw_tsomax doesn't
> > propagate up. (If you look at the commit log message for r251296,
> > the intent of adding if_hw_tsomax was to allow device drivers to set
> > a smaller tsomax than IP_MAXPACKET.)
> >
> > Are you using any of the "stacked" network device drivers like
> > lagg? I don't even know what all the others are.
> > Maybe someone else can list them?
>
> I guess the most obvious are lagg and vlan (and probably carp on
> FreeBSD 9.x or older).
>
> At Jack’s request, we’ve eliminated lagg and vlan from the
> picture, which gives us plain ixgbe interfaces with no stacked
> interfaces on top of them. And we can still reproduce the problem.
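Rick's conjecture, that a pseudo-interface stacked on "ix" may not inherit the driver's smaller if_hw_tsomax, can be modeled with a toy structure. Everything here (struct toy_ifnet, tsomax_seen_by_tcp(), tsomax_propagated()) is hypothetical and is not the FreeBSD ifnet API; it only shows why TCP would see IP_MAXPACKET if the limit were read from the top interface and that interface never copied its parent's value.

```c
#include <stddef.h>

#define TOY_IP_MAXPACKET 65535u  /* default tsomax, mirroring IP_MAXPACKET */

/* Hypothetical stand-in for a network interface; not struct ifnet. */
struct toy_ifnet {
	const char *name;
	unsigned hw_tsomax;              /* 0 means "not set by the driver" */
	const struct toy_ifnet *parent;  /* underlying interface, or NULL */
};

/* Suspected behaviour: only the top interface is consulted, and an
 * unset value falls back to the IP_MAXPACKET default. */
static unsigned tsomax_seen_by_tcp(const struct toy_ifnet *ifp)
{
	return ifp->hw_tsomax != 0 ? ifp->hw_tsomax : TOY_IP_MAXPACKET;
}

/* Alternative: honour the smallest value set anywhere in the stack. */
static unsigned tsomax_propagated(const struct toy_ifnet *ifp)
{
	unsigned min = TOY_IP_MAXPACKET;

	for (; ifp != NULL; ifp = ifp->parent)
		if (ifp->hw_tsomax != 0 && ifp->hw_tsomax < min)
			min = ifp->hw_tsomax;
	return min;
}
```

With a toy vlan0 (hw_tsomax unset) on top of ix0 (hw_tsomax 65518), the first function returns 65535 and the second 65518. Since Markus reports the problem still reproduces on plain ixgbe interfaces, this model at most illustrates the conjecture; it does not explain that result.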
>
This was related to the "did if_hw_tsomax set tp->t_tsomax to the
same value?" question. Since you reported that my patch that set
if_hw_tsomax in the driver didn't fix the problem, that suggests that
tp->t_tsomax isn't being set to if_hw_tsomax from the driver, but we
don't know why.

rick

>
> Markus
>
> >
> > rick
> >>
> >> 10.0 code (sys/netinet/tcp_output.c, lines 780-783; the two lines
> >> marked "!!" are the ones of interest):
> >>
> >>     if (len > tp->t_tsomax - hdrlen) {    /* !! */
> >>             len = tp->t_tsomax - hdrlen;  /* !! */
> >>             sendalot = 1;
> >>     }
> >>
> >> I've put debugging here, set the NIC's max TSO as per Rick's patch
> >> (set to, say, 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
> >> It's being set someplace else, and thus our attempts to set TSO on
> >> the NIC may be in vain.
> >>
> >> It may have mattered more in 9.2, as I see the code doesn't use
> >> tp->t_tsomax in some locations, and may actually default to what the
> >> NIC is set to.
> >>
> >> The NIC may still win; I didn't walk through the code to confirm,
> >> but it was enough to suggest to me that setting TSO wouldn't fix
> >> this issue.
> >>
> >> However, this is still a TSO-related issue; it's just not one
> >> related to the setting of TSO's max size.
> >>
> >> A 10.0-STABLE system with TSO disabled on ix0 hasn't had a single
> >> packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit
> >> longer to increase confidence in this assertion, but I don't want to
> >> waste time on this when I could be logging problem packets on a
> >> system with TSO enabled.
> >>
> >> Comments are very welcome.
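The clamp in the 10.0 tcp_output() excerpt can be exercised in isolation. This is a sketch, not kernel code: clamp_tso_len() is a hypothetical wrapper around the same two statements, and hdrlen = 40 (20-byte IP header plus 20-byte TCP header, no options) is an assumed example value. It shows that the clamp only enforces whatever value reaches t_tsomax, so with t_tsomax left at IP_MAXPACKET the packet (len + hdrlen) can still reach 65535 bytes.

```c
#include <stdbool.h>

/* Hypothetical wrapper around the clamp quoted from tcp_output() in 10.0.
 * *len is the payload length for this send; the return value mirrors
 * the kernel setting sendalot = 1 when data is left for a later send. */
static bool clamp_tso_len(unsigned *len, unsigned t_tsomax, unsigned hdrlen)
{
	bool sendalot = false;

	if (*len > t_tsomax - hdrlen) {
		*len = t_tsomax - hdrlen;  /* cap headers + payload at t_tsomax */
		sendalot = true;
	}
	return sendalot;
}
```

For example, with 100000 bytes queued and hdrlen 40, t_tsomax = 65535 leaves len + hdrlen at 65535, while t_tsomax = 65518 caps it at 65518.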