From owner-freebsd-net@FreeBSD.ORG Tue Mar 25 22:11:22 2014
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 7C7566FB;
	Tue, 25 Mar 2014 22:11:22 +0000 (UTC)
Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [IPv6:2a00:d70:0:a::e0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id 0D758C07;
	Tue, 25 Mar 2014 22:11:21 +0000 (UTC)
Received: from 46-127-132-15.dynamic.hispeed.ch ([46.127.132.15]:57080 helo=[172.16.1.156])
	by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.80.1 (FreeBSD)) id 1WSZZ1-00084F-Iv;
	Tue, 25 Mar 2014 23:11:19 +0100
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 7.2 \(1874\))
Subject: Re: 9.2 ixgbe tx queue hang
From: Markus Gebert
In-Reply-To: <2042344654.506796.1395783960030.JavaMail.root@uoguelph.ca>
Date: Tue, 25 Mar 2014 23:11:17 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <1573EFCE-EFCF-4ABF-A1A5-77714B56F9F1@hostpoint.ch>
References: <2042344654.506796.1395783960030.JavaMail.root@uoguelph.ca>
To: Rick Macklem
X-Mailer: Apple Mail (2.1874)
Cc: FreeBSD Net, Garrett Wollman, Jack Vogel, Christopher Forgeron
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Tue, 25 Mar 2014 22:11:22 -0000

On 25.03.2014, at 22:46, Rick Macklem wrote:

> Markus Gebert wrote:
>>
>> On 25.03.2014, at 02:18, Rick Macklem wrote:
>>
>>> Christopher Forgeron wrote:
>>>>
>>>> This is regarding the TSO patch that Rick suggested earlier.
>>>> (With many thanks for his time and suggestion)
>>>>
>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system.
>>>> It did make it less of a problem on 9.2, but either way, I think
>>>> it's not needed, and shouldn't be considered as a patch for
>>>> testing/etc.
>>>>
>>>> Patching TSO to anything other than a max value (and by default the
>>>> code gives it IP_MAXPACKET) is confusing the matter, as the packet
>>>> length ultimately needs to be adjusted for many things on the fly
>>>> like TCP Options, etc. Using static header sizes won't be a good
>>>> idea.
>>>>
>>> If you look at tcp_output(), you'll notice that it doesn't do TSO if
>>> there are any options. That way it knows that the TCP/IP header is
>>> just hdrlen.
>>>
>>> If you don't limit the TSO packet (including TCP/IP and ethernet
>>> headers) to 64K, then the "ix" driver can't send them, which is the
>>> problem you guys are seeing.
>>>
>>> There are other ways to fix this problem, but they all may introduce
>>> issues that reducing if_hw_tsomax by a small amount does not.
>>> For example, m_defrag() could be modified to use 4K pagesize
>>> clusters, but this might introduce memory fragmentation problems.
>>> (I observed what I think are memory fragmentation problems when I
>>> switched NFS to use 4K pagesize clusters for large I/O messages.)
>>>
>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
>>> error replies), then that is the size that if_hw_tsomax can be set
>>> to (just can't change IP_MAXPACKET, but that is defined for other
>>> things). (It just happens that IP_MAXPACKET is what if_hw_tsomax
>>> defaults to. It has no other effect w.r.t. TSO.)
>>>
>>>>
>>>> Additionally, it seems that setting nic TSO will/may be ignored by
>>>> code like this in sys/netinet/tcp_output.c:
>>>>
>>
>> Is this confirmed or still a ‘it seems’?
>> Have you actually seen a tp->t_tsomax value in tcp_output() bigger
>> than if_hw_tsomax or was this just speculation because the values
>> are stored in different places? (Sorry, if you already stated this
>> in another email, it’s currently hard to keep track of all the
>> information.)
>>
>> Anyway, this dtrace one-liner should be a good test if other values
>> appear in tp->t_tsomax:
>>
>> # dtrace -n 'fbt::tcp_output:entry
>>     / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 /
>>     { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>
>> Remember to adjust the value in the condition to whatever you’re
>> currently expecting. The value seems to be 0 for new connections,
>> probably when tcp_mss() has not been called yet. So that seems
>> normal and I have excluded that case too. This will also print a
>> kernel stack trace in case it sees an unexpected value.
>>
>>
>>> Yes, but I don't know why.
>>> The only conjecture I can come up with is that another net driver is
>>> stacked above "ix" and the setting for if_hw_tsomax doesn't
>>> propagate up. (If you look at the commit log message for r251296,
>>> the intent of adding if_hw_tsomax was to allow device drivers to set
>>> a smaller tsomax than IP_MAXPACKET.)
>>>
>>> Are you using any of the "stacked" network device drivers like
>>> lagg? I don't even know what the others all are?
>>> Maybe someone else can list them?
>>
>> I guess the most obvious are lagg and vlan (and probably carp on
>> FreeBSD 9.x or older).
>>
>> On request from Jack, we’ve eliminated lagg and vlan from the
>> picture, which gives us plain ixgbe interfaces with no stacked
>> interfaces on top of them. And we can still reproduce the problem.
>>
> This was related to the "did if_hw_tsomax set tp->t_tsomax to the
> same value?" question.
> Since you reported that my patch that set if_hw_tsomax in the driver
> didn't fix the problem, that suggests that tp->t_tsomax isn't being
> set to if_hw_tsomax from the driver, but we don't know why?

Jack asked us to remove lagg/vlans at the very beginning of this thread,
and when we had done that, the problem was still there. So my answer was
not related to your recent patch. I wanted to clarify that we have been
testing with ixgbe only for quite some time and that stacked interfaces
could not be a source of problems in our test scenario.

We only started testing your patch that sets if_hw_tsomax yesterday. So
far I have it running on two systems along with some printfs and the
dtrace one-liner that watches over tp->t_tsomax in tcp_output(). So far
we haven’t had any problems with these two servers, and the dtrace probe
never fired, so for now it looks like tp->t_tsomax always gets set from
if_hw_tsomax. But it’s too soon to draw a conclusion; it may take days
to trigger the problem again. It might also be fixed with your patch.

I’m booting more systems with the test kernel and I will be watching all
of them with dtrace to see if I find an occurrence where tp->t_tsomax is
off. I hope that with more systems, I’ll have an answer more quickly.

But digging around the code, I still don’t see how tp->t_tsomax could
fail to be set from if_hw_tsomax when there are no stacked interfaces…


Markus


> rick
>
>>
>> Markus
>>
>>
>>>
>>> rick
>>>>
>>>> 10.0 Code:
>>>>
>>>> if (len > tp->t_tsomax - hdrlen) {      /* !! */
>>>>         len = tp->t_tsomax - hdrlen;    /* !! */
>>>>         sendalot = 1;
>>>> }
>>>>
>>>> I've put debugging here, set the nic's max TSO as per Rick's patch
>>>> (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
>>>> It's being set someplace else, and thus our attempts to set TSO on
>>>> the nic may be in vain.
>>>>
>>>> It may have mattered more in 9.2, as I see the code doesn't use
>>>> tp->t_tsomax in some locations, and may actually default to what
>>>> the nic is set to.
>>>>
>>>> The NIC may still win, I didn't walk through the code to confirm,
>>>> it was enough to suggest to me that setting TSO wouldn't fix this
>>>> issue.
>>>>
>>>> However, this is still a TSO related issue, it's just not one
>>>> related to the setting of TSO's max size.
>>>>
>>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a single
>>>> packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit
>>>> longer to increase confidence in this assertion, but I don't want
>>>> to waste time on this when I could be logging problem packets on a
>>>> system with TSO enabled.
>>>>
>>>> Comments are very welcome..
>>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to
>>> "freebsd-net-unsubscribe@freebsd.org"
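
For reference, the clamp quoted from tcp_output() in this thread, together with Rick's point that the whole packet (TCP/IP and ethernet headers included) has to stay within 64K, can be sketched in a few lines of C. This is only an illustration under assumptions, not kernel or driver code: the sketch_* names are made up, "64K" is taken to mean 65536 bytes, hdrlen is assumed to be 40 (IPv4 plus TCP without options), and the link-layer header is assumed to be 14 bytes of ethernet plus a 4-byte VLAN tag.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IP_MAXPACKET	65535u	/* value of IP_MAXPACKET */
#define SKETCH_LIMIT_64K	65536u	/* assumed meaning of "64K" */
#define SKETCH_LINK_HDR		18u	/* assumed: 14 ethernet + 4 VLAN */

/*
 * Hypothetical helper: the same clamp as the quoted tcp_output() lines.
 * Payload longer than tsomax - hdrlen is cut down and sendalot is set
 * so that the caller knows more data remains to be sent.
 */
uint32_t
sketch_tso_clamp(uint32_t len, uint32_t tsomax, uint32_t hdrlen, int *sendalot)
{
	if (len > tsomax - hdrlen) {
		len = tsomax - hdrlen;
		*sendalot = 1;
	}
	return (len);
}

/* Hypothetical helper: payload plus TCP/IP and link-layer headers. */
uint32_t
sketch_frame_bytes(uint32_t len, uint32_t hdrlen)
{
	return (len + hdrlen + SKETCH_LINK_HDR);
}

/* Compare the default tsomax with the reduced value from the thread. */
void
sketch_demo(void)
{
	int sendalot = 0;
	uint32_t len;

	/* Default: tsomax == IP_MAXPACKET, the frame exceeds 64K. */
	len = sketch_tso_clamp(100000, SKETCH_IP_MAXPACKET, 40, &sendalot);
	assert(sketch_frame_bytes(len, 40) > SKETCH_LIMIT_64K);

	/* Reduced: tsomax == 65518, the frame just fits in 64K. */
	len = sketch_tso_clamp(100000, 65518, 40, &sendalot);
	assert(sketch_frame_bytes(len, 40) <= SKETCH_LIMIT_64K);
}
```

Under these assumptions the clamped payload plus hdrlen always equals tsomax, so the TCP/IP header size drops out and only the link-layer header decides whether the frame stays within 64K. That is consistent with reducing if_hw_tsomax by a small amount rather than patching in static header sizes.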