Date: Fri, 7 Sep 2012 17:44:48 -0400
From: Jeremiah Lott <jlott@averesystems.com>
To: freebsd-net@FreeBSD.org
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: kern/167325: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
Message-ID: <8339513D-6727-4343-A86E-4F5BB1F9827D@averesystems.com>
In-Reply-To: <201204270607.q3R67TiO026862@freefall.freebsd.org>
References: <201204270607.q3R67TiO026862@freefall.freebsd.org>
On Apr 27, 2012, at 2:07 AM, linimon@FreeBSD.org wrote:

> Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
> New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
> http://www.freebsd.org/cgi/query-pr.cgi?pr=167325

I did an analysis of this pr a while back and figured I'd share. There definitely looks to be a real problem here, but at least in 8.2 it is difficult to hit. First off, vlan tagging is not required to hit this: the code in question does not account for any amount of link-layer header, so you can reproduce the bug even without vlans.

To trigger it, the tcp stack must choose to send a tso "packet" with a total size (including tcp+ip header and options, but not the link-layer header) between 65522 and 65535 bytes, because adding the 14-byte link-layer header then exceeds the 64K limit. In 8.1, the tcp stack only chooses to send tso bursts that will result in full mtu-sized on-wire packets. To achieve this, it truncates the tso packet size to a multiple of mss, not counting header and tcp options. The check has been relaxed a little in head, but the same basic check is still there. None of the "normal" mtus have multiples falling in this range. To reproduce it I used an mtu of 1445. When timestamps are in use, every packet has a 40-byte tcp/ip header + 10 bytes for the timestamp option + 2 bytes of padding. You can get a packet length of 65523 as follows:

    65523 - (40 + 10 + 2) = 65471   (size of tso packet data)
    65471 / 47 = 1393               (data per on-wire packet; the burst splits into 47 packets)
    1393 + (40 + 10 + 2) = 1445     (mtu is data + header + options + pad)

Once you set your mtu to 1445, you need a program that can get the stack to send a maximum-sized tso packet. With the congestion window in play, that can be harder than it sounds. I used some python that sends enough data to open the window, sleeps long enough to drain all outstanding data (but not long enough for the congestion window to go stale and close again), and then sends a bunch more data; a rough sketch of that approach is included at the end of this message. It also helps to turn off delayed acks on the receiver, since otherwise you sometimes do not drain the entire send buffer because an ack for the final chunk is still delayed when you start the second transmit. When the problem described in the pr hits, the EINVAL from bus_dmamap_load_mbuf_sg bubbles right up to userspace.

At first I thought this was a driver bug rather than a stack bug. The code in question does what it is commented to do (limit the tso packet so that ip->ip_len does not overflow). However, it also seems reasonable for the driver to limit its dma tag at 64K (do we really want it allocating another whole page just for the 14-byte link-layer header?). Perhaps the tcp stack should ensure that the tso packet plus max_linkhdr is < 64K. Comments?

As an aside, the patch attached to the pr is also slightly wrong. Taking max_linkhdr into account when rounding the packet to a multiple of mss does not make sense; it should only be taken into account when calculating the maximum tso length.

Jeremiah Lott
Avere Systems
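
For reference, here is a minimal sketch of the sender side of that test. It is not the exact script I ran: the receiver address, port, buffer sizes and sleep time below are placeholders, and the interface name is hypothetical. You still need a receiver with delayed acks disabled (net.inet.tcp.delayed_ack=0) and the sender's interface set to mtu 1445 (e.g. ifconfig ix0 mtu 1445).

    #!/usr/bin/env python
    # Sketch of a sender that coaxes the stack into sending a
    # maximum-sized tso packet.  Address, port, sizes and sleep
    # time are placeholders, not the exact values from my test.
    import socket
    import time

    HOST = "192.0.2.10"        # receiver address (placeholder)
    PORT = 9999                # receiver port (placeholder)
    CHUNK = 8 * 1024 * 1024    # big enough to open the congestion window

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))

    # First burst: open up the congestion window.
    s.sendall(b"x" * CHUNK)

    # Sleep long enough to drain all outstanding data, but not so long
    # that the congestion window goes stale and closes again.
    time.sleep(0.5)

    # Second burst: with the window open and the send buffer full, the
    # stack can pick a tso packet in the 65522-65535 byte range.  When
    # the bug hits, this send fails with EINVAL.
    s.sendall(b"x" * CHUNK)

    s.close()

The 0.5 second sleep is only a guess; the right value depends on the rtt and how quickly the receiver drains the first burst.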