Date:      Fri, 7 Sep 2012 17:44:48 -0400
From:      Jeremiah Lott <jlott@averesystems.com>
To:        freebsd-net@FreeBSD.org
Cc:        freebsd-bugs@FreeBSD.org
Subject:   Re: kern/167325: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
Message-ID:  <8339513D-6727-4343-A86E-4F5BB1F9827D@averesystems.com>
In-Reply-To: <201204270607.q3R67TiO026862@freefall.freebsd.org>
References:  <201204270607.q3R67TiO026862@freefall.freebsd.org>

On Apr 27, 2012, at 2:07 AM, linimon@FreeBSD.org wrote:

> Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
> New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC

> http://www.freebsd.org/cgi/query-pr.cgi?pr=167325

I did an analysis of this pr a while back and I figured I'd share.
It definitely looks like a real problem, but at least in 8.2 it is
difficult to hit.  First off, vlan tagging is not required to hit
this.  The code in question does not account for any amount of
link-local header, so you can reproduce the bug even without vlans.

In order to trigger it, the tcp stack must choose to send a tso "packet"
with a total size (including tcp+ip header and options, but not
link-local header) between 65522 and 65535 bytes (because adding the
14-byte link-local header will then exceed the 64K limit).  In 8.1, the
tcp stack only chooses to send tso bursts that will result in full
mtu-size on-wire packets.  To achieve this, it will truncate the tso
packet size to be a multiple of mss, not including header and tcp
options.  The check has been relaxed a little in head, but the same
basic check is still there.  None of the "normal" mtus have multiples
falling in this range.  To reproduce it I used an mtu of 1445.  When
timestamps are in use, every packet has a 40-byte tcp/ip header + 10
bytes for the timestamp option + 2 bytes of pad.  You can get a packet
length of 65523 as follows:

65523 - (40 + 10 + 2) = 65471 (size of tso packet data)
65471 / 47 = 1393 (size of data per on-wire packet)
1393 + (40 + 10 + 2) = 1445 (mtu is data + header + options + pad)
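
The same arithmetic can be turned around to search for other mtus that
land in the problem range.  The following is only a rough model of the
truncation described above, not the kernel code; the function names are
made up for illustration, and it assumes a 14-byte ethernet header and
timestamps enabled on every packet.

```python
IP_MAXPACKET = 65535     # largest value ip->ip_len can hold
ETHER_HDR_LEN = 14       # untagged ethernet header (assumed link type)
HDR_LEN = 40 + 10 + 2    # tcp/ip header + timestamp option + pad


def max_tso_len(mtu, hdr_len=HDR_LEN):
    """Largest tso packet (headers included) made of whole mss-sized
    segments that still fits in IP_MAXPACKET."""
    mss = mtu - hdr_len
    segments = (IP_MAXPACKET - hdr_len) // mss
    return segments * mss + hdr_len


def risky_mtus(lo=576, hi=9000, linkhdr=ETHER_HDR_LEN):
    """mtus whose largest tso packet plus link header exceeds 64K."""
    return [mtu for mtu in range(lo, hi + 1)
            if max_tso_len(mtu) + linkhdr > IP_MAXPACKET]


if __name__ == "__main__":
    print(max_tso_len(1445))       # the mtu used in this message
    print(1500 in risky_mtus())    # a "normal" mtu
```

Without timestamps the header length changes, so the set of affected
mtus changes with it; pass a different hdr_len to check that case.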

Once you set your mtu to 1445, you need a program that can get the stack
to send a maximum-sized packet.  With the congestion window that can be
more difficult than it seems.  I used some python that sends enough data
to open the window, sleeps long enough to drain all outstanding data,
but not long enough for the congestion window to go stale and close
again, then sends a bunch more data.  It also helps to turn off delayed
acks on the receiver.  Otherwise you will sometimes not drain the entire
send buffer, because an ack for the final chunk is still delayed when
you start the second transmit.  When the problem described in the pr
hits, the EINVAL from bus_dmamap_load_mbuf_sg bubbles right up to
userspace.
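
For anyone who wants to try this, here is a minimal sketch of that
send/sleep/send pattern.  It is not the script I used; the function
names, byte counts, and pause length are placeholders that would need
tuning for a given link, and the caller is assumed to pass in an
already-connected tcp socket.

```python
import errno
import socket
import time


def send_burst(sock, nbytes, chunk=b"x" * 65536):
    """Write nbytes to sock in chunks; returns the number of bytes sent."""
    sent = 0
    while sent < nbytes:
        sent += sock.send(chunk[: nbytes - sent])
    return sent


def two_bursts(sock, warmup_bytes, burst_bytes, pause_s):
    """Open the window, pause so outstanding data drains, send again.

    Returns None on success, or the errno name (e.g. "EINVAL") if a
    send fails.
    """
    try:
        send_burst(sock, warmup_bytes)
        time.sleep(pause_s)   # long enough to drain, short enough to keep cwnd
        send_burst(sock, burst_bytes)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

A caller would connect a socket to a receiver that just reads and
discards, then loop on two_bursts() until it returns "EINVAL".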

At first I thought this was a driver bug rather than a stack bug.  The
code in question does what it is commented to do (limit the tso packet
so that ip->ip_len does not overflow).  However, it also seems
reasonable for the driver to limit its dma tag to 64K (do we really want
it allocating another whole page just for the 14-byte link-local
header?).  Perhaps the tcp stack should ensure that the tso packet +
max_linkhdr is < 64K.  Comments?

As an aside, the patch attached to the pr is also slightly wrong.
Taking max_linkhdr into account when rounding the packet to be a
multiple of mss does not make sense; it should only be taken into
account when calculating the max tso length.
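
To make the distinction concrete, here is a small model of the two
calculations.  It is an illustration in python rather than a kernel
patch; the function names are hypothetical and the lengths are tso
payload sizes (headers excluded).

```python
IP_MAXPACKET = 65535   # largest value ip->ip_len can hold


def round_to_mss(length, mss):
    """Truncate to a whole number of segments; mss alone sets the step."""
    if length >= mss:
        length -= length % mss
    return length


def tso_len_current(want, mss, hdr_len):
    """Existing behaviour as described above: cap so ip_len fits."""
    return round_to_mss(min(want, IP_MAXPACKET - hdr_len), mss)


def tso_len_proposed(want, mss, hdr_len, max_linkhdr):
    """Suggested behaviour: also leave room for the link header in the
    cap, but keep the rounding step unchanged."""
    cap = IP_MAXPACKET - hdr_len - max_linkhdr
    return round_to_mss(min(want, cap), mss)


if __name__ == "__main__":
    mss, hdr, link = 1393, 52, 14
    print(tso_len_current(100000, mss, hdr) + hdr + link)
    print(tso_len_proposed(100000, mss, hdr, link) + hdr + link)
```

With the mtu 1445 numbers, the first total exceeds 64K and the second
does not, at the cost of one fewer segment per tso burst.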

  Jeremiah Lott
  Avere Systems



