From: YongHyeon PYUN
Date: Fri, 14 Sep 2012 13:51:45 -0700
To: Jeremiah Lott
Cc: freebsd-net@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: kern/167325: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
Message-ID: <20120914205145.GA7612@michelle.cdnetworks.com>
In-Reply-To: <8339513D-6727-4343-A86E-4F5BB1F9827D@averesystems.com>
References: <201204270607.q3R67TiO026862@freefall.freebsd.org> <8339513D-6727-4343-A86E-4F5BB1F9827D@averesystems.com>
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD

On Fri, Sep 07, 2012 at 05:44:48PM -0400, Jeremiah Lott wrote:
> On Apr 27, 2012, at 2:07 AM, linimon@FreeBSD.org wrote:
>
> > Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
> > New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
> >
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=167325
>
> I did an analysis of this pr a while back and I figured I'd share. Definitely looks like a real problem here, but at least in 8.2 it is difficult to hit it. First off, vlan tagging is not required to hit this. The code in question does not account for any amount of link-local header, so you can reproduce the bug even without vlans.
>
> In order to trigger it, the tcp stack must choose to send a tso "packet" with a total size (including tcp+ip header and options, but not link-local header) between 65522 and 65535 bytes (because adding the 14 byte link-local header will then exceed the 64K limit). In 8.1, the tcp stack only chooses to send tso bursts that will result in full mtu-size on-wire packets. To achieve this, it will truncate the tso packet size to be a multiple of mss, not including header and tcp options. The check has been relaxed a little in head, but the same basic check is still there. None of the "normal" mtus have multiples falling in this range. To reproduce it I used an mtu of 1445. When timestamps are in use, every packet has a 40 byte tcp/ip header + 10 bytes for the timestamp option + 2 bytes pad. You can get a packet length of 65523 as follows:
>
> 65523 - (40 + 10 + 2) = 65471 (size of tso packet data)
> 65471 / 47 = 1393 (size of data per on-wire packet)
> 1393 + (40 + 10 + 2) = 1445 (mtu is data + header + options + pad)
>
> Once you set your mtu to 1445, you need a program that can get the stack to send a maximum sized packet. With the congestion window that can be more difficult than it seems. I used some python that sends enough data to open the window, sleeps long enough to drain all outstanding data, but not long enough for the congestion window to go stale and close again, then sends a bunch more data. It also helps to turn off delayed acks on the receiver. Sometimes you will not drain the entire send buffer because an ack for the final chunk is still delayed when you start the second transmit. When the problem described in the pr hits, the EINVAL from bus_dmamap_load_mbuf_sg bubbles right up to userspace.
>
> At first I thought this was a driver bug rather than a stack bug. The code in question does what it is commented to do (limit the tso packet so that ip->ip_len does not overflow).
> However, it also seems reasonable that the driver limit its dma tag at 64K (do we really want it allocating another whole page just for the 14 byte link-local header?). Perhaps the tcp stack should ensure that the tso packet + max_linkhdr is < 64K. Comments?

Hmm, I think it's a driver bug. The upper stack may not know whether L2 includes VLAN. Almost all drivers in the tree include the L2 header size in the DMA tag. If Ethernet hardware can handle these oversized frames (64KB + L2 header) with TSOv4/TSOv6, I think there is no reason not to support it.

>
> As an aside, the patch attached to the pr is also slightly wrong. Taking the max_linkhdr into account when rounding the packet to be a multiple of mss does not make sense; it should only take it into account when calculating the max tso length.
>
> Jeremiah Lott
> Avere Systems
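
For anyone who wants to check the arithmetic in the quoted analysis, here is a rough Python sketch. It is only a simplified model of the behaviour described for 8.1 (tso length truncated to a multiple of mss, headers not counted), not the kernel code. All names are made up for illustration. The 65535 limit is inferred from the 65522..65535 window mentioned above, and the 14 byte link header and 52 bytes of tcp/ip header + timestamp option + pad are the figures from the thread.

```python
# Simplified model of the tso sizing discussed in this thread.
# All names are hypothetical; none are FreeBSD identifiers.

MAX_LEN = 65535        # assumed limit for ip_len and for the driver's dma tag
LINK_HDR = 14          # Ethernet header without a vlan tag, as in the thread
TCPIP_HDR = 40 + 10 + 2  # tcp/ip header + timestamp option + pad


def max_tso_len(mtu, limit=MAX_LEN, hdr=TCPIP_HDR):
    """Largest tso length (headers included) whose payload is a whole
    number of mss-sized segments and which does not exceed `limit`."""
    mss = mtu - hdr
    segments = (limit - hdr) // mss
    return segments * mss + hdr


def overflows_dma(tso_len, link_hdr=LINK_HDR, limit=MAX_LEN):
    """True if the frame handed to the driver (tso length plus link
    header) no longer fits in a dma tag of `limit` bytes."""
    return tso_len + link_hdr > limit


if __name__ == "__main__":
    for mtu in (1445, 1500, 9000):
        length = max_tso_len(mtu)
        print(mtu, length, overflows_dma(length))

    # Stack-side alternative: subtract the link header only from the cap,
    # and keep the rounding to a multiple of mss unchanged.
    capped = max_tso_len(1445, limit=MAX_LEN - LINK_HDR)
    print(1445, capped, overflows_dma(capped))
```

With an mtu of 1445 the model picks 47 segments of 1393 bytes, giving the 65523 byte tso length from the analysis, which no longer fits once the 14 byte header is added. With an mtu of 1500 the same rounding stays well below the limit, which matches the observation that "normal" mtus do not hit the problem.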