From: Jack Vogel
To: Andrew Gallatin
Cc: freebsd-net@freebsd.org, Colin Percival
Date: Wed, 30 May 2012 09:50:45 -0700
Subject: Re: [please review] TSO mbuf chain length limiting patch
In-Reply-To: <4FC63D27.70807@cs.duke.edu>
References: <4FC635CC.5030608@freebsd.org> <4FC63D27.70807@cs.duke.edu>

On Wed, May 30, 2012 at 8:30 AM, Andrew Gallatin wrote:

> On 05/30/12 10:59, Colin Percival wrote:
>
>> Hi all,
>>
>> The Xen virtual network interface has an issue (ok, really the issue is
>> with the linux back-end, but that's what most people are using) where it
>> can't handle scatter-gather writes with lots of pieces, aka. long mbuf
>> chains. This currently bites us hard with TSO enabled, since it produces
>> said long mbuf chains.
>>
>
> Colin,
>
> Thanks for pointing me at this. I've been talking about this
> with bz@ a little.
>
> I've never been clear about what the max TSO size supported by FreeBSD
> is. The NIC I maintain (mxge) is limited to 64K - epsilon for both
> IPv4 *AND* IPv6. Up until now, this has been enforced by the 16-bit
> ip length limit of IPv4 and we have not had IPv6 TSO until this week.
> With IPv6, I'm worried that FreeBSD may now send packets down larger
> than I could handle. In my case, however, the problem is not s/g list
> length, but rather it is internal limits in the NIC which limit us to
> 64K - epsilon for IPv6 as well. I think there may be other NICs in
> the same boat for IPv6 (and maybe even some which cannot handle the
> full 64K for IPv4).
>
> Your approach would not work well for my size limit. For
> example, I'd have to set the limit to 4 mbufs to stay under 64KB.
> This would be assuming the worst case of 16KB jumbo mbufs, so
> that would limit me to ~8KB per TSO if 2KB mbufs were used.
>
> I think a better approach would be to have a limit on the size of the
> pre-segmented TCP payload size sent to the driver. I tend to think
> that this would be more generically useful, and it is a better match
> for the NDIS APIs, where a driver must specify the max TSO size. I
> think the changes to the TCP stack might be simpler (eg, they
> would seem to jive better with the existing "maxmtu" approach).
>
> I think this could work for you as well. You could set the Xen max
> tso size to be 32K (derived from 18 pages/skb, multiplied by a typical
> 2KB mbuf size, with some slack built in). If the chain was too large,
> you could m_defrag it down to size.
>

Think I favor Drew's idea as well for what that's worth.

Jack
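
For anyone following the arithmetic in Drew's reply, here is a minimal user-space sketch in C of the two kinds of limit being compared. It is not kernel code and not the patch under review: every name and constant below is hypothetical, chosen only to mirror the figures quoted in the thread (2KB and 16KB mbufs, a 64K NIC limit with the epsilon ignored, 18 pieces per skb on the Xen side, a 32K Xen TSO size).

```c
#include <stddef.h>

/* Hypothetical constants mirroring the figures quoted in the thread. */
#define SMALL_MBUF_BYTES   ((size_t)2 * 1024)   /* typical 2KB mbuf cluster */
#define JUMBO_MBUF_BYTES   ((size_t)16 * 1024)  /* worst-case 16KB jumbo cluster */
#define NIC_TSO_MAX_BYTES  ((size_t)64 * 1024)  /* "64K - epsilon", epsilon ignored */
#define XEN_MAX_PIECES     ((size_t)18)         /* assumed: 18 pages/skb = 18 pieces */
#define XEN_TSO_MAX_BYTES  ((size_t)32 * 1024)  /* 18 * 2KB = 36KB, minus slack */

/* Chain-length limit that can never exceed byte_limit, even if every
 * mbuf in the chain is as large as worst_mbuf_bytes. */
size_t count_limit_for(size_t byte_limit, size_t worst_mbuf_bytes)
{
    return byte_limit / worst_mbuf_bytes;
}

/* Payload carried by a chain of `count` mbufs of `mbuf_bytes` each. */
size_t payload_for_chain(size_t count, size_t mbuf_bytes)
{
    return count * mbuf_bytes;
}

/* Number of mbufs needed to carry payload_bytes (rounded up). */
size_t chain_len_for(size_t payload_bytes, size_t mbuf_bytes)
{
    return (payload_bytes + mbuf_bytes - 1) / mbuf_bytes;
}

/* Byte-based check: does the pre-segmented payload fit the driver's limit? */
int tso_len_ok(size_t payload_bytes, size_t tso_max_bytes)
{
    return payload_bytes <= tso_max_bytes;
}

/* Fallback check: a chain with more pieces than the back-end accepts
 * would have to be compacted (e.g. with m_defrag) before transmit. */
int needs_defrag(size_t chain_len, size_t max_pieces)
{
    return chain_len > max_pieces;
}
```

With these numbers, count_limit_for(NIC_TSO_MAX_BYTES, JUMBO_MBUF_BYTES) gives 4 mbufs, and payload_for_chain(4, SMALL_MBUF_BYTES) gives 8192 bytes, which is the ~8KB per TSO that a count-based limit would impose on mxge. With a byte-based limit, chain_len_for(XEN_TSO_MAX_BYTES, SMALL_MBUF_BYTES) gives 16 pieces, which stays under the assumed 18-piece Xen limit when the chain is built from full 2KB mbufs; chains made of smaller fragments are the case where the m_defrag fallback would apply.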