From: Andrew Gallatin
Date: Wed, 30 May 2012 11:30:47 -0400
To: freebsd-net@freebsd.org
Cc: Colin Percival
Subject: Re: [please review] TSO mbuf chain length limiting patch
Message-ID: <4FC63D27.70807@cs.duke.edu>
In-Reply-To: <4FC635CC.5030608@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On 05/30/12 10:59, Colin Percival wrote:
> Hi all,
>
> The Xen virtual network interface has an issue (ok, really the issue is with
> the linux back-end, but that's what most people are using) where it can't
> handle scatter-gather writes with lots of pieces, aka. long mbuf chains.
> This currently bites us hard with TSO enabled, since it produces said long
> mbuf chains.

Colin,

Thanks for pointing me at this. I've been talking about this with bz@
a little.

I've never been clear about what the max TSO size supported by FreeBSD
is. The NIC I maintain (mxge) is limited to 64K - epsilon for both IPv4
*AND* IPv6. Up until now, this has been enforced by the 16-bit IP length
field of IPv4, and we have not had IPv6 TSO until this week. With IPv6,
I'm worried that FreeBSD may now send down packets larger than I can
handle.

In my case, however, the problem is not s/g list length; rather, it is
internal limits in the NIC which limit us to 64K - epsilon for IPv6 as
well. I think there may be other NICs in the same boat for IPv6 (and
maybe even some which cannot handle the full 64K for IPv4).

Your approach would not work well for my size limit. For example, I'd
have to set the limit to 4 mbufs to stay under 64KB. That assumes the
worst case of 16KB jumbo mbufs, so it would limit me to ~8KB per TSO
if 2KB mbufs were used.

I think a better approach would be to have a limit on the size of the
pre-segmented TCP payload sent to the driver. I tend to think that this
would be more generically useful, and it is a better match for the NDIS
APIs, where a driver must specify the max TSO size. I think the changes
to the TCP stack might be simpler (e.g., they would seem to jibe better
with the existing "maxmtu" approach).

I think this could work for you as well.
You could set the Xen max TSO size to 32K (derived from 18 pages/skb
multiplied by a typical 2KB mbuf size, i.e. 36KB, with some slack built
in). If the chain were still too long, you could m_defrag it down to
size.

Drew
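
For reference, the arithmetic in this message can be sketched in a few lines of user-space C. This is only an illustration, not kernel code: every function and macro name below is hypothetical, the sizes are the round numbers quoted above (64KB rather than 64K - epsilon), and a real driver would deal with actual mbuf chains and would call m_defrag rather than the needs_defrag() stand-in shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only; these are NOT FreeBSD kernel constants. */
#define SMALL_MBUF_BYTES (2u * 1024u)   /* typical 2KB mbuf cluster */
#define JUMBO_MBUF_BYTES (16u * 1024u)  /* worst-case 16KB jumbo cluster */

/*
 * Count-based limit: the largest chain length such that a chain made
 * entirely of worst-case mbufs still fits within byte_limit.
 */
unsigned
max_mbufs_for_limit(unsigned byte_limit, unsigned worst_mbuf_bytes)
{
	assert(worst_mbuf_bytes > 0);
	return (byte_limit / worst_mbuf_bytes);
}

/* Payload a count-limited chain carries when every mbuf is mbuf_bytes. */
unsigned
payload_under_count_limit(unsigned max_mbufs, unsigned mbuf_bytes)
{
	return (max_mbufs * mbuf_bytes);
}

/* Byte-based limit: clamp the payload, independent of mbuf size. */
unsigned
payload_under_byte_limit(unsigned wanted_bytes, unsigned max_tso_bytes)
{
	return (wanted_bytes < max_tso_bytes ? wanted_bytes : max_tso_bytes);
}

/* Number of mbufs of size mbuf_bytes needed for payload (rounded up). */
unsigned
mbufs_needed(unsigned payload, unsigned mbuf_bytes)
{
	assert(mbuf_bytes > 0);
	return ((payload + mbuf_bytes - 1) / mbuf_bytes);
}

/*
 * Stand-in for the driver-side check: nonzero if the chain has more
 * pieces than the back-end accepts and would need to be compacted.
 */
int
needs_defrag(unsigned chain_len, unsigned max_segs)
{
	return (chain_len > max_segs);
}
```

With these helpers, max_mbufs_for_limit(64 * 1024, JUMBO_MBUF_BYTES) gives the 4-mbuf limit, and payload_under_count_limit(4, SMALL_MBUF_BYTES) gives the ~8KB per TSO that makes a count-based limit unattractive for a size-limited NIC. A 32KB byte limit carried in 2KB mbufs needs mbufs_needed(32 * 1024, SMALL_MBUF_BYTES) = 16 pieces, which is under the 18 assumed for Xen; chains built from smaller mbufs could still exceed that, which is where the defrag fallback would come in.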