From owner-svn-src-head@FreeBSD.ORG Sat Sep 13 20:45:16 2014 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 297CBDDB; Sat, 13 Sep 2014 20:45:16 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id AD0A188A; Sat, 13 Sep 2014 20:45:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAF6sFFSDaFve/2dsb2JhbABfhDuCeMpngyEBgR54hAQBAQQjBFIbGAICDRIHAlkGCohHpw+VIwEXgSyNSiM0B4J4gVMFskeBZx4WgV8hgTZBgQIBAQE X-IronPort-AV: E=Sophos;i="5.04,518,1406606400"; d="scan'208";a="155014070" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 13 Sep 2014 16:45:14 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4009EB4031; Sat, 13 Sep 2014 16:45:14 -0400 (EDT) Date: Sat, 13 Sep 2014 16:45:14 -0400 (EDT) From: Rick Macklem To: Hans Petter Selasky Message-ID: <205391404.35879067.1410641114256.JavaMail.root@uoguelph.ca> In-Reply-To: <5414AA8C.8000809@selasky.org> Subject: Re: svn commit: r271504 - in head/sys: dev/oce dev/vmware/vmxnet3 dev/xen/netfront net netinet ofed/drivers/net/mlx4 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Adrian Chadd X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Sep 2014 20:45:16 -0000 Hans Petter Selasky wrote: > On 09/13/14 22:28, Rick Macklem wrote: > > Hans Petter Selasky wrote: > >> On 09/13/14 22:04, Rick Macklem wrote: > >>> Hans Petter Selasky wrote: > >>>> On 09/13/14 18:54, Adrian Chadd wrote: > >>>>> Hi, > >>>>> > >>>>> Just for the record: > >>>>> > >>>>> * I'm glad you're tackling the TSO config stuff; > >>>>> * I'm not glad you're trying to pack it into a u_int rather > >>>>> than > >>>>> creating a new structure and adding fields for it. > >>>>> > >>>>> I appreciate that you're trying to rush this in before 10.1, > >>>>> but > >>>>> this > >>>>> is exactly why things shouldn't be rushed in before release > >>>>> deadlines. > >>>>> :) > >>>>> > >>>>> I'd really like to see this be broken out as a structure and > >>>>> the > >>>>> bit > >>>>> shifting games for what really shouldn't be packed into a u_int > >>>>> fixed. > >>>>> Otherwise this is going to be deadweight that has to persist > >>>>> past > >>>>> 11.0. > >>>>> > >>>> > >>>> Hi Adrian, > >>>> > >>>> I can make that change for -current, making the new structure > >>>> and > >>>> such. > >>>> This change was intended for 10 where there is only one u_int > >>>> for > >>>> this > >>>> information. Or do you want me to change that in 10 too? > >>>> > >>> Well, there are spare fields (if_ispare[4]) in struct ifnet that > >>> I > >>> believe can be used for new u_ints when MFC'ng a patch that adds > >>> fields to struct ifnet in head. (If I have this wrong, someone > >>> please > >>> correct me.) > >>> > >>> I'll admit I don't really see an advantage to defining a > >>> structure > >>> vs > >>> just defining a couple of additional u_ints, but so long as the > >>> structure > >>> doesn't cause alignment issues for any arch, I don't see a > >>> problem > >>> with > >>> a structure. > >>> > >>> I tend to agree with Adrian that this shouldn't be rushed. (I, > >>> personally, > >>> think that if_hw_tsomax was poorly chosen, but that is already in > >>> use, so > >>> I think we need to add to that and not replace it.) > >>> > >>> I also hope that your testing has included quite a bit of > >>> activity > >>> on > >>> an NFS mount using TSO and the default 64K rsize, wsize, since > >>> that > >>> is > >>> going to generate a bunch of 35 mbuf transmit fragment lists and > >>> there > >>> is an edge case where the total data length excluding ethernet > >>> header > >>> is just under 64K (by less than the ethernet header length) where > >>> the > >>> list must be split by tcp_output() to avoid disaster. > >> > >> Hi, > >> > >> The ethernet and VLAN headers are still subtracted. > >> > > Where? I couldn't see it when I glanced at the patch. > > (hdrlen in tcp_output() does not include ethernet header length and > > that is the only thing I see subtracted from the max_len.) > > Hi Rick, > > When the drivers setup the "if_hw_tsomax" field, they need to > subtract > the ethernet and vlan headers from the maximum TSO payload. > > For example here: > > + sc->ifp->if_hw_tsomax = IF_HW_TSOMAX_BUILD_VALUE( > + 65535 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN) /* bytes */, > + OCE_MAX_TX_ELEMENTS /* maximum frag count */, > + 12 /* 4K frag size */); > > > > > I see the default set to (65536 - 4). I don't know why you > > subtracted 4 > > but I would have expected the max ethernet header length to be > > subtracted > > here? > > That is another technical point. If you have a bunch of data to > transfer > you want the start and stop physical addresses to be aligned to some > boundary, like cache line or 32-bit or 64-bit, because then the > hardware > doesn't have to do the byte shifting when it starts reading the data > payload - right? > > > > > Note that this must be subtracted before use by tcp_output() > > because there > > are several network device drivers that support 32 transmit > > segments and this > > means that the TSO segment including ethernet headers must fit in > > 65536 bytes > > (32 * MCLBYTES). If it does not, then NFS over these devices is > > busted because > > even m_defrag() can`t make them fit. > > I only found a few network drivers which actually set the TSO limit, > and > for the rest: The default limit is 255 frags of MAX 65536 bytes, > which > should not be reached in the cases you are describing. > The problem is that almost no driver (last I looked xen/netfront was the only one) sets if_hw_tsomax. However many of these driver do need the maximum TSO segment size to be 64K - max_ethernet_header_len, so they can pack the TSO segment (including ethernet header) into 32 mbuf clusters via m_defrag(). So, the default needs to handle this case, because the drivers may never get fixed. rick > --HPS > >