Date: Sat, 13 Sep 2014 16:45:14 -0400 (EDT) From: Rick Macklem <rmacklem@uoguelph.ca> To: Hans Petter Selasky <hps@selasky.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Adrian Chadd <adrian@freebsd.org> Subject: Re: svn commit: r271504 - in head/sys: dev/oce dev/vmware/vmxnet3 dev/xen/netfront net netinet ofed/drivers/net/mlx4 Message-ID: <205391404.35879067.1410641114256.JavaMail.root@uoguelph.ca> In-Reply-To: <5414AA8C.8000809@selasky.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hans Petter Selasky wrote: > On 09/13/14 22:28, Rick Macklem wrote: > > Hans Petter Selasky wrote: > >> On 09/13/14 22:04, Rick Macklem wrote: > >>> Hans Petter Selasky wrote: > >>>> On 09/13/14 18:54, Adrian Chadd wrote: > >>>>> Hi, > >>>>> > >>>>> Just for the record: > >>>>> > >>>>> * I'm glad you're tackling the TSO config stuff; > >>>>> * I'm not glad you're trying to pack it into a u_int rather > >>>>> than > >>>>> creating a new structure and adding fields for it. > >>>>> > >>>>> I appreciate that you're trying to rush this in before 10.1, > >>>>> but > >>>>> this > >>>>> is exactly why things shouldn't be rushed in before release > >>>>> deadlines. > >>>>> :) > >>>>> > >>>>> I'd really like to see this be broken out as a structure and > >>>>> the > >>>>> bit > >>>>> shifting games for what really shouldn't be packed into a u_int > >>>>> fixed. > >>>>> Otherwise this is going to be deadweight that has to persist > >>>>> past > >>>>> 11.0. > >>>>> > >>>> > >>>> Hi Adrian, > >>>> > >>>> I can make that change for -current, making the new structure > >>>> and > >>>> such. > >>>> This change was intended for 10 where there is only one u_int > >>>> for > >>>> this > >>>> information. Or do you want me to change that in 10 too? > >>>> > >>> Well, there are spare fields (if_ispare[4]) in struct ifnet that > >>> I > >>> believe can be used for new u_ints when MFC'ng a patch that adds > >>> fields to struct ifnet in head. (If I have this wrong, someone > >>> please > >>> correct me.) > >>> > >>> I'll admit I don't really see an advantage to defining a > >>> structure > >>> vs > >>> just defining a couple of additional u_ints, but so long as the > >>> structure > >>> doesn't cause alignment issues for any arch, I don't see a > >>> problem > >>> with > >>> a structure. > >>> > >>> I tend to agree with Adrian that this shouldn't be rushed. (I, > >>> personally, > >>> think that if_hw_tsomax was poorly chosen, but that is already in > >>> use, so > >>> I think we need to add to that and not replace it.) > >>> > >>> I also hope that your testing has included quite a bit of > >>> activity > >>> on > >>> an NFS mount using TSO and the default 64K rsize, wsize, since > >>> that > >>> is > >>> going to generate a bunch of 35 mbuf transmit fragment lists and > >>> there > >>> is an edge case where the total data length excluding ethernet > >>> header > >>> is just under 64K (by less than the ethernet header length) where > >>> the > >>> list must be split by tcp_output() to avoid disaster. > >> > >> Hi, > >> > >> The ethernet and VLAN headers are still subtracted. > >> > > Where? I couldn't see it when I glanced at the patch. > > (hdrlen in tcp_output() does not include ethernet header length and > > that is the only thing I see subtracted from the max_len.) > > Hi Rick, > > When the drivers setup the "if_hw_tsomax" field, they need to > subtract > the ethernet and vlan headers from the maximum TSO payload. > > For example here: > > + sc->ifp->if_hw_tsomax = IF_HW_TSOMAX_BUILD_VALUE( > + 65535 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN) /* bytes */, > + OCE_MAX_TX_ELEMENTS /* maximum frag count */, > + 12 /* 4K frag size */); > > > > > I see the default set to (65536 - 4). I don't know why you > > subtracted 4 > > but I would have expected the max ethernet header length to be > > subtracted > > here? > > That is another technical point. If you have a bunch of data to > transfer > you want the start and stop physical addresses to be aligned to some > boundary, like cache line or 32-bit or 64-bit, because then the > hardware > doesn't have to do the byte shifting when it starts reading the data > payload - right? > > > > > Note that this must be subtracted before use by tcp_output() > > because there > > are several network device drivers that support 32 transmit > > segments and this > > means that the TSO segment including ethernet headers must fit in > > 65536 bytes > > (32 * MCLBYTES). If it does not, then NFS over these devices is > > busted because > > even m_defrag() can`t make them fit. > > I only found a few network drivers which actually set the TSO limit, > and > for the rest: The default limit is 255 frags of MAX 65536 bytes, > which > should not be reached in the cases you are describing. > The problem is that almost no driver (last I looked xen/netfront was the only one) sets if_hw_tsomax. However many of these driver do need the maximum TSO segment size to be 64K - max_ethernet_header_len, so they can pack the TSO segment (including ethernet header) into 32 mbuf clusters via m_defrag(). So, the default needs to handle this case, because the drivers may never get fixed. rick > --HPS > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?205391404.35879067.1410641114256.JavaMail.root>