Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 13 Sep 2014 16:45:14 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Adrian Chadd <adrian@freebsd.org>
Subject:   Re: svn commit: r271504 - in head/sys: dev/oce dev/vmware/vmxnet3 dev/xen/netfront net netinet ofed/drivers/net/mlx4
Message-ID:  <205391404.35879067.1410641114256.JavaMail.root@uoguelph.ca>
In-Reply-To: <5414AA8C.8000809@selasky.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Hans Petter Selasky wrote:
> On 09/13/14 22:28, Rick Macklem wrote:
> > Hans Petter Selasky wrote:
> >> On 09/13/14 22:04, Rick Macklem wrote:
> >>> Hans Petter Selasky wrote:
> >>>> On 09/13/14 18:54, Adrian Chadd wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Just for the record:
> >>>>>
> >>>>> * I'm glad you're tackling the TSO config stuff;
> >>>>> * I'm not glad you're trying to pack it into a u_int rather
> >>>>> than
> >>>>> creating a new structure and adding fields for it.
> >>>>>
> >>>>> I appreciate that you're trying to rush this in before 10.1,
> >>>>> but
> >>>>> this
> >>>>> is exactly why things shouldn't be rushed in before release
> >>>>> deadlines.
> >>>>> :)
> >>>>>
> >>>>> I'd really like to see this be broken out as a structure and
> >>>>> the
> >>>>> bit
> >>>>> shifting games for what really shouldn't be packed into a u_int
> >>>>> fixed.
> >>>>> Otherwise this is going to be deadweight that has to persist
> >>>>> past
> >>>>> 11.0.
> >>>>>
> >>>>
> >>>> Hi Adrian,
> >>>>
> >>>> I can make that change for -current, making the new structure
> >>>> and
> >>>> such.
> >>>> This change was intended for 10 where there is only one u_int
> >>>> for
> >>>> this
> >>>> information. Or do you want me to change that in 10 too?
> >>>>
> >>> Well, there are spare fields (if_ispare[4]) in struct ifnet that
> >>> I
> >>> believe can be used for new u_ints when MFC'ng a patch that adds
> >>> fields to struct ifnet in head. (If I have this wrong, someone
> >>> please
> >>> correct me.)
> >>>
> >>> I'll admit I don't really see an advantage to defining a
> >>> structure
> >>> vs
> >>> just defining a couple of additional u_ints, but so long as the
> >>> structure
> >>> doesn't cause alignment issues for any arch, I don't see a
> >>> problem
> >>> with
> >>> a structure.
> >>>
> >>> I tend to agree with Adrian that this shouldn't be rushed. (I,
> >>> personally,
> >>> think that if_hw_tsomax was poorly chosen, but that is already in
> >>> use, so
> >>> I think we need to add to that and not replace it.)
> >>>
> >>> I also hope that your testing has included quite a bit of
> >>> activity
> >>> on
> >>> an NFS mount using TSO and the default 64K rsize, wsize, since
> >>> that
> >>> is
> >>> going to generate a bunch of 35 mbuf transmit fragment lists and
> >>> there
> >>> is an edge case where the total data length excluding ethernet
> >>> header
> >>> is just under 64K (by less than the ethernet header length) where
> >>> the
> >>> list must be split by tcp_output() to avoid disaster.
> >>
> >> Hi,
> >>
> >> The ethernet and VLAN headers are still subtracted.
> >>
> > Where? I couldn't see it when I glanced at the patch.
> > (hdrlen in tcp_output() does not include ethernet header length and
> >   that is the only thing I see subtracted from the max_len.)
> 
> Hi Rick,
> 
> When the drivers setup the "if_hw_tsomax" field, they need to
> subtract
> the ethernet and vlan headers from the maximum TSO payload.
> 
> For example here:
> 
> +	sc->ifp->if_hw_tsomax = IF_HW_TSOMAX_BUILD_VALUE(
> +	    65535 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN) /* bytes */,
> +	    OCE_MAX_TX_ELEMENTS /* maximum frag count */,
> +	    12 /* 4K frag size */);
> 
> >
> > I see the default set to (65536 - 4). I don't know why you
> > subtracted 4
> > but I would have expected the max ethernet header length to be
> > subtracted
> > here?
> 
> That is another technical point. If you have a bunch of data to
> transfer
> you want the start and stop physical addresses to be aligned to some
> boundary, like cache line or 32-bit or 64-bit, because then the
> hardware
> doesn't have to do the byte shifting when it starts reading the data
> payload - right?
> 
> >
> > Note that this must be subtracted before use by tcp_output()
> > because there
> > are several network device drivers that support 32 transmit
> > segments and this
> > means that the TSO segment including ethernet headers must fit in
> > 65536 bytes
> > (32 * MCLBYTES). If it does not, then NFS over these devices is
> > busted because
> > even m_defrag() can`t make them fit.
> 
> I only found a few network drivers which actually set the TSO limit,
> and
> for the rest: The default limit is 255 frags of MAX 65536 bytes,
> which
> should not be reached in the cases you are describing.
> 
The problem is that almost no driver (last I looked xen/netfront was the
only one) sets if_hw_tsomax. However many of these driver do need the
maximum TSO segment size to be 64K - max_ethernet_header_len, so they
can pack the TSO segment (including ethernet header) into 32 mbuf clusters
via m_defrag().

So, the default needs to handle this case, because the drivers may never
get fixed.

rick

> --HPS
> 
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?205391404.35879067.1410641114256.JavaMail.root>