Date:      Sat, 13 Sep 2014 22:35:24 +0200
From:      Hans Petter Selasky <hps@selasky.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Adrian Chadd <adrian@freebsd.org>
Subject:   Re: svn commit: r271504 - in head/sys: dev/oce dev/vmware/vmxnet3 dev/xen/netfront net netinet ofed/drivers/net/mlx4
Message-ID:  <5414AA8C.8000809@selasky.org>
In-Reply-To: <1971026585.35877589.1410640100903.JavaMail.root@uoguelph.ca>
References:  <1971026585.35877589.1410640100903.JavaMail.root@uoguelph.ca>

On 09/13/14 22:28, Rick Macklem wrote:
> Hans Petter Selasky wrote:
>> On 09/13/14 22:04, Rick Macklem wrote:
>>> Hans Petter Selasky wrote:
>>>> On 09/13/14 18:54, Adrian Chadd wrote:
>>>>> Hi,
>>>>>
>>>>> Just for the record:
>>>>>
>>>>> * I'm glad you're tackling the TSO config stuff;
>>>>> * I'm not glad you're trying to pack it into a u_int rather than
>>>>> creating a new structure and adding fields for it.
>>>>>
>>>>> I appreciate that you're trying to rush this in before 10.1, but
>>>>> this
>>>>> is exactly why things shouldn't be rushed in before release
>>>>> deadlines.
>>>>> :)
>>>>>
>>>>> I'd really like to see this be broken out as a structure and the
>>>>> bit
>>>>> shifting games for what really shouldn't be packed into a u_int
>>>>> fixed.
>>>>> Otherwise this is going to be deadweight that has to persist past
>>>>> 11.0.
>>>>>
>>>>
>>>> Hi Adrian,
>>>>
>>>> I can make that change for -current, making the new structure and
>>>> such.
>>>> This change was intended for 10 where there is only one u_int for
>>>> this
>>>> information. Or do you want me to change that in 10 too?
>>>>
>>> Well, there are spare fields (if_ispare[4]) in struct ifnet that I
>>> believe can be used for new u_ints when MFC'ng a patch that adds
>>> fields to struct ifnet in head. (If I have this wrong, someone
>>> please
>>> correct me.)
>>>
>>> I'll admit I don't really see an advantage to defining a structure
>>> vs
>>> just defining a couple of additional u_ints, but so long as the
>>> structure
>>> doesn't cause alignment issues for any arch, I don't see a problem
>>> with
>>> a structure.
>>>
>>> I tend to agree with Adrian that this shouldn't be rushed. (I,
>>> personally,
>>> think that if_hw_tsomax was poorly chosen, but that is already in
>>> use, so
>>> I think we need to add to that and not replace it.)
>>>
>>> I also hope that your testing has included quite a bit of activity
>>> on
>>> an NFS mount using TSO and the default 64K rsize, wsize, since that
>>> is
>>> going to generate a bunch of 35 mbuf transmit fragment lists and
>>> there
>>> is an edge case where the total data length excluding ethernet
>>> header
>>> is just under 64K (by less than the ethernet header length) where
>>> the
>>> list must be split by tcp_output() to avoid disaster.
>>
>> Hi,
>>
>> The ethernet and VLAN headers are still subtracted.
>>
> Where? I couldn't see it when I glanced at the patch.
> (hdrlen in tcp_output() does not include ethernet header length and
>   that is the only thing I see subtracted from the max_len.)

Hi Rick,

When the drivers set up the "if_hw_tsomax" field, they need to subtract
the ethernet and VLAN headers from the maximum TSO payload.

For example here:

+	sc->ifp->if_hw_tsomax = IF_HW_TSOMAX_BUILD_VALUE(
+	    65535 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN) /* bytes */,
+	    OCE_MAX_TX_ELEMENTS /* maximum frag count */,
+	    12 /* 4K frag size */);
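
To illustrate what a build macro of this kind does, here is a minimal
sketch that packs the three limits into one 32-bit value. The bit layout
(16 bits of byte count, 8 bits of fragment count, 8 bits of log2 fragment
size) and all ex_* names are assumptions made for this example, not the
definition of IF_HW_TSOMAX_BUILD_VALUE from r271504.

```c
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only (not the r271504 macro):
 *   bits  0..15  maximum TSO length in bytes
 *   bits 16..23  maximum number of fragments
 *   bits 24..31  log2 of the maximum fragment size
 */
#define EX_ETHER_HDR_LEN        14 /* same value as ETHER_HDR_LEN */
#define EX_ETHER_VLAN_ENCAP_LEN  4 /* same value as ETHER_VLAN_ENCAP_LEN */

/* Pack the three limits into one 32-bit word. */
static inline uint32_t
ex_tsomax_build(uint32_t len, uint32_t frag_cnt, uint32_t frag_log2)
{
	return ((len & 0xffffu) | ((frag_cnt & 0xffu) << 16) |
	    ((frag_log2 & 0xffu) << 24));
}

/* Extract the maximum TSO length in bytes. */
static inline uint32_t
ex_tsomax_len(uint32_t v)
{
	return (v & 0xffffu);
}

/* Extract the maximum fragment count. */
static inline uint32_t
ex_tsomax_frag_cnt(uint32_t v)
{
	return ((v >> 16) & 0xffu);
}

/* Extract log2 of the maximum fragment size. */
static inline uint32_t
ex_tsomax_frag_log2(uint32_t v)
{
	return ((v >> 24) & 0xffu);
}
```

With this layout, ex_tsomax_build(65535 - (EX_ETHER_HDR_LEN +
EX_ETHER_VLAN_ENCAP_LEN), 32, 12) stores a length of 65517 bytes; the
fragment count of 32 is only a placeholder value.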

>
> I see the default set to (65536 - 4). I don't know why you subtracted 4
> but I would have expected the max ethernet header length to be subtracted
> here?

That is a separate technical point. When you have a lot of data to
transfer, you want the start and stop physical addresses to be aligned to
some boundary, such as a cache line or a 32-bit or 64-bit word, because
then the hardware does not have to do byte shifting when it starts
reading the data payload - right?
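
To make the arithmetic concrete: 65536 - 4 is a multiple of 4, so a
buffer of that length that starts on a 32-bit boundary also ends on one.
A minimal sketch of rounding a length down to a power-of-two boundary
follows; the helper name is made up for this example and does not come
from the patch.

```c
#include <stdint.h>

/*
 * Round len down to a multiple of align.
 * align must be a power of two. Hypothetical helper, not from the patch.
 */
static inline uint32_t
ex_round_down(uint32_t len, uint32_t align)
{
	return (len & ~(align - 1));
}
```

For example, ex_round_down(65535, 4) gives 65532, which is the same
value as 65536 - 4.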

>
> Note that this must be subtracted before use by tcp_output() because there
> are several network device drivers that support 32 transmit segments and this
> means that the TSO segment including ethernet headers must fit in 65536 bytes
> (32 * MCLBYTES). If it does not, then NFS over these devices is busted because
> even m_defrag() can't make them fit.

I only found a few network drivers that actually set the TSO limit. For
the rest, the default limit is 255 frags of at most 65536 bytes, which
should not be reached in the cases you are describing.
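
The segment-count concern quoted above comes down to simple arithmetic,
assuming the worst case after m_defrag() is a chain of full 2K clusters.
Here is a small sketch that checks whether a TSO limit plus the
link-layer header fits in a driver's segment budget. The ex_* names are
made up for this example; 2048 is the usual MCLBYTES value.

```c
#include <stdint.h>

#define EX_MCLBYTES 2048 /* usual value of MCLBYTES (mbuf cluster size) */

/* Number of EX_MCLBYTES-sized clusters needed to hold frame_len bytes. */
static inline uint32_t
ex_clusters_needed(uint32_t frame_len)
{
	return ((frame_len + EX_MCLBYTES - 1) / EX_MCLBYTES);
}

/*
 * Nonzero if a TSO limit of tsomax bytes plus an l2_hdr_len byte
 * link-layer header fits in max_segs full clusters.
 */
static inline int
ex_tso_fits(uint32_t tsomax, uint32_t l2_hdr_len, uint32_t max_segs)
{
	return (ex_clusters_needed(tsomax + l2_hdr_len) <= max_segs);
}
```

For instance, ex_tso_fits(65535 - 18, 18, 32) holds, because 65535
bytes need exactly 32 clusters.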

--HPS


