Date:      Wed, 31 Oct 2012 08:59:06 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        pyunyh@gmail.com
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf
Message-ID:  <5090DA4A.5090502@freebsd.org>
In-Reply-To: <20121030022507.GB4298@michelle.cdnetworks.com>
References:  <201210262106.q9QL6YgY000943@svn.freebsd.org> <508BBE6C.7010409@freebsd.org> <20121027220137.GJ70741@FreeBSD.org> <20121029204104.GA1431@michelle.cdnetworks.com> <20121029052100.GO70741@FreeBSD.org> <20121029214038.GD1431@michelle.cdnetworks.com> <508E3C6B.2090102@freebsd.org> <20121030022507.GB4298@michelle.cdnetworks.com>

On 30.10.2012 03:25, YongHyeon PYUN wrote:
> On Mon, Oct 29, 2012 at 09:20:59AM +0100, Andre Oppermann wrote:
>> On 29.10.2012 22:40, YongHyeon PYUN wrote:
>>> On Mon, Oct 29, 2012 at 09:21:00AM +0400, Gleb Smirnoff wrote:
>>>> On Mon, Oct 29, 2012 at 01:41:04PM -0700, YongHyeon PYUN wrote:
>>>> Y> On Sun, Oct 28, 2012 at 02:01:37AM +0400, Gleb Smirnoff wrote:
>>>> Y> > On Sat, Oct 27, 2012 at 12:58:52PM +0200, Andre Oppermann wrote:
>>>> Y> > A> On 26.10.2012 23:06, Gleb Smirnoff wrote:
>>>> Y> > A> > Author: glebius
>>>> Y> > A> > Date: Fri Oct 26 21:06:33 2012
>>>> Y> > A> > New Revision: 242161
>>>> Y> > A> > URL: http://svn.freebsd.org/changeset/base/242161
>>>> Y> > A> >
>>>> Y> > A> > Log:
>>>> Y> > A> >    o Remove last argument to ip_fragment(), and obtain all
>>>> needed information
>>>> Y> > A> >      on checksums directly from mbuf flags. This simplifies
>>>> code.
>>>> Y> > A> >    o Clear CSUM_IP from the mbuf in ip_fragment() if we did
>>>> checksums in
>>>> Y>
>>>> Y> I'm not sure whether ti(4)'s checksum offloading for IP fragmented
>>>> Y> packets(CSUM_IP_FRAGS) still works after this change.  ti(4)
>>>> Y> requires CSUM_IP should be set for IP fragmented packets. Not sure
>>>> Y> whether it's a bug or not. I have a ti(4) controller but I don't
>>>> Y> remember where I can find it and don't have a link
>>>> Y> partner (1000baseSX) to test it. :-(
>>>>
>>>> ti(4) declares both CSUM_IP and CSUM_IP_FRAGS, so ip_fragment() won't do
>>>
>>> Because it supports both CSUM_IP and CSUM_IP_FRAGS. Probably ti(4)
>>> is the only controller that supports TCP/UDP checksum offloading
>>> for an IP fragmented packet.
>>
>> This is a bit weird if it doesn't do the fragmentation itself.
>> Computing the IP header checksum doesn't differ for normal and
>> fragmented packets.  The protocol checksum (TCP or UDP) stays
>> the same in the case of IP level fragmentation.  It is only
>> visible in the first fragment which includes the protocol header.
>
> My interpretation for CSUM_IP_FRAGS works like the following.
>   - Only pseudo header checksum for TCP/UDP is computed by upper
>     stack.
>   - Controller has no ability to fragment the packet so it should
>     be done in upper stack (i.e. ip_output()).
>   - When ip_output() has to fragment the packet, it just fragments
>     the packet without completing TCP/UDP and IP checksum. If
>     controller does not support CSUM_IP_FRAGS feature, ip_output()
>     can't delay TCP/UDP checksum in this stage.
>   - The fragmented packets are sent to driver. Driver sets
>     appropriate bits of DMA descriptor based on fragmentation field
>     of mbuf (M_FRAG, M_LASTFRAG) and issues the frame to controller.
>   - The firmware of controller queues the fragmented frames up in
>     its internal memory and holds off sending out the frames since it
>     has to compute TCP/UDP checksum. When it sees a frame which
>     indicates the end of fragmented frame it finally computes
>     TCP/UDP checksum and sends each frame out to wire by computing
>     IP checksum on the fly.
> The difference is which one(upper stack vs. controller) computes
> TCP/UDP/IP checksum.

Such behavior doesn't make much sense and probably wasn't used at all
in practice.  It's very complex as well.  Plus you can't guarantee that
other packets won't slip into the interface queue in an SMP
world.

IP fragmentation really isn't done for TCP within the kernel.  We try
to prevent it as it would have a huge performance impact. Hence the
internal MTU discovery and the Don't Fragment bit set on TCP packets.

IP fragmentation does happen for large, locally generated UDP packets.
There, however, the past absence of UDP fragmentation offload coupled
with UDP checksum offloading caused all fragmentation to be done at
the UDP level before the packet hits ip_output.

The remaining use of IP fragmentation is when the machine is acting
as a router and it has to send packets out on an interface with a
smaller MTU than the one it came in on.
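
To illustrate the router case, here is a minimal sketch of how a packet
could be split for a smaller outgoing MTU.  It is not the ip_fragment()
code from the tree; the function name fragment_plan and its parameters
are made up for this example, and it assumes an IPv4 header without
options (20 bytes) unless told otherwise.

```python
def fragment_plan(total_len, mtu, ihl=20):
    """Return a list of (offset_bytes, payload_len, more_fragments).

    total_len: length of the original IPv4 packet, header included.
    mtu:       MTU of the outgoing interface.
    ihl:       IPv4 header length in bytes (20 when there are no options).
    Hypothetical helper for illustration only.
    """
    payload = total_len - ihl
    # The fragment offset field counts 8-byte units, so every fragment
    # except the last must carry a multiple of 8 payload bytes.
    max_chunk = (mtu - ihl) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    plan = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        more = offset + chunk < payload  # MF bit: set on all but the last
        plan.append((offset, chunk, more))
        offset += chunk
    return plan
```

For a 1500 byte packet going out on a 576 byte MTU link this gives two
552 byte payloads followed by a 376 byte one; only the last has the
"more fragments" bit clear.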

So the only two useful features regarding UDP+IP fragmentation are:

  1. IP fragmentation including UDP checksum calculation for locally
     generated large UDP packets.  This is the TSO for UDP.

  2. Pure IP fragmentation for in-transit packets.  Here only the
     IP header checksum needs to be recalculated for each fragment.
     The layer 4 checksums (UDP, TCP and others) stay the same.
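
Point 2 can be sketched in a few lines.  This is an illustration under
assumptions, not kernel code: the helper names (inet_checksum,
udp_datagram, ipv4_header, fragment) are invented here, the IPv4 header
has no options, and the TTL and IP ID are arbitrary.  The UDP checksum
is computed once over the whole datagram; fragmentation then only
builds a new IP header, with its own header checksum, per fragment.

```python
import struct

def inet_checksum(data):
    """One's-complement Internet checksum (RFC 1071) over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_datagram(src, dst, sport, dport, data):
    """Build a UDP datagram with its checksum over the pseudo header."""
    length = 8 + len(data)
    hdr = struct.pack("!HHHH", sport, dport, length, 0)
    pseudo = src + dst + struct.pack("!BBH", 0, 17, length)
    csum = inet_checksum(pseudo + hdr + data) or 0xFFFF  # 0 means "none"
    return struct.pack("!HHHH", sport, dport, length, csum) + data

def ipv4_header(src, dst, proto, payload_len, ident, off_units, more):
    """Build a 20 byte IPv4 header and fill in its header checksum."""
    flags_off = (0x2000 if more else 0) | off_units
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len,
                      ident, flags_off, 64, proto, 0, src, dst)
    return hdr[:10] + struct.pack("!H", inet_checksum(hdr)) + hdr[12:]

def fragment(src, dst, proto, l4_bytes, mtu, ident=1):
    """Split the layer 4 bytes as-is; only the IP headers are new."""
    chunk = (mtu - 20) // 8 * 8  # offsets are counted in 8-byte units
    frags = []
    for off in range(0, len(l4_bytes), chunk):
        part = l4_bytes[off:off + chunk]
        more = off + chunk < len(l4_bytes)
        frags.append(ipv4_header(src, dst, proto, len(part),
                                 ident, off // 8, more) + part)
    return frags
```

Concatenating the payloads of the fragments returned by fragment()
gives back the original datagram byte for byte, UDP checksum included,
while the checksum field differs from one fragment header to the next.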

-- 
Andre

>>
>>>> software checksums, and thus won't clear these flags.
>>>>
>>>> Potentially a driver that announces one flag in if_hwassist but relies on
>>>> a couple of flags being set on the mbuf is not correct. If a driver can't do
>>>> single checksum processing independently from others, then it should set or
>>>> clear the appropriate flags in if_hwassist as a group.
>>>
>>> Hmm, then what would be best way to achieve CSUM_IP_FRAGS in
>>> driver? I don't have clear idea how to utilize the hardware
>>> feature. The stack should tell that the mbuf needs TCP/UDP checksum
>>> offloading for IP fragmented packet(i.e. CSUM_IP_FRAGS is not set by
>>> upper stack).
>>
>> As I said there can't be fragment checksumming without hardware
>
> It's up to controller's firmware. It does not send the fragmented
> frame until it computes TCP/UDP checksum.
>
>> based fragmentation.  We have three cases here:
>>
>>   1. TSO where the hardware does the segmentation, TCP and IP header
>>      checksums for each generated packet.
>>   2. IP packet fragmentation where a packet is split, the IP header
>>      checksum is recomputed for each fragment, but the protocol csum
>>      stays the same and is not modified.
>>   3. UDP fragmentation where a large packet is sent to the hardware
>>      and it generates first the UDP checksum and then splits it into
>>      IP fragments each with its own IP header checksum.
>>
>> So we end up with these possible large send hardware offload capabilities:
>>   TSO: including IPv4hdr and TCP checksumming
>>   UDP fragmentation: including IPv4hdr and UDP checksumming
>>   IP fragmentation: including IPv4hdr checksumming
>>
>> Besides that we have the packet <= MTU sized offload capabilities:
>>   TCP checksumming
>>   UDP checksumming
>>   SCTP checksumming
>>   IPv4hdr checksumming
>>
>>>> Y> > A> >      hardware. Some driver may not announce CSUM_IP in their
>>>> if_hwassist,
>>>>                 ^^^^^^^^
>>>>
>>>> Oh, that was a typo! Software was meant.
>>
>> That explains quite a bit of confusion.
>>
>> --
>> Andre
>>
>
>



