From owner-freebsd-net@FreeBSD.ORG  Tue Mar  5 14:03:57 2013
Message-ID: <5135FB48.1000809@freebsd.org>
Date: Tue, 05 Mar 2013 15:03:52 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: Lawrence Stewart <lstewart@freebsd.org>
Subject: Re: Bug in sbsndptr()
References: <512CBADB.3050004@freebsd.org> <5134CD5D.6090107@freebsd.org>
 <513564AD.7000006@freebsd.org>
In-Reply-To: <513564AD.7000006@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>

On 05.03.2013 04:21, Lawrence Stewart wrote:
> On 03/05/13 03:35, Andre Oppermann wrote:
>> On 26.02.2013 14:38, Lawrence Stewart wrote:
>>> Hi Andre,
>>
>> Hi Lawrence, :-)
>>
>>> A colleague and I spent a very frustrating day tracing an accounting bug
>>> in the multipath TCP patch we're working on at CAIA to a bug in
>>> sbsndptr(). I haven't tested it with regular TCP yet, but I believe the
>>> following patch fixes the bug (proposed commit log message is at the top
>>> of the patch):
>>>
>>> http://people.freebsd.org/~lstewart/patches/misctcp/sbsndptr_mnext_10.x.r247314.diff
>>>
>>>
>>> The patch should have no tangible effect on operation other than to
>>> ensure the function delivers on the promise to return the closest mbuf
>>> in the chain for the given offset.
>>
>> I agree that the description of sbsndptr() can be misleading as it refers
>> to the point in time when the pointer was updated last.  Relative to now
>> the real offset may be at the beginning of the next mbuf.
>
> Right, and we ran into the issue because we made an assumption based on
> the use of the present tense in the comment:
>
>      "Return closest mbuf in chain for current offset."

I apologize for the incorrect and misleading description. :-)
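
To make the offset issue concrete, here is a minimal user-space sketch,
not kernel code.  It assumes a simplified stand-in for struct mbuf
(struct xmbuf below is hypothetical and carries only m_next and m_len),
and xm_seek() is an illustrative name, not an existing routine.  It shows
how a consumer can seek forward when the returned offset sits at the end
of the returned mbuf:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mbuf (illustration only). */
struct xmbuf {
	struct xmbuf	*m_next;	/* next buffer in the chain */
	size_t		 m_len;		/* bytes of data in this buffer */
};

/*
 * Hypothetical helper: normalise an (mbuf, offset) pair so that the
 * returned buffer really has data at *offp.  Walks forward while the
 * offset points at or past the end of the current buffer, which also
 * steps over zero-length buffers.  Returns NULL when the offset lies
 * at or beyond the end of the chain.
 */
static struct xmbuf *
xm_seek(struct xmbuf *m, size_t *offp)
{
	assert(offp != NULL);
	while (m != NULL && *offp >= m->m_len) {
		*offp -= m->m_len;
		m = m->m_next;
	}
	return (m);
}
```

With a chain of 100 + 50 bytes, an offset of 100 relative to the first
buffer resolves to the second buffer at offset 0, which is the "beginning
of the next mbuf" case discussed above.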

>> As you note in the proposed commit message by the time the send pointer
>> is calculated we may have reached the end of the chain and must avoid
>> storing a NULL pointer.  The mbuf copy routines simply skip over the
>> additional mbuf in the chain using the returned offset.
>>
>> I wonder how this has caused trouble with your multipath patch.  You'd
>> have to copy the sockbuf contents as well and unless you're using custom
>> sockbuf and mbuf chain functions this shouldn't be a problem.  Using
>> custom functions on a socket buffer is a delicate approach.  For a sockbuf
>> consumer being able to handle valid offsets into an mbuf chain is a core
>> feature and must-have part of the functionality.
>
> No custom sockbuf or mbuf routines are in use. We've implemented a
> mapping shim between subflows and the socket buffer. When a subflow asks
> the multipath layer for some data to send, the multipath layer returns a
> mapping onto the socket buffer, which will remain valid until such time
> as the subflow has marked the mapped data as acknowledged.
>
> Part of the map accounting is tracking the pointer of the first mbuf in
> the sockbuf where the map's data begins. Our accounting assumed the mbuf
> + the offset returned by sbsndptr had data available, which is how we
> triggered the problem. We could have accounted for the issue in our new
> map accounting code, but that would add additional complexity to some
> already complex code and the better solution is to make sbsndptr DTRT.

So effectively you run a separate sbsndptr for each subflow using the
real sbsndptr to track the head of the queue?

/me fears the day a mptcp import comes up.  tcp-complexity^^3. :-o

>>> I would appreciate a review and any thoughts.
>>
>> I think you have found a valid (micro-)optimization.  However you're
>> still making a dangerous assumption in that the next mbuf is indeed
>> the one you want.  This may not be true in subtle ways when the chain
>> contains m_len=0 mbufs in it.  I'm not aware of it actually happening
>> but it can't be ruled out either if custom sockbuf manipulation functions
>> are in use.
>
> True, though I'm struggling to think why there would be m_len=0 mbufs
> interspersed with m_len > 0 mbufs in a socket send buffer mbuf chain.

sbcompress() doesn't allow for m_len=0 mbufs.  This holds true as long
as the sbappend functions are used.  If not, we may get anything there.
As long as nobody is using custom sockbuf appends we're safe.  I first
assumed from your description that some custom sockbuf munging was going
on, in which case the guarantee wouldn't have been there anymore.
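
As an illustration of that invariant, a small user-space sketch follows.
It is an assumption-laden model, not the sbcompress() code: struct xmbuf
is a hypothetical, stripped-down stand-in for struct mbuf, and both
function names are made up for this example.  It only checks that a
chain contains no m_len=0 buffers:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mbuf (illustration only). */
struct xmbuf {
	struct xmbuf	*m_next;	/* next buffer in the chain */
	size_t		 m_len;		/* bytes of data in this buffer */
};

/* Hypothetical helper: count the zero-length buffers in a chain. */
static size_t
xm_count_empty(const struct xmbuf *m)
{
	size_t n = 0;

	for (; m != NULL; m = m->m_next)
		if (m->m_len == 0)
			n++;
	return (n);
}

/*
 * Hypothetical debug check: abort if the chain violates the
 * "no m_len=0 buffers" invariant discussed above.
 */
static void
xm_assert_no_empty(const struct xmbuf *m)
{
	assert(xm_count_empty(m) == 0);
}
```

A chain built only through the regular append path would be expected to
pass such a check; a chain assembled by hand might not.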

>> I'd recommend the following:
>> have your custom sockbuf function handle forward seeking like the other
>> m_copy() functions; and/or apply a patch along the (untested) example
>> below.
>
> If you believe it is both correct and possible for m_len=0 mbufs to
> exist in a socket buffer chain, then I agree that we should amend my
> proposed patch to loop and skip over m_len=0 mbufs as you've suggested.

No.  So far it is neither possible nor correct.

> However, I'm more inclined to suspect it is undesirable and potentially
> buggy behaviour to end up with m_len=0 mbufs in a socket buffer chain on
> which sbsndptr is being used, and would instead suggest a
> "KASSERT(ret->m_len > 0, (...));" be added to the end of my proposed if
> block.

Agreed.
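
For the archive, a rough user-space sketch of the shape of that check.
This is not the actual patch: struct xmbuf is a hypothetical simplified
stand-in for struct mbuf, xm_step_if_at_end() is an invented name, and
assert() stands in for KASSERT().  It steps to the next buffer only when
the offset sits exactly at the end of the current one and a next buffer
exists, so no NULL pointer is ever returned:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mbuf (illustration only). */
struct xmbuf {
	struct xmbuf	*m_next;	/* next buffer in the chain */
	size_t		 m_len;		/* bytes of data in this buffer */
};

/*
 * Hypothetical helper: if *offp points exactly at the end of m and
 * there is a following buffer, move to it and reset the offset.
 * At the end of the chain the pair is returned unchanged, so the
 * caller never receives NULL.
 */
static struct xmbuf *
xm_step_if_at_end(struct xmbuf *m, size_t *offp)
{
	if (*offp == m->m_len && m->m_next != NULL) {
		m = m->m_next;
		*offp = 0;
		/* Stand-in for KASSERT(ret->m_len > 0, (...)). */
		assert(m->m_len > 0);
	}
	return (m);
}
```

The assertion documents the assumption that zero-length buffers do not
occur in the chain instead of silently looping over them.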

-- 
Andre