From owner-freebsd-net@FreeBSD.ORG Wed Mar 6 01:54:18 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C3E81CF3; Wed, 6 Mar 2013 01:54:18 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: from lauren.room52.net (lauren.room52.net [210.50.193.198]) by mx1.freebsd.org (Postfix) with ESMTP id 482419A6; Wed, 6 Mar 2013 01:54:17 +0000 (UTC) Received: from lstewart.caia.swin.edu.au (lstewart.caia.swin.edu.au [136.186.229.95]) by lauren.room52.net (Postfix) with ESMTPSA id 414EC7E84A; Wed, 6 Mar 2013 12:54:15 +1100 (EST) Message-ID: <5136A1C6.4000406@freebsd.org> Date: Wed, 06 Mar 2013 12:54:14 +1100 From: Lawrence Stewart User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130213 Thunderbird/17.0.2 MIME-Version: 1.0 To: Andre Oppermann Subject: Re: Bug in sbsndptr() References: <512CBADB.3050004@freebsd.org> <5134CD5D.6090107@freebsd.org> <513564AD.7000006@freebsd.org> <5135FB48.1000809@freebsd.org> In-Reply-To: <5135FB48.1000809@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY autolearn=unavailable version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on lauren.room52.net Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 01:54:18 -0000 On 03/06/13 01:03, Andre Oppermann wrote: > On 05.03.2013 04:21, Lawrence Stewart wrote: >> On 03/05/13 03:35, Andre Oppermann wrote: >>> On 26.02.2013 14:38, Lawrence Stewart wrote: >>>> Hi Andre, >>> >>> Hi Lawrence, :-) >>> >>>> A colleague and I spent a very frustrating day tracing an accounting >>>> bug >>>> in the multipath TCP patch we're working on at CAIA to a bug in >>>> sbsndptr(). I haven't tested it with regular TCP yet, but I believe the >>>> following patch fixes the bug (proposed commit log message is at the >>>> top >>>> of the patch): >>>> >>>> http://people.freebsd.org/~lstewart/patches/misctcp/sbsndptr_mnext_10.x.r247314.diff >>>> >>>> >>>> >>>> The patch should have no tangible effect to operation other than to >>>> ensure the function delivers on the promise to return the closest mbuf >>>> in the chain for the given offset. >>> >>> I agree that the description of sbsndptr() can be misleading as it >>> refers >>> to the point in time when the pointer was updated last. Relative to now >>> the real offset may be at the beginning of the next mbuf. >> >> Right, and we ran into the issue because we made an assumption based on >> the use of the present tense in the comment: >> >> "Return closest mbuf in chain for current offset." > > I apologize for the incorrect and misleading description. :-) No drama, just explaining the crux of the problem from our perspective so it's clear why we ran into this. >>> As you note in the proposed commit message by the time the send pointer >>> is calculated we may have reached the end of the chain and must avoid >>> storing a NULL pointer. The mbuf copy routines simply skips over the >>> additional mbuf in the chain using the returned offset. >>> >>> I wonder how this has caused trouble with your multipath patch. You'd >>> have to copy the sockbuf contents as well and unless you're using custom >>> sockbuf and mbuf chain functions this shouldn't be a problem. Using >>> custom functions on a socket buffer is a delicate approach. For a >>> sockbuf >>> consumer being able to handle valid offsets into an mbuf chain is a core >>> feature and must-have part of the functionality. >> >> No custom sockbuf or mbuf routines are in use. We've implemented a >> mapping shim between subflows and the socket buffer. When a subflow asks >> the multipath layer for some data to send, the multipath layer returns a >> mapping onto the socket buffer, which will remain valid until such time >> as the subflow has marked the mapped data as acknowledged. >> >> Part of the map accounting is tracking the pointer of the first mbuf in >> the sockbuf where the map's data begins. Our accounting assumed the mbuf >> + the offset returned by sbsndptr had data available, which is how we >> triggered the problem. We could have accounted for the issue in our new >> map accounting code, but that would add additional complexity to some >> already complex code and the better solution is to make sbsndptr DTRT. > > So effectively you run a separate sbsndptr for each subflow using the > real sbsndptr to track the head of the queue? Yes, essentially works as you describe. The initial goal/design was to make multi-stream support a first class citizen inside the socket buffer, but we ran out of time to do this. The design we've come up with is a reasonable interim to get to an alpha patch release, which should be happening later this week if you're interested to take a look. We'll make an announcement when it's up on the website. > /me fears the day a mptcp import comes up. tcp-complexity^^3. :-o Yeah it's pretty invasive but does bring some useful features too. There is a lot more work to do before I'd consider proposing we import it into the stack and even then, we'll want to have a robust discussion about when and how to do it. Given that this is being done as part of a research project, we've also taken the opportunity to experiment with changing some ideas and idiosyncrasies in the existing stack code and will be doing a lot of experimental research with the stack and iteratively refining things as we go. >>>> I would appreciate a review and any thoughts. >>> >>> I think you have found a valid (micro-)optimization. However you're >>> still making a dangerous assumption in that the next mbuf is indeed >>> the one you want. This may not be true in subtle ways when the chain >>> contains m_len=0 mbufs in it. I'm not aware of it actually happening >>> but it can't be ruled out either if custom sockbuf manipulation >>> functions >>> are in use. >> >> True, though I'm struggling to think why there would be m_len=0 mbufs >> interspersed with m_len > 0 mbufs in a socket send buffer mbuf chain. > > sbcompress() doesn't allow for m_len=0 mbufs. This holds true as long > as the sbappend functions are used. If not, we may get anything there. > As long as nobody is using custom sockbuf appends we're safe. Because > I first assumed from your description some custom sockbuf munging the > guarantee wouldn't haven been there anymore. Ok cool. >>> I'd recommend the following: >>> have you custom sockbuf function handle forward seeking like the other >>> m_copy() functions; and/or apply a patch along the (untested) example >>> below. >> >> If you believe it is both correct and possible for m_len=0 mbufs to >> exist in a socket buffer chain, then I agree that we should amend my >> proposed patch to loop and skip over m_len=0 mbufs as you've suggested. > > No. So far it is neither possible or correct. > >> However, I'm more inclined to suspect it is undesirable and potentially >> buggy behaviour to end up with m_len=0 mbufs in a socket buffer chain on >> which sbsndptr is being used, and would instead suggest a >> "KASSERT(ret->m_len > 0, (...));" be added to the end of my proposed if >> block. > > Agreed. How does this look? http://people.freebsd.org/~lstewart/patches/misctcp/sbsndptr_mnext_10.x.r247314_v2.diff Sockbuf code is tricky so I'll test this for a while and commit after it has had a reasonable run and not shown any side effects. Cheers, Lawrence