From owner-freebsd-transport@freebsd.org Mon Nov 16 20:41:47 2015 Return-Path: Delivered-To: freebsd-transport@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7517FA3042F for ; Mon, 16 Nov 2015 20:41:47 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 580FD1F4D for ; Mon, 16 Nov 2015 20:41:47 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: by mailman.ysv.freebsd.org (Postfix) id 56A40A3042D; Mon, 16 Nov 2015 20:41:47 +0000 (UTC) Delivered-To: transport@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3C66AA3042A for ; Mon, 16 Nov 2015 20:41:47 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from mail.strugglingcoder.info (strugglingcoder.info [65.19.130.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.strugglingcoder.info", Issuer "mail.strugglingcoder.info" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 1DA5E1F4C for ; Mon, 16 Nov 2015 20:41:46 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from localhost (unknown [10.1.1.3]) (Authenticated sender: hiren@strugglingcoder.info) by mail.strugglingcoder.info (Postfix) with ESMTPA id 80333C0EF6; Mon, 16 Nov 2015 12:41:45 -0800 (PST) Date: Mon, 16 Nov 2015 12:41:45 -0800 From: hiren panchasara To: Jonathan Looney Cc: FreeBSD Transports , Randall Stewart Subject: Re: Maintaining dupack counter per hole (was: The trouble with sack..) Message-ID: <20151116204145.GX29829@strugglingcoder.info> References: <20151030062423.GB5261@strugglingcoder.info> <20151116180644.GU29829@strugglingcoder.info> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="Cou6PmgoyP0+llr2" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-transport@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions of transport level network protocols in FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 Nov 2015 20:41:47 -0000 --Cou6PmgoyP0+llr2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 11/16/15 at 07:19P, Jonathan Looney wrote: > On 11/16/15, 1:06 PM, "hiren panchasara" > wrote: >=20 > >Getting back to this old thread... > > > >On 10/30/15 at 01:09P, Jonathan Looney wrote: > >>The SACK hole-tracking code is already quite complex. If we're going to > >> make a fundamental change, perhaps it is time to consider a rewrite, > >> rather than a smaller patch? Maybe this is the best code we can write. > >>Or, > >> maybe it is time for a re-coding to make it more easily accessible. > > > >As Randall said (in response to this), that rewrite is not necessary. And > >I agree to that. I don't see tcp_sack.c being in that bad shape that it > >demands a rewrite. But ofc, if you have something really cleaner/faster > >in mind, I'm all ears. :-) >=20 > I don't have anything better in mind. :-) >=20 > In fact, I think the code performs pretty well. And, that may be because > it prioritizes performance over readability. But, since I think we've all > stared at this code for a long time trying to figure out how it works > (note the repetitive comments about obscure code), I think it is worth > considering whether it is time to try to rewrite it to be more readable. >=20 > Having said that, I don't see any evidence that now is the time. I just > wanted to prompt the discussion. :-) Fair enough. And we should look out for such opportunities whenever possible :-) >=20 >=20 > >>=20 > >> Do you charge all three holes with the duplicate ACKs? Do you copy the > >> counter to the holes? > >>=20 > >> Or, is the fact that the ACK is slightly different enough to reset the > >> counter? > > > >Basically, whenever a hole gets broken up, subholes carry-forward the > >dupack strike counter. >=20 > This somewhat makes sense. And, it looks like you are charging all > existing holes for each ACK that doesn't cover them (regardless of the > fact that it isn't really a "duplicate ACK" in this case). That also > generally makes sense. Why do you say those are not duplicate acks? I know this goes into now the definition of a duplicate ack what what we mean by that but to me, this is *the* benefit we get from SACK that we can identify these guys as duplicate acks. >=20 > But, I think we could change this to help cases where packets are > re-ordered. I disagree with the proposal with reasons in-line below.=20 > > >> If you reset the counter anytime the hole is broken up, it will take a > >> while to get to three in a really out-of-order network scenario. On the > >> other hand, if you don't reset the counter, you may retransmit too fas= t. > > > >By 'too fast', I think you mean spurious retransmissions. If so, can you > >explain a bit more? >=20 > Sure. >=20 > Imagine that my network is load-balancing over a 4xGE bundle, but the > bundle links have different loads and slightly re-order the packets. >=20 > The packets arrive at the remote side out of order, like this: >=20 > 4500:5999 >=20 > 1500:2999 >=20 > 3000:4499 >=20 > 0:1499 >=20 > The remote side ACKs them in that order and we receive the ACKs in that > order. >=20 > After the third ACK, we will view the 0:1499 block as having three strikes > and will retransmit it (assuming that is our threshold), even though the > ACK will arrive momentarily. >=20 > Of course, I think that problem already exists with our code today. But, > my question is how the per-hole counter improves this. I don't have an articulated answer for this right now. One clear issue with the way we do retransmit is that we wait for 1/2 the inflight to drain. (Look at 'Compute the amount of data in flight....' comment and nearby code) > I think the bottom > line is that our code today assumes that we need to do a SACK retransmit > (for all outstanding SACK holes) as soon as the first outstanding byte is > not ACKd for three ACKs in a row. But getting up to this dupack thresh "three" is also broken right now. A slightly different but related problem. (As explained by Randall in 2) of 'The trouble with sack..' email on this list.) >=20 > If we instead reset the counter for a hole anytime the hole was partially > ACKd (including being split), we would retransmit less aggressively. In > this example, that would probably be good. >=20 > A middle-ground is to charge all holes every time we receive an ACK that > does not partially ACK the hole. If a hole is partially ACKd, neither > charge the ACK with a strike nor reset the counter. This is the idea I don't agree to. SACK which partially acks does tell us something important and IMO it'd be bad if don't consider that information and strike the counter. >=20 > In any case, I think a rule that any particular byte must be ACKd in one > of the next three packets after a higher byte is ACKd is potentially too > rigid given the myriad ways packets can be re-ordered. So, coming back to your example, it's a valid problem. But this is not a valid solution for it. When you know that you are introducing some=20 degree of reordering as a sender (as your example shows), the dupack threshold should be bumped up to that degree. Now, linux does this with some smart heuristics. It tracks per connection degree of reordering that dupack threshold is always set to that for any point in time. I am wondering it something like that'd help. OR at least, a sockopt that anyone can set (much simpler than coming up with heuristics logic for the first pass) for their dupack threshold. This would probably help your case very much. Apologies if I am all over the place in this response. Cheers, Hiren --Cou6PmgoyP0+llr2 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAABCgBmBQJWSj+FXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBNEUyMEZBMUQ4Nzg4RjNGMTdFNjZGMDI4 QjkyNTBFMTU2M0VERkU1AAoJEIuSUOFWPt/lzYkH/0lEHuG/XbqeK+HNuumRxToe tFrP+Xit+mA3Gf4cOhExj8f5ti93CdfVLK07Hg/oRpN4GaVUupVQHu2HoSiSXvBL hOi10NIC7l2Mlf9agm5H1kiTlzIoWlWqaSOsviJjTzpm+r5aFyE62VZvkXL7BqWz YfSbJBLwg3ObZ7zDiPwlcRVtuUO9iEmGhwdTnyssrzXDsdeb4KJwZO0wgaAEsxY/ hWyFz8R/kM9Wdapy1t9lfTTxqifchv9/lbIsfAnkoBA1S5M5pu120rI5a1dbP/Jm fK/30qOyuil1tKLBRukQrrRjAwOaMiUZ7+98B73VDEpCUq+Rfbwjehryoo8+5VE= =uUsr -----END PGP SIGNATURE----- --Cou6PmgoyP0+llr2--