From owner-freebsd-net@FreeBSD.ORG Sun Mar 21 13:15:32 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7671A1065672 for ; Sun, 21 Mar 2010 13:15:32 +0000 (UTC) (envelope-from rpaulo@gmail.com) Received: from mail-bw0-f228.google.com (mail-bw0-f228.google.com [209.85.218.228]) by mx1.freebsd.org (Postfix) with ESMTP id E91DC8FC14 for ; Sun, 21 Mar 2010 13:15:31 +0000 (UTC) Received: by bwz28 with SMTP id 28so4137001bwz.14 for ; Sun, 21 Mar 2010 06:15:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:subject:mime-version :content-type:from:in-reply-to:date:cc:content-transfer-encoding :message-id:references:to:x-mailer; bh=QKTwg/QDWI3MpToNHCB94nF45B1VxmjoL71RoBh+9qo=; b=JRghZSq6T504/3IJgxMV0ERbJyOYbwTcS2fbfgJXz7YHars/mqWO+9sLi87IS2MhdF PvG/x9QqrDR1HIa4ON8bhbAXFxH7mNF01hoW3gD7szVsK0dhwcI1GOfG9OnTAXPXWvXp EQDbgCQcjmTtj8rnGGdl+o/WdHvAAjq24XPls= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; b=jgvhOkW6eOtdAdonXVmxmXO6ebIvy6r/2xOkfZVBDKh3354rgQxXYxjqAhyeA54b1o JVx+6e+TEPdmtBmJe3KDd4RYvGbNBxaMBvQ93K0o6KQ54w71cDfOJPjq4L8bQ5SQmHGK tgEgVEoNfMkvRaet3ZAvL0er0zTQ3jwZ9HzFg= Received: by 10.204.174.194 with SMTP id u2mr1806360bkz.40.1269177330652; Sun, 21 Mar 2010 06:15:30 -0700 (PDT) Received: from [10.0.10.2] (54.81.54.77.rev.vodafone.pt [77.54.81.54]) by mx.google.com with ESMTPS id l1sm14538915bkl.20.2010.03.21.06.15.29 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sun, 21 Mar 2010 06:15:29 -0700 (PDT) Sender: Rui Paulo Mime-Version: 1.0 (Apple Message framework v1077) Content-Type: text/plain; charset=windows-1252 From: Rui Paulo In-Reply-To: <20100321173545.O3020@delplex.bde.org> Date: Sun, 21 Mar 2010 13:15:27 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <8FEEC471-81FB-4FCA-BA4E-8E90048EFD8B@freebsd.org> References: <006f01cac6d8$5fc03cb0$1f40b610$@net> <2FEF7D36-31B7-4FD7-9350-EECA7B38D519@freebsd.org> <20100321173545.O3020@delplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.1077) Cc: freebsd-net@freebsd.org, Chris Harrer Subject: Re: Bug in tcp_output? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 13:15:32 -0000 On 21 Mar 2010, at 07:17, Bruce Evans wrote: > On Sat, 20 Mar 2010, Rui Paulo wrote: >=20 >> On 18 Mar 2010, at 20:19, Chris Harrer wrote: >>>=20 >>> In the following block of code, running on a x86_64 platform, I = believe that >>> cwin should be declared as an int: >>> ... >>> else { >>>=20 >>> long cwin; =DF-- Should be an int >>> ... >>> if (len > 0) { >>>=20 >>> cwin =3D tp->snd_cwnd - >>>=20 >>> (tp->snd_nxt - tp->sack_newdata) = - >>>=20 >>> sack_bytes_rxmt; >>>=20 >>> if (cwin < 0) >>>=20 >>> cwin =3D 0; >>>=20 >>> len =3D lmin(len, cwin); >>>=20 >>> } >>>=20 >>> } >>>=20 >>> } >>>=20 >>>=20 >>>=20 >>> Consider the case where: >>>=20 >>> sack_rxmit =3D 0 >>>=20 >>> sack_bytes_rxmt =3D 0x2238 >>>=20 >>> off =3D 0 >>>=20 >>> len =3D0xa19c >>>=20 >>> tp->snd_cwnd =3D 0x2238 >>>=20 >>> tp->snd_nxt =3D 0xdd6d7974 >>>=20 >>> tp->sack_newdata =3D 0xdd6d6858 >>>=20 >>> In this case cwin evaluates to 0x00000000ffffe37c, which is not <0, = but >>> instead huge. This causes the remaining data on the socket=92s = so->so_snd >>> buffer to be sent to the network causing more problems at the = receiver which >>> is already dropping frames. >>=20 >> I see. This is most likely a bug. Can you send-pr so this doesn't get = lost? >=20 > What bug do you see? This is most likely not a bug. I only see the > following bugs > - the suggestion to increase the fragility of the code by changing = cwin to > int > - lots of whitespace lossage > - the style bug in the declaration of cwin (nested declaration) > - lots fragile but working code. It depends on the machine being a = normal > 2's complement one. It would fail on normal 1's complement machines = and > on abnormal 2's complement ones, but so would many other things in = the > kernel. > - type and arithmetic errors that are not made at runtime resulting in = a > value that wouldn't work, though the runtime value would. >=20 > Relevant code quoted again, with the whitespace fixed: >=20 >>> cwin =3D tp->snd_cwnd - >>> (tp->snd_nxt - tp->sack_newdata) = - >>> sack_bytes_rxmt; >=20 > On 64-bit machines, with the above values, this is: >=20 > rhs =3D (u_long)0x2238UL - > ((tcp_seq)0xdd6d7974 - > (tcp_seq)0xdd6d6858) - > (int)0x2238; > =3D (u_long)0x2238UL - > ((uint32_t)0xdd6d7974 - > (uint32_t)0xdd6d6858) - > (int)0x2238; > =3D (u_long)0x2238UL - > (u_int)0x111c - > (int)0x2238; > =3D (u_long)0x111c - > (int)0x2238; > =3D (u_long)0x111c - > (u_long)0x2238; > =3D (u_long)0xffffffffffffeee4; > cwin =3D (long)rhs; > =3D -(long)0x111c; >=20 > I might have made arithmetic errors too, but I'm sure that I got the > conversions essentially correct. On machines with 64-bit u_longs, > almost everything is evaluated modulo 2^64. This gives a large = positive > value, but not one with the top bits set up to only the 31st as would > happen on machines with 32-bit u_longs. Then the final conversion to > long gives a negative value. Right. I made some bad calculations. >=20 > This is fragile, but it is a standard 2's complement hack. It would > fail mainly on normal ones complement machines when the rhs is > (u_long)0xFF...FF. Then the lhs is probably negative 0, which is > not less than 0. >=20 > The fragility is essentially the same on machines with 32-bit u_longs. > Almost everything is evaluated modulo 2^32... >=20 > Using 64-bit u_longs for tp->snd_cwnd (and thus for almost the entire > calculation) is exessive but doesn't cause any problems. >=20 > Using a signed type for sack_bytes_rxmt asks for sign extension bugs = but > doesn't get them. Here it is promoted to a u_long so there are no > sign extension bugs for it here. >=20 > Using a signed type for cwin is essential for the comparison of cwin > with 0 to work. Right. > This signed type should have the same size as the rhs > to avoid even more fragility (if it were int, then you would have to > worry about the value being changed to a non-working value by the > implementation-defined conversion of the rhs to cwin not just for > values larger than LONG_MAX but also for ones larger than INT_MAX. > `int' should work in practice. This and other things depend on the > difference of the tcp_seq's not being anywhere near as large as > 0x7fffffff). I assumed that Chris saw a problem with this code after being hit by = some TCP/IP interop issue. Was this the case? -- Rui Paulo