From owner-freebsd-net@FreeBSD.ORG Sun Mar 21 07:17:28 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB51A106564A; Sun, 21 Mar 2010 07:17:28 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id 455548FC12; Sun, 21 Mar 2010 07:17:27 +0000 (UTC) Received: from c122-106-171-192.carlnfd1.nsw.optusnet.com.au (c122-106-171-192.carlnfd1.nsw.optusnet.com.au [122.106.171.192]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o2L7HNn3012239 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 21 Mar 2010 18:17:24 +1100 Date: Sun, 21 Mar 2010 18:17:23 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Rui Paulo In-Reply-To: <2FEF7D36-31B7-4FD7-9350-EECA7B38D519@freebsd.org> Message-ID: <20100321173545.O3020@delplex.bde.org> References: <006f01cac6d8$5fc03cb0$1f40b610$@net> <2FEF7D36-31B7-4FD7-9350-EECA7B38D519@freebsd.org> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-1105507168-1269155843=:3020" Cc: freebsd-net@freebsd.org, Chris Harrer Subject: Re: Bug in tcp_output? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 07:17:28 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --0-1105507168-1269155843=:3020 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Sat, 20 Mar 2010, Rui Paulo wrote: > On 18 Mar 2010, at 20:19, Chris Harrer wrote: >> >> In the following block of code, running on a x86_64 platform, I believe = that >> cwin should be declared as an int: >> ... >> else { >> >> long cwin; =DF-- Should be an int >> ... >> if (len > 0) { >> >> cwin =3D tp->snd_cwnd - >> >> (tp->snd_nxt - tp->sack_newdata) - >> >> sack_bytes_rxmt; >> >> if (cwin < 0) >> >> cwin =3D 0; >> >> len =3D lmin(len, cwin); >> >> } >> >> } >> >> } >> >> >> >> Consider the case where: >> >> sack_rxmit =3D 0 >> >> sack_bytes_rxmt =3D 0x2238 >> >> off =3D 0 >> >> len =3D0xa19c >> >> tp->snd_cwnd =3D 0x2238 >> >> tp->snd_nxt =3D 0xdd6d7974 >> >> tp->sack_newdata =3D 0xdd6d6858 >> >> In this case cwin evaluates to 0x00000000ffffe37c, which is not <0, but >> instead huge. This causes the remaining data on the socket=92s so->so_s= nd >> buffer to be sent to the network causing more problems at the receiver w= hich >> is already dropping frames. > > I see. This is most likely a bug. Can you send-pr so this doesn't get los= t? What bug do you see? This is most likely not a bug. I only see the following bugs - the suggestion to increase the fragility of the code by changing cwin to int - lots of whitespace lossage - the style bug in the declaration of cwin (nested declaration) - lots fragile but working code. It depends on the machine being a normal 2's complement one. It would fail on normal 1's complement machines and on abnormal 2's complement ones, but so would many other things in the kernel. - type and arithmetic errors that are not made at runtime resulting in a value that wouldn't work, though the runtime value would. Relevant code quoted again, with the whitespace fixed: >> cwin =3D tp->snd_cwnd - >> (tp->snd_nxt - tp->sack_newdata) - >> sack_bytes_rxmt; On 64-bit machines, with the above values, this is: =09=09=09=09rhs =3D (u_long)0x2238UL - =09=09=09=09 ((tcp_seq)0xdd6d7974 - =09=09=09=09 (tcp_seq)0xdd6d6858) - =09=09=09=09 (int)0x2238; =09=09=09=09 =3D (u_long)0x2238UL - =09=09=09=09 ((uint32_t)0xdd6d7974 - =09=09=09=09 (uint32_t)0xdd6d6858) - =09=09=09=09 (int)0x2238; =09=09=09=09 =3D (u_long)0x2238UL - =09=09=09=09 (u_int)0x111c - =09=09=09=09 (int)0x2238; =09=09=09=09 =3D (u_long)0x111c - =09=09=09=09 (int)0x2238; =09=09=09=09 =3D (u_long)0x111c - =09=09=09=09 (u_long)0x2238; =09=09=09=09 =3D (u_long)0xffffffffffffeee4; =09=09=09=09cwin =3D (long)rhs; =09=09=09=09 =3D -(long)0x111c; I might have made arithmetic errors too, but I'm sure that I got the conversions essentially correct. On machines with 64-bit u_longs, almost everything is evaluated modulo 2^64. This gives a large positive value, but not one with the top bits set up to only the 31st as would happen on machines with 32-bit u_longs. Then the final conversion to long gives a negative value. This is fragile, but it is a standard 2's complement hack. It would fail mainly on normal ones complement machines when the rhs is (u_long)0xFF...FF. Then the lhs is probably negative 0, which is not less than 0. The fragility is essentially the same on machines with 32-bit u_longs. Almost everything is evaluated modulo 2^32... Using 64-bit u_longs for tp->snd_cwnd (and thus for almost the entire calculation) is exessive but doesn't cause any problems. Using a signed type for sack_bytes_rxmt asks for sign extension bugs but doesn't get them. Here it is promoted to a u_long so there are no sign extension bugs for it here. Using a signed type for cwin is essential for the comparison of cwin with 0 to work. This signed type should have the same size as the rhs to avoid even more fragility (if it were int, then you would have to worry about the value being changed to a non-working value by the implementation-defined conversion of the rhs to cwin not just for values larger than LONG_MAX but also for ones larger than INT_MAX. `int' should work in practice. This and other things depend on the difference of the tcp_seq's not being anywhere near as large as 0x7fffffff). Bruce --0-1105507168-1269155843=:3020--