From owner-freebsd-net@FreeBSD.ORG  Sun Mar 21 07:17:28 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AB51A106564A;
	Sun, 21 Mar 2010 07:17:28 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 455548FC12;
	Sun, 21 Mar 2010 07:17:27 +0000 (UTC)
Received: from c122-106-171-192.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-192.carlnfd1.nsw.optusnet.com.au [122.106.171.192])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o2L7HNn3012239
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 21 Mar 2010 18:17:24 +1100
Date: Sun, 21 Mar 2010 18:17:23 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Rui Paulo <rpaulo@freebsd.org>
In-Reply-To: <2FEF7D36-31B7-4FD7-9350-EECA7B38D519@freebsd.org>
Message-ID: <20100321173545.O3020@delplex.bde.org>
References: <006f01cac6d8$5fc03cb0$1f40b610$@net>
	<2FEF7D36-31B7-4FD7-9350-EECA7B38D519@freebsd.org>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-1105507168-1269155843=:3020"
Cc: freebsd-net@freebsd.org, Chris Harrer <cjharrer@comcast.net>
Subject: Re: Bug in tcp_output?
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 21 Mar 2010 07:17:28 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-1105507168-1269155843=:3020
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Sat, 20 Mar 2010, Rui Paulo wrote:

> On 18 Mar 2010, at 20:19, Chris Harrer wrote:
>>
>> In the following block of code, running on a x86_64 platform, I believe =
that
>> cwin should be declared as an int:
>> ...
>>               else {
>>
>>                       long cwin;  =DF-- Should be an int
>> ...
>>                       if (len > 0) {
>>
>>                               cwin =3D tp->snd_cwnd -
>>
>>                                      (tp->snd_nxt - tp->sack_newdata) -
>>
>>                                      sack_bytes_rxmt;
>>
>>                               if (cwin < 0)
>>
>>                                      cwin =3D 0;
>>
>>                               len =3D lmin(len, cwin);
>>
>>                       }
>>
>>               }
>>
>>        }
>>
>>
>>
>> Consider the case where:
>>
>> sack_rxmit =3D 0
>>
>> sack_bytes_rxmt =3D 0x2238
>>
>> off =3D 0
>>
>> len =3D0xa19c
>>
>> tp->snd_cwnd =3D 0x2238
>>
>> tp->snd_nxt =3D 0xdd6d7974
>>
>> tp->sack_newdata =3D 0xdd6d6858
>>
>> In this case cwin evaluates to 0x00000000ffffe37c, which is not <0, but
>> instead huge.  This causes the remaining data on the socket=92s so->so_s=
nd
>> buffer to be sent to the network causing more problems at the receiver w=
hich
>> is already dropping frames.
>
> I see. This is most likely a bug. Can you send-pr so this doesn't get los=
t?

What bug do you see?  This is most likely not a bug.  I only see the
following bugs
- the suggestion to increase the fragility of the code by changing cwin to
   int
- lots of whitespace lossage
- the style bug in the declaration of cwin (nested declaration)
- lots fragile but working code.  It depends on the machine being a normal
   2's complement one.  It would fail on normal 1's complement machines and
   on abnormal 2's complement ones, but so would many other things in the
   kernel.
- type and arithmetic errors that are not made at runtime resulting in a
   value that wouldn't work, though the runtime value would.

Relevant code quoted again, with the whitespace fixed:

>>                               cwin =3D tp->snd_cwnd -
>>                                      (tp->snd_nxt - tp->sack_newdata) -
>>                                      sack_bytes_rxmt;

On 64-bit machines, with the above values, this is:

 =09=09=09=09rhs =3D (u_long)0x2238UL -
 =09=09=09=09      ((tcp_seq)0xdd6d7974 -
 =09=09=09=09       (tcp_seq)0xdd6d6858) -
 =09=09=09=09      (int)0x2238;
 =09=09=09=09    =3D (u_long)0x2238UL -
 =09=09=09=09      ((uint32_t)0xdd6d7974 -
 =09=09=09=09       (uint32_t)0xdd6d6858) -
 =09=09=09=09      (int)0x2238;
 =09=09=09=09    =3D (u_long)0x2238UL -
 =09=09=09=09      (u_int)0x111c -
 =09=09=09=09      (int)0x2238;
 =09=09=09=09    =3D (u_long)0x111c -
 =09=09=09=09      (int)0x2238;
 =09=09=09=09    =3D (u_long)0x111c -
 =09=09=09=09      (u_long)0x2238;
 =09=09=09=09    =3D (u_long)0xffffffffffffeee4;
 =09=09=09=09cwin =3D (long)rhs;
 =09=09=09=09     =3D -(long)0x111c;

I might have made arithmetic errors too, but I'm sure that I got the
conversions essentially correct.  On machines with 64-bit u_longs,
almost everything is evaluated modulo 2^64.  This gives a large positive
value, but not one with the top bits set up to only the 31st as would
happen on machines with 32-bit u_longs.  Then the final conversion to
long gives a negative value.

This is fragile, but it is a standard 2's complement hack.  It would
fail mainly on normal ones complement machines when the rhs is
(u_long)0xFF...FF.  Then the lhs is probably negative 0, which is
not less than 0.

The fragility is essentially the same on machines with 32-bit u_longs.
Almost everything is evaluated modulo 2^32...

Using 64-bit u_longs for tp->snd_cwnd (and thus for almost the entire
calculation) is exessive but doesn't cause any problems.

Using a signed type for sack_bytes_rxmt asks for sign extension bugs but
doesn't get them.  Here it is promoted to a u_long so there are no
sign extension bugs for it here.

Using a signed type for cwin is essential for the comparison of cwin
with 0 to work.  This signed type should have the same size as the rhs
to avoid even more fragility (if it were int, then you would have to
worry about the value being changed to a non-working value by the
implementation-defined conversion of the rhs to cwin not just for
values larger than LONG_MAX but also for ones larger than INT_MAX.
`int' should work in practice.  This and other things depend on the
difference of the tcp_seq's not being anywhere near as large as
0x7fffffff).

Bruce
--0-1105507168-1269155843=:3020--