From owner-freebsd-arch@FreeBSD.ORG  Fri Feb  6 08:56:25 2015
Date: Fri, 6 Feb 2015 19:56:15 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Anuranjan Shukla <anshukla@juniper.net>
Subject: Re: Buggy sbspace() on 64bit builds?
In-Reply-To: <D0F95E21.2489D%anshukla@juniper.net>
Message-ID: <20150206183036.S1246@besplex.bde.org>
References: <D0F95E21.2489D%anshukla@juniper.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Simon Gerraty <sjg@juniper.net>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>

On Fri, 6 Feb 2015, Anuranjan Shukla wrote:

> The way sbspace() is done today, it stores the result of subtraction of
> socket buffer variables (u_int) in longs, and returns a long. If one of
> the subtractions results in a -ve result (the other being positive), it's
> seen as a large +ve and sbspace() ends up returning the wrong value.

Old versions used bogus casts to work in practice.

>    I'm not sure if this is enough of a corner case for consumers at large
> to experience it, but at Juniper some of our implementation uses sbspace
> directly and trips up on this for amd64 builds. Any thoughts on what a fix
> should be for this?
>
> ---------------------------
> long
> sbspace(struct sockbuf *sb)
> {
>        long bleft;
>        long mleft;
>
>        if (sb->sb_flags & SB_STOP)
>                return(0);
>        bleft = sb->sb_hiwat - sb->sb_cc;

This seems to be much more than a corner case.  It is normal for high
water marks to be exceeded (perhaps not so normal for sockets).  Whenever
the high water mark is exceeded, the subtraction overflows to a large
u_int value.  On 32-bit arches, assignment to long then overflows again
and happens to give the correct (negative) value; on 64-bit arches it
doesn't overflow, and the result is a large positive long value.

>        mleft = sb->sb_mbmax - sb->sb_mbcnt;
>        return((bleft < mleft) ? bleft : mleft);
>
> }

This isn't quite the same as in -current from 2 months ago.  -current
also has an ifdef and different style bugs.
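
To see the 32-bit/64-bit difference described above in isolation, here is
a minimal userland sketch.  It is not from any FreeBSD source: the function
names are made up, int32_t and int64_t stand in for a 32-bit and a 64-bit
long, and the negative result in the 32-bit case assumes the usual wrapping
conversion (which is implementation-defined).

```c
#include <stdint.h>

/*
 * Model of "long bleft = sb_hiwat - sb_cc" where long is 32 bits.
 * The u_int difference wraps, and converting it to a same-size signed
 * type normally wraps back to the small negative value that was wanted.
 */
static int32_t
space_long32(uint32_t hiwat, uint32_t cc)
{
	return ((int32_t)(hiwat - cc));
}

/*
 * Model of the same assignment where long is 64 bits.
 * The wrapped u_int difference is zero-extended, so it stays a large
 * positive value.
 */
static int64_t
space_long64(uint32_t hiwat, uint32_t cc)
{
	return ((int64_t)(hiwat - cc));
}
```

With hiwat = 100 and cc = 150, space_long32() gives -50 on common
compilers, while space_long64() gives 4294967246.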

The working version in FreeBSD-5 is:

X /*
X  * How much space is there in a socket buffer (so->so_snd or so->so_rcv)?
X  * This is problematical if the fields are unsigned, as the space might
X  * still be negative (cc > hiwat or mbcnt > mbmax).  Should detect
X  * overflow and return 0.  Should use "lmin" but it doesn't exist now.
X  */
X #define	sbspace(sb) \
X     ((long) imin((int)((sb)->sb_hiwat - (sb)->sb_cc), \
X 	 (int)((sb)->sb_mbmax - (sb)->sb_mbcnt)))

-current has the same comment with the last 2 sentences removed.  These
sentences were a poor description of the hackish fix.  They were removed
together with changing the hackish fix to a bug.  The warning about the
bug remains.

All of the sb_ fields in the above are still u_int.  This implements the
intermediate C programmer mistake of using unsigned types for values that
are known to be nonnegative.  This gives overflow and sign extension bugs
here (strictly, non-overflow that is harmful, and non-sign extension that
is harmful).  Plain int should be enough for holding nonnegative values
for anyone.  It wasn't with 16-bit ints, but FreeBSD and possibly BSD
never supported any system with 16-bit ints.  32-bit ints are enough here.

The inner bogus casts compensate for the type botch by converting to int.
They have to know that the field types are at most unsigned with the same
size, and not get sign extension bugs as in the current version by
converting to a larger size.  This depends on no overflow in the
expression occurring when everything is done with plain ints.  This is
satisfied since 32-bit ints are enough for anyone.  Buffer bloat starts
with buffers much smaller than 2G.

Then imin() is used to compare the values.  imin() is hard to use since
it only works for args that are either int or representable as int, but
here they are known to be _mis_representable as int if we didn't already
cast them; imin() would do the same casts automatically, and this is
what is wanted here.  We do the conversions explicitly because working
around the type errors is hard enough to understand even when it is
explicit.

Finally, we cast to long because that is what sbspace() always returned.
This was only needed to support 16-bit ints.  Old BSD code used longs
excessively, and this has not been fixed in many internal APIs.
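
The three steps above (inner casts to int, imin(), outer cast to long) can
be tried outside the kernel with a small model.  This is only a sketch:
struct sockbuf_model, imin_model() and sbspace_model() are hypothetical
stand-ins for the kernel's struct sockbuf, imin() and sbspace(), and the
negative results assume the usual wrapping u_int-to-int conversion.

```c
/* Hypothetical cut-down sockbuf with only the fields used here. */
struct sockbuf_model {
	unsigned int sb_hiwat;	/* max byte count */
	unsigned int sb_cc;	/* bytes currently queued */
	unsigned int sb_mbmax;	/* max mbuf storage */
	unsigned int sb_mbcnt;	/* mbuf storage currently in use */
};

/* Stand-in for the kernel's imin(). */
static int
imin_model(int a, int b)
{
	return (a < b ? a : b);
}

/*
 * Same shape as the FreeBSD-5 macro: convert each u_int difference to
 * a same-size int first, take the minimum, then widen to long.
 */
#define	sbspace_model(sb) \
    ((long)imin_model((int)((sb)->sb_hiwat - (sb)->sb_cc), \
	(int)((sb)->sb_mbmax - (sb)->sb_mbcnt)))
```

Because the widening to long happens only after the difference is already
a signed int, a buffer that is 50 bytes over its high water mark yields
-50 whether long is 32 or 64 bits.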

Old code using corrected API:

X #define	sbspace(sb) \
X 	lmin((long)(sb)->sb_hiwat - (sb)->sb_cc, \
X 	   (long)(sb)->sb_mbmax - (sb)->sb_mbcnt)

Other bugs in the old code:
- the comment about lmin() not existing has been false for 20+ years
- using lmin() makes only a cosmetic difference.  Delicate casts are
   still needed.  Implicit ones would work, but I wrote one explicit
   one.  On 64-bit arches, this just works, but on 32-bit arches we
   start with u_int and truncate to longs; this depends on the u_int
   values being much smaller than LONG_MAX, just like they are much
   smaller than INT_MAX.
- overflow is technically impossible, so the comment shouldn't mention it
- after detecting the technically impossible overflow, there is no need
   to return 0.  sbspace() returns a count, and a negative count works
   just as well as 0, since the API is not broken by using unsigned types.
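
As a rough illustration of the last two points, here is a userland model
of the lmin() variant together with a caller that only asks whether any
space is left.  The names sockbuf_lsketch, lmin_sketch(), sbspace_lmin()
and sb_has_room() are invented for this sketch and are not kernel APIs;
where long is 32 bits the negative result again relies on the usual
wrapping conversion.

```c
/* Hypothetical cut-down sockbuf with only the fields used here. */
struct sockbuf_lsketch {
	unsigned int sb_hiwat, sb_cc, sb_mbmax, sb_mbcnt;
};

/* Stand-in for the kernel's lmin(). */
static long
lmin_sketch(long a, long b)
{
	return (a < b ? a : b);
}

/* One explicit cast per difference; the rest are implicit. */
#define	sbspace_lmin(sb) \
	lmin_sketch((long)(sb)->sb_hiwat - (sb)->sb_cc, \
	    (long)(sb)->sb_mbmax - (sb)->sb_mbcnt)

/* A caller that treats a negative count exactly like 0: no room. */
static int
sb_has_room(struct sockbuf_lsketch *sb)
{
	return (sbspace_lmin(sb) > 0);
}
```

A caller written like sb_has_room() does not care whether an overfull
buffer reports 0 or a negative count.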

In the current version, changing the local variables from long to int
would restore the delicate conversions, slightly more obfuscated by
using redundant explicit assignments instead of redundant explicit casts.
Or more clearly, cast all the field values to int.  There is no point in
keeping the variables long and casting to long instead -- on 32-bit machines,
this is equally delicate.  But keep the long return for compatibility.
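
A sketch of the first option (int locals, long return) in userland form.
The struct and the flag value SB_STOP_SKETCH are placeholders for this
example and are not taken from the kernel headers; the assignments to int
rely on the usual wrapping conversion from u_int.

```c
#define	SB_STOP_SKETCH	0x1000	/* placeholder flag bit for this sketch */

/* Hypothetical cut-down sockbuf with only the fields used here. */
struct sockbuf_isketch {
	unsigned int sb_hiwat, sb_cc, sb_mbmax, sb_mbcnt;
	int sb_flags;
};

static long
sbspace_int_locals(struct sockbuf_isketch *sb)
{
	int bleft, mleft;	/* int, not long: same size as the fields */

	if (sb->sb_flags & SB_STOP_SKETCH)
		return (0);
	/* Assignment converts the wrapped u_int difference back to int. */
	bleft = sb->sb_hiwat - sb->sb_cc;
	mleft = sb->sb_mbmax - sb->sb_mbcnt;
	/* Widening int to long for the return value preserves the sign. */
	return ((bleft < mleft) ? bleft : mleft);
}
```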

Or, using only unsigned types and overflow checking, write more complicated
code, like people had to do for counting before negative numbers were
invented (after also fixing some style bugs):

long
sbspace(struct sockbuf *sb)
{
 	u_int bleft, mleft;

 	if (sb->sb_flags & SB_STOP)
 		return (0);
 	bleft = sb->sb_hiwat < sb->sb_cc ? 0 : sb->sb_hiwat - sb->sb_cc;
 	mleft = sb->sb_mbmax < sb->sb_mbcnt ? 0 : sb->sb_mbmax - sb->sb_mbcnt;
 	return (min(bleft, mleft));
}

This still needs range analysis to ensure that the u_int returned by min()
is representable as a long.  This follows from everything being much smaller
than INT32_MAX, so we should have used int all along.

Recent bugs from overflow in 'int ticks' were not quite the opposite of this.
'ticks' really does need to be unsigned, but some differences of it probably
need be signed.  It is signed because that is what it was originally and
that is probably still needed to avoid sign extension bugs for differences
of it, and its overflow used to be benign.  Now -fwrapv makes its overflow
benign again and hides overflow bugs elsewhere.
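
One way to express "unsigned counter, signed difference" is sketched
below.  This is not how the kernel declares or uses 'ticks'; the helper
names are invented, and the conversion of the wrapped difference to int
assumes the usual wrapping (implementation-defined) behaviour.

```c
/*
 * Signed distance from tick value b to tick value a.
 * The subtraction is done in unsigned arithmetic, where wraparound is
 * well defined, and only the small difference is converted to int.
 */
static int
ticks_diff_sketch(unsigned int a, unsigned int b)
{
	return ((int)(a - b));
}

/* True if a is later than b, even if the counter wrapped in between. */
static int
ticks_after_sketch(unsigned int a, unsigned int b)
{
	return (ticks_diff_sketch(a, b) > 0);
}
```

This works as long as the two values are less than INT_MAX ticks apart.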

I used to think that overflow in conversions gives undefined behaviour.
Actually, it gives defined behaviour with an implementation-defined result.
The bogus casts in the above depend on the result being reasonable for
conversion from a large unsigned value to a signed integer (int or long)
of the same size.  A flag like -fwrapv might be needed to stop the
compiler defining its result as weird.  In the 1's complement case, the
reasonable result for a difference of -1U is -0 = 0, but we would prefer
a result of -1.  This is not a problem since both -0 and -1 mean no space.
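
For comparison, a conversion that does not depend on the
implementation-defined result can be written with a range check.  This is
only a sketch and not something the code above does: udiff_to_int() is a
made-up helper, and it assumes INT_MIN == -INT_MAX - 1 (2's complement
range), so it does not cover the 1's complement case either.

```c
#include <limits.h>

/*
 * Interpret a wrapped u_int difference as a signed int without relying
 * on how the compiler converts out-of-range values.
 */
static int
udiff_to_int(unsigned int d)
{
	if (d <= (unsigned int)INT_MAX)
		return ((int)d);	/* value fits; conversion is exact */
	/* d stands for d - 2^N; UINT_MAX - d fits in int here. */
	return (-(int)(UINT_MAX - d) - 1);
}
```

For example, udiff_to_int(100u - 150u) is -50 and udiff_to_int(UINT_MAX)
is -1.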

Bruce