From owner-freebsd-arch@FreeBSD.ORG Fri Feb 6 08:56:25 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 46A20A0E for ; Fri, 6 Feb 2015 08:56:25 +0000 (UTC) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id E426BDAC for ; Fri, 6 Feb 2015 08:56:24 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 7EB47D66505; Fri, 6 Feb 2015 19:56:15 +1100 (AEDT) Date: Fri, 6 Feb 2015 19:56:15 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Anuranjan Shukla Subject: Re: Buggy sbspace() on 64bit builds? In-Reply-To: Message-ID: <20150206183036.S1246@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=Za4kaKlA c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=CIKehoALdAUTTtdk9aIA:9 a=CjuIK1q_8ugA:10 Cc: Simon Gerraty , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Feb 2015 08:56:25 -0000 On Fri, 6 Feb 2015, Anuranjan Shukla wrote: > The way sbspace() is done today, it stores the result of subtraction of > socket buffer variables (u_int) in longs, and returns a long. If one of > the subtractions results in a -ve result (the other being positive), it's > seen as a large +ve and sbspace() ends up returning the wrong value. Old versions used bogus casts to work in practice. > I'm not sure if this is enough of a corner case for consumers at large > to experience it, but at Juniper some of our implementation uses sbspace > directly and trips up on this for amd64 builds. Any thoughts on what a fix > should be for this? > > --------------------------- > long > sbspace(struct sockbuf *sb) > { > long bleft; > long mleft; > > if (sb->sb_flags & SB_STOP) > return(0); > bleft = sb->sb_hiwat - sb->sb_cc; This seems to be much more than a corner case. It is normal for high water marks to be exceeded (perhaps not so normal for sockets). Whenever the high water mark is exceeded, the subtraction overflows to a large u_int value. Then on 32-bit arches, assignment to long overflows again but gives the correct value, but on 64-bit arches it doesn't overflow and the result is a large positive long value. > mleft = sb->sb_mbmax - sb->sb_mbcnt; > return((bleft < mleft) ? bleft : mleft); > > } This isn't quite the same as in -current from 2 months ago. -current also has an ifdef and different style bugs. The working version in FreeBSD-5 is: X /* X * How much space is there in a socket buffer (so->so_snd or so->so_rcv)? X * This is problematical if the fields are unsigned, as the space might X * still be negative (cc > hiwat or mbcnt > mbmax). Should detect X * overflow and return 0. Should use "lmin" but it doesn't exist now. X */ X #define sbspace(sb) \ X ((long) imin((int)((sb)->sb_hiwat - (sb)->sb_cc), \ X (int)((sb)->sb_mbmax - (sb)->sb_mbcnt))) -current has the same comment with the last 2 sentences removed. These sentences were a poor description of the hackish fix. They were removed together with changing the hackish fix to a bug. The warning about the bug remains. All of the sb_ fields in the above are still u_int. This implements the intermediate C programmer mistake of using unsigned types for values that are known to be nonnegative. This gives overflow and sign extension bugs here (strictly, non-overflow that is harmful, and non-sign extension that is harmful). Plain int should be enough for holding nonnegative values for anyone. It wasn't with 16-bit ints, but FreeBSD and possibly BSD never supported any system with 16-bit ints. 32-bit ints are enough here. The inner bogus cast compensate for the type botch by converting to int. It has to know that the field types are at most unsigned with the same size, and not get sign extension bugs as in the current version by converting to a larger size. This depends on no overflow in the expression occuring when everything is done with plain ints. This is satisified since 32-bit ints are enough for anyone. Buffer bloat starts with buffers much smaller than 2G. Then imin() is used to compare the values. imin() is hard to use since it only works for args that are either int or representable as int, but here they are known to be _mis_representable as int if we didn't already cast them; imin() would do the same casts automatically, and this is what is wanted here. We do the conversions explicitly because working around the type errors is hard enough to understand even when it is explicit. Finally, we cast to long because that is what sbspace() always returned. This was only needed to support 16-bit ints. Old BSD code used longs excessively, and this has not been fixed in many internal APIs. Old code using corrected API: X #define sbspace(sb) \ X lmin((long)(sb)->sb_hiwat - (sb)->sb_cc, \ X (long)(sb)->sb_mbmax - (sb)->sb_mbcnt) Other bugs in the old code: - the comment about lmin() not existing has been false for 20+ years - using lmin() makes only a cosmetic difference. Delicate casts are still needed. Implicit ones would work, but I wrote one explicit one. On 64-bit arches, this just works, but on 32-bit arches we start with u_int and truncate to longs; this depends on the u_int values being much smaller than LONG_MAX, just like they are much smaller than INT_MAX. - overflow is technically impossible, so the comment shouldn't mention it - after detecting the technically impossible overflow, there is no need to return 0. sbspace() returns a count, and a negative count works just as well as 0, sincd the API is not broken by using unsigned types. In the current version, changing the local variables from long to int would restore the delicate conversions, slightly more obfuscated by using redundant explicit assignments instead of redundant explicit casts. Or more clearly, cast all the field values to int. There is no point in keeping the variables long and casting to long instead -- on 32-bit machines, this is equally delicate. But keep the long return for compatibility. Or using only unsigned types and overflow checking, write more complicated code, like people had to do for counting before negative numbers were invented: after also fixing some style bugs: long sbspace(struct sockbuf *sb) { u_int bleft, mleft; if (sb->sb_flags & SB_STOP) return (0); bleft = sb->sb_hiwat < sb->sb_cc ? 0 : sb->sb_hiwat - sb->sb_cc; mleft = sb->sb_mbmax < sb->sb_mbcnt ? 9 : sb->sb_mbmax - sb->sb_mbcnt; return (min(bleft, mleft)); } This still needs range analysis to ensure that the u_int returned by min() is representable as a long. This follows from everything being much smaller than INT32_MAX, so we should have used int all along. Recent bugs from overflow in 'int ticks' were not quite the opposite of this. 'ticks' really does need to be unsigned, but some differences of it probably need be signed. It is signed because that is what it was originally and that is probably still needed to avoid sign extension bugs for differences of it, and its overflow used to be benign. Now -fwrapv makes its overflow benign again and hides overflow bugs elswhere. I used to think that overflow in conversions gives undefined behaviour. Actually, it gives defined behaviour with an implementation-defined result. The bogus casts in the above depend on the result being reasonable for conversion from a large unsigned value to a signed integer (int or long) of the same size. A flag like -fwrapv might be needed to stop the compiler defining its result as weird. In the 1's complement case, the reasonable result for a difference of -1U is -0 = 0, but we would prefer a result of -1. This is not a problem since both -0 and -1 mean no space. Bruce