From owner-svn-src-projects@FreeBSD.ORG Fri Aug 13 14:10:26 2010 Return-Path: Delivered-To: svn-src-projects@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B1C101065675; Fri, 13 Aug 2010 14:10:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 31CD08FC0A; Fri, 13 Aug 2010 14:10:25 +0000 (UTC) Received: from c122-106-147-41.carlnfd1.nsw.optusnet.com.au (c122-106-147-41.carlnfd1.nsw.optusnet.com.au [122.106.147.41]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o7DEAMpg029071 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 14 Aug 2010 00:10:23 +1000 Date: Sat, 14 Aug 2010 00:10:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Jeff Roberson In-Reply-To: <201008130315.o7D3Ff5w077426@svn.freebsd.org> Message-ID: <20100813222107.C12891@delplex.bde.org> References: <201008130315.o7D3Ff5w077426@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-projects@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r211266 - projects/ofed/head/sys/amd64/include X-BeenThere: svn-src-projects@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the src " projects" tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Aug 2010 14:10:26 -0000 On Fri, 13 Aug 2010, Jeff Roberson wrote: > Log: > - Allow endian related macros to do compile time conversion of constants. Er, they already did, at least for non-broken compilers: % #include % % unsigned long long % foo() % { % return (__bswap64(0x1000000000000000ULL) + % __bswap32(0x20000000) + % __bswap16(0x4000)); % } On amd64, when compiled by gcc-3.3.3 through gcc-4.2.1 -O, this evaluates everything at compile time to the expected return value of 0x70, while with gcc -O0 it correctly evaluates nothing at compile time and emits calls to the 3 bswap() functions. The -O0 case is unimportant but it is better for it to make the function calls so that you can put breakpoints in the calls. I think this commit breaks the behaviour with -O0 gby changing all the inline functions to macros. I think cognet@ has cleaner diffs for this. The constant part should be entirely MI, but is now a tangle of idefs not even duplicated ad nauseum. IIRC, the only unfinished detail in these diffs is that the ifdef tangle threatens to becomes worse and wasn't completely unraveled. > Modified: projects/ofed/head/sys/amd64/include/endian.h > ============================================================================== > --- projects/ofed/head/sys/amd64/include/endian.h Fri Aug 13 03:15:02 2010 (r211265) > +++ projects/ofed/head/sys/amd64/include/endian.h Fri Aug 13 03:15:41 2010 (r211266) > @@ -69,10 +69,16 @@ extern "C" { > > #if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P) > > -#define __byte_swap_int_var(x) \ > -__extension__ ({ register __uint32_t __X = (x); \ > - __asm ("bswap %0" : "+r" (__X)); \ > - __X; }) > +static inline __uint32_t > +__byte_swap_int_var(__uint32_t x) `x' is in the application namespace, so it cannot be used as a function parameter or variable name in a central header like this. > +{ > + register __uint32_t _x; > + > + Extra blank line. > + _x = x; Converting x to a variable whose name should have been x's all along doesn't fix the namespace pollution. > + __asm ("bswap %0" : "+r" (_x)); > + return (_x); > +} The formatting of the macro is bad, but the macro is good for avoiding namespace pollution. The extra variable also makes no sense except the macro, since in the macro it is needed to convert the parameter, while in the function that conversion has already been done by the function call protocol. > @@ -81,60 +87,47 @@ __extension__ ({ register __uint32_t __X > (((x) & 0x00ff0000) >> 8) | \ > (((x) & 0x0000ff00) << 8) | \ > (((x) & 0x000000ff) << 24)) > -#define __byte_swap_int(x) (__builtin_constant_p(x) ? \ > +#define __bswap32(x) (__builtin_constant_p(x) ? \ > __byte_swap_int_const(x) : __byte_swap_int_var(x)) > > #else /* __OPTIMIZE__ */ > > -#define __byte_swap_int(x) __byte_swap_int_var(x) > +#define __bswap32(x) __byte_swap_int_var(x) > > #endif /* __OPTIMIZE__ */ > > -#define __byte_swap_long_var(x) \ > -__extension__ ({ register __uint64_t __X = (x); \ > - __asm ("bswap %0" : "+r" (__X)); \ > - __X; }) > +static inline __uint64_t > +__byte_swap_long_var(__uint64_t x) > +{ > + register __uint64_t _x; > + > + _x = x; > + __asm ("bswap %0" : "+r" (_x)); > + return (_x); > +} As above, except there is no extra blank line. > > #ifdef __OPTIMIZE__ > > #define __byte_swap_long_const(x) \ > - (((x >> 56) | \ > - ((x >> 40) & 0xff00) | \ > - ((x >> 24) & 0xff0000) | \ > - ((x >> 8) & 0xff000000) | \ > - ((x << 8) & (0xfful << 32)) | \ > - ((x << 24) & (0xfful << 40)) | \ > - ((x << 40) & (0xfful << 48)) | \ > - ((x << 56)))) > + ((__uint64_t)(((__uint64_t)x >> 56) | \ > + (((__uint64_t)x >> 40) & 0xff00) | \ > + (((__uint64_t)x >> 24) & 0xff0000) | \ > + (((__uint64_t)x >> 8) & 0xff000000) | \ > + (((__uint64_t)x << 8) & (0xfful << 32)) | \ > + (((__uint64_t)x << 24) & (0xfful << 40)) | \ > + (((__uint64_t)x << 40) & (0xfful << 48)) | \ > + (((__uint64_t)x << 56)))) That sure is ugly, and still quite broken, since all the x's in it are missing parentheses. Style bugs in it include: - casting to __uint64_t is excessive. This function logically acts on u_longs, though its name says plain long. The literal constants already use a UL suffix (written as ul so it is harder to read). - casting for the right shifts is mostly not needed. Except for the shift by 56, any replicated sign bit is stripped by the mask. Here is a clean version of the above: % static __inline __uint64_t % __bswap64(__uint64_t _x) % { % % return ((_x >> 56) | ((_x >> 40) & 0xff00) | ((_x >> 24) & 0xff0000) | % ((_x >> 8) & 0xff000000) | ((_x << 8) & ((__uint64_t)0xff << 32)) | % ((_x << 24) & ((__uint64_t)0xff << 40)) | % ((_x << 40) & ((__uint64_t)0xff << 48)) | ((_x << 56))); % } i386 already has precisely this. The main point here is that this inline function works right for constants (it evaluates the inline at compile time, but only when optimizing), so if you get out of the compiler's way by not confusing it and you with an ifdef tangle, then this version will just work for both constants and variables. This is what already happens on i386. But on amd64 there is a complication: gcc is still too stupid to automatically convert the above into a single bswap instruction if _x is a variable, and since we want to use a single bswap instruction if possible then we have to write an asm with the bswap, and use an ifdef tangle to select... Secondary points: a function is much easier to use than a macro here, since the expression must reference the parameter many times and it is easier to spell the parameter as _x than as ((__uint64_t)(_x)). This depends on converting the parameter to a single identifier of the correct type. ANy macro version should do the same, using a temporary variable and a statment- expression, like some macros in this file used to do. This depends on gcc working right for constants that are converted to temporary variables, just like it does for inline functions with constant args. Cleanup points: the above __bswap64() is MI. It shouldn't be duplicated (or copied with errors like the amd64 macro version) into many MD files. But MD files need to be able to override it, hopefully without a very large ifdef tangle. > ... > -static __inline __uint16_t > -__bswap16(__uint16_t _x) > -{ > - return (_x << 8 | _x >> 8); > -} > +#define __bswap16(_x) (__uint16_t)((_x) << 8 | (_x) >> 8) This is less than backwards: - the macro version is broken, like __byte_swap_long_const(x) was, since it is missing conversions of the parameter to uint16_t. The parameter should have type uint16_t, but the strictness of this requirement is not clear. A bogus parameter of (int16_t)0x8000 now gives 0xff80 after replication of the sign bit by the right shift, instead of the expected value of 0x80. Casting the final value to uint16_t only prevents 16 more bits of replcation by sign extension ((int16_t)0x8000 -> (int)0xffff80000 right shifts to (int)0xffffff80). - the inline function version already worked perfectly for both constants and variables, just like the inline 64-bit version works perfectly for both constants and variables on arches that don't have a better single instruction like bswap. - the parameter needed to be named _x in the inline function to avoid namespace pollution. Macro parameters are in their own namespace, so naming them with leading underscores is just an obfuscation. Further investigation shows that rev.1.9 of amd64 endian.h by ed@ submitted by Cristoph Mallon cleaned up __bswap16() to use the inline function in C in all cases, instead of using an inline function with asm for non-constants, a macro like the new one above for constants (but without the sign extension bugs), and horrible ifdefs to join the macros. So for __bswap16() it became exactly like I want, except amd64 wants an MI __bswap16() whioch shouldn't be implemented here. Bruce