From owner-svn-src-projects@FreeBSD.ORG  Fri Aug 13 14:10:26 2010
Return-Path: <owner-svn-src-projects@FreeBSD.ORG>
Delivered-To: svn-src-projects@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B1C101065675;
	Fri, 13 Aug 2010 14:10:26 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au
	[211.29.132.183])
	by mx1.freebsd.org (Postfix) with ESMTP id 31CD08FC0A;
	Fri, 13 Aug 2010 14:10:25 +0000 (UTC)
Received: from c122-106-147-41.carlnfd1.nsw.optusnet.com.au
	(c122-106-147-41.carlnfd1.nsw.optusnet.com.au [122.106.147.41])
	by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o7DEAMpg029071
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 14 Aug 2010 00:10:23 +1000
Date: Sat, 14 Aug 2010 00:10:22 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Jeff Roberson <jeff@FreeBSD.org>
In-Reply-To: <201008130315.o7D3Ff5w077426@svn.freebsd.org>
Message-ID: <20100813222107.C12891@delplex.bde.org>
References: <201008130315.o7D3Ff5w077426@svn.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-projects@FreeBSD.org, src-committers@FreeBSD.org
Subject: Re: svn commit: r211266 - projects/ofed/head/sys/amd64/include
X-BeenThere: svn-src-projects@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the src &quot; projects&quot;
	tree" <svn-src-projects.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-projects>, 
	<mailto:svn-src-projects-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-projects>
List-Post: <mailto:svn-src-projects@freebsd.org>
List-Help: <mailto:svn-src-projects-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-projects>, 
	<mailto:svn-src-projects-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 Aug 2010 14:10:26 -0000

On Fri, 13 Aug 2010, Jeff Roberson wrote:

> Log:
>   - Allow endian related macros to do compile time conversion of constants.

Er, they already did, at least for non-broken compilers:

% #include <machine/endian.h>
% 
% unsigned long long
% foo()
% {
% 	return (__bswap64(0x1000000000000000ULL) +
% 	    __bswap32(0x20000000) +
% 	    __bswap16(0x4000));
% }

On amd64, when compiled by gcc-3.3.3 through gcc-4.2.1 -O, this evaluates
everything at compile time to the expected return value of 0x70, while
with gcc -O0 it correctly evaluates nothing at compile time and emits
calls to the 3 bswap() functions.  The -O0 case is unimportant but it is
better for it to make the function calls so that you can put breakpoints
in the calls.  I think this commit breaks the behaviour with -O0 gby
changing all the inline functions to macros.

I think cognet@ has cleaner diffs for this.  The constant part should be
entirely MI, but is now a tangle of idefs not even duplicated ad nauseum.
IIRC, the only unfinished detail in these diffs is that the ifdef tangle
threatens to becomes worse and wasn't completely unraveled.

> Modified: projects/ofed/head/sys/amd64/include/endian.h
> ==============================================================================
> --- projects/ofed/head/sys/amd64/include/endian.h	Fri Aug 13 03:15:02 2010	(r211265)
> +++ projects/ofed/head/sys/amd64/include/endian.h	Fri Aug 13 03:15:41 2010	(r211266)
> @@ -69,10 +69,16 @@ extern "C" {
>
> #if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P)
>
> -#define __byte_swap_int_var(x) \
> -__extension__ ({ register __uint32_t __X = (x); \
> -   __asm ("bswap %0" : "+r" (__X)); \
> -   __X; })
> +static inline __uint32_t
> +__byte_swap_int_var(__uint32_t x)

`x' is in the application namespace, so it cannot be used as a function
parameter or variable name in a central header like this.

> +{
> +	register __uint32_t _x;
> +
> +

Extra blank line.

> +	_x = x;

Converting x to a variable whose name should have been x's all along
doesn't fix the namespace pollution.

> +	__asm ("bswap %0" : "+r" (_x));
> +	return (_x);
> +}

The formatting of the macro is bad, but the macro is good for avoiding
namespace pollution.

The extra variable also makes no sense except the macro, since in the
macro it is needed to convert the parameter, while in the function that
conversion has already been done by the function call protocol.

> @@ -81,60 +87,47 @@ __extension__ ({ register __uint32_t __X
> 	 (((x) & 0x00ff0000) >>  8) | \
> 	 (((x) & 0x0000ff00) <<  8) | \
> 	 (((x) & 0x000000ff) << 24))
> -#define	__byte_swap_int(x) (__builtin_constant_p(x) ? \
> +#define	__bswap32(x)	(__builtin_constant_p(x) ? \
> 	__byte_swap_int_const(x) : __byte_swap_int_var(x))
>
> #else	/* __OPTIMIZE__ */
>
> -#define	__byte_swap_int(x) __byte_swap_int_var(x)
> +#define	__bswap32(x)	__byte_swap_int_var(x)
>
> #endif	/* __OPTIMIZE__ */
>
> -#define __byte_swap_long_var(x) \
> -__extension__ ({ register __uint64_t __X = (x); \
> -   __asm ("bswap %0" : "+r" (__X)); \
> -   __X; })
> +static inline __uint64_t
> +__byte_swap_long_var(__uint64_t x)
> +{
> +	register __uint64_t _x;
> +
> +	_x = x;
> +	__asm ("bswap %0" : "+r" (_x));
> +	return (_x);
> +}

As above, except there is no extra blank line.

>
> #ifdef __OPTIMIZE__
>
> #define	__byte_swap_long_const(x) \
> -	(((x >> 56) | \
> -	 ((x >> 40) & 0xff00) | \
> -	 ((x >> 24) & 0xff0000) | \
> -	 ((x >> 8) & 0xff000000) | \
> -	 ((x << 8) & (0xfful << 32)) | \
> -	 ((x << 24) & (0xfful << 40)) | \
> -	 ((x << 40) & (0xfful << 48)) | \
> -	 ((x << 56))))
> +	((__uint64_t)(((__uint64_t)x >> 56) |		\
> +	 (((__uint64_t)x >> 40) & 0xff00) |		\
> +	 (((__uint64_t)x >> 24) & 0xff0000) |		\
> +	 (((__uint64_t)x >> 8) & 0xff000000) |		\
> +	 (((__uint64_t)x << 8) & (0xfful << 32)) |	\
> +	 (((__uint64_t)x << 24) & (0xfful << 40)) |	\
> +	 (((__uint64_t)x << 40) & (0xfful << 48)) |	\
> +	 (((__uint64_t)x << 56))))

That sure is ugly, and still quite broken, since all the x's in it are
missing parentheses.  Style bugs in it include:
- casting to __uint64_t is excessive.  This function logically acts on
   u_longs, though its name says plain long.  The literal constants already
   use a UL suffix (written as ul so it is harder to read).
- casting for the right shifts is mostly not needed.  Except for the shift
   by 56, any replicated sign bit is stripped by the mask.

Here is a clean version of the above:

% static __inline __uint64_t
% __bswap64(__uint64_t _x)
% {
% 
% 	return ((_x >> 56) | ((_x >> 40) & 0xff00) | ((_x >> 24) & 0xff0000) |
% 	    ((_x >> 8) & 0xff000000) | ((_x << 8) & ((__uint64_t)0xff << 32)) |
% 	    ((_x << 24) & ((__uint64_t)0xff << 40)) | 
% 	    ((_x << 40) & ((__uint64_t)0xff << 48)) | ((_x << 56)));
% }

i386 already has precisely this.  The main point here is that this inline
function works right for constants (it evaluates the inline at compile
time, but only when optimizing), so if you get out of the compiler's
way by not confusing it and you with an ifdef tangle, then this version
will just work for both constants and variables.  This is what already
happens on i386.  But on amd64 there is a complication: gcc is still too
stupid to automatically convert the above into a single bswap instruction
if _x is a variable, and since we want to use a single bswap instruction
if possible then we have to write an asm with the bswap, and use an ifdef
tangle to select...

Secondary points: a function is much easier to use than a macro here, since
the expression must reference the parameter many times and it is easier to
spell the parameter as _x than as ((__uint64_t)(_x)).  This depends on
converting the parameter to a single identifier of the correct type.  ANy
macro version should do the same, using a temporary variable and a statment-
expression, like some macros in this file used to do.  This depends on gcc
working right for constants that are converted to temporary variables, just
like it does for inline functions with constant args.

Cleanup points: the above __bswap64() is MI.  It shouldn't be duplicated
(or copied with errors like the amd64 macro version) into many MD
files.  But MD files need to be able to override it, hopefully without
a very large ifdef tangle.

> ...
> -static __inline __uint16_t
> -__bswap16(__uint16_t _x)
> -{
> -	return (_x << 8 | _x >> 8);
> -}
> +#define	__bswap16(_x)	(__uint16_t)((_x) << 8 | (_x) >> 8)

This is less than backwards:
- the macro version is broken, like __byte_swap_long_const(x) was, since
   it is missing conversions of the parameter to uint16_t.  The parameter
   should have type uint16_t, but the strictness of this requirement is
   not clear.  A bogus parameter of (int16_t)0x8000 now gives 0xff80
   after replication of the sign bit by the right shift, instead of the
   expected value of 0x80.  Casting the final value to uint16_t only
   prevents 16 more bits of replcation by sign extension
   ((int16_t)0x8000 -> (int)0xffff80000 right shifts to (int)0xffffff80).
- the inline function version already worked perfectly for both constants
   and variables, just like the inline 64-bit version works perfectly
   for both constants and variables on arches that don't have a better
   single instruction like bswap.
- the parameter needed to be named _x in the inline function to avoid
   namespace pollution.  Macro parameters are in their own namespace, so
   naming them with leading underscores is just an obfuscation.

Further investigation shows that rev.1.9 of amd64 endian.h by ed@
submitted by Cristoph Mallon cleaned up __bswap16() to use the inline
function in C in all cases, instead of using an inline function with
asm for non-constants, a macro like the new one above for constants
(but without the sign extension bugs), and horrible ifdefs to join the
macros.  So for __bswap16() it became exactly like I want, except amd64
wants an MI __bswap16() whioch shouldn't be implemented here.

Bruce