From owner-svn-src-head@FreeBSD.ORG Wed Feb 29 00:45:37 2012 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48EC61065676; Wed, 29 Feb 2012 00:45:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au [211.29.132.196]) by mx1.freebsd.org (Postfix) with ESMTP id D3E9E8FC0C; Wed, 29 Feb 2012 00:45:36 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q1T0jVJM026388 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Feb 2012 11:45:33 +1100 Date: Wed, 29 Feb 2012 11:45:31 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Tijl Coosemans In-Reply-To: <201202281939.q1SJdtYr084858@svn.freebsd.org> Message-ID: <20120229105104.K1479@besplex.bde.org> References: <201202281939.q1SJdtYr084858@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r232266 - in head/sys: amd64/include i386/include pc98/include x86/include X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Feb 2012 00:45:37 -0000 On Tue, 28 Feb 2012, Tijl Coosemans wrote: > Log: > Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace > amd64/i386/pc98 endian.h with stubs. > > In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been > resolved by reimplementing the macro in terms of __bswap32(x). As a side > effect __bswap64_var(x) is now implemented using two bswap instructions on > i386 and should be much faster. __bswap32_const(x) has been reimplemented > in terms of __bswap16(x) for consistency. The files now have enough in common for me to be happy with centralizing them. The trip through would have to be avoided to complete the cleanup. I don't know how to do that. Maybe install can produce a direct symlink. We should try harder to not install both x86/foo.h and machine/foo.h, so pathnames and don't start being used in applications and then having to be supported for compatibility. The patch is unreadable due to significant reimplementations, even without the move. So I looked at the resulting x86/endian.h: % ... % #ifdef __cplusplus % extern "C" { % #endif This ifdef is old style bug: - it is a hard-coded spelling of __BEGIN_DECLS - it is placed around the whole file. Most of the file consists of macro definitions. __BEGIN_DECLS would obviously make no sense around macro definitions. - the rest of the file consists of inline function declarations. Since these aren't extern, 'extern "C"' makes no sense for them, though it isn't so clear that __BEGIN_DECLS makes no sense for them. - the log message for the commit that added it is not good, but at least says that C++ support is added. % #if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P) This ifdef is still misplaced. Defining the 'const' versions doesn't require the gcc builtin, and it is further from requiring gcc asm. % % #define __bswap64_const(_x) \ % (((_x) >> 56) | \ % (((_x) >> 40) & (0xffULL << 8)) | \ % (((_x) >> 24) & (0xffULL << 16)) | \ % (((_x) >> 8) & (0xffULL << 24)) | \ % (((_x) << 8) & (0xffULL << 32)) | \ % (((_x) << 24) & (0xffULL << 40)) | \ % (((_x) << 40) & (0xffULL << 48)) | \ % ((_x) << 56)) I think this can be simplified in the same way as the `var' case should have been simplified, by building it up out of 2 __bswap32_const()s... % % #define __bswap32_const(_x) \ % (((_x) >> 24) | \ % (((_x) & (0xff << 16)) >> 8) | \ % (((_x) & (0xff << 8)) << 8) | \ % ((_x) << 24)) ... and this can be built out of 2 __bswap16_const()s', though it is short enough already. Building up would be essential for __bswap_2^N and good for N > 64. % % #define __bswap16_const(_x) (__uint16_t)((_x) << 8 | (_x) >> 8) % % static __inline __uint64_t % __bswap64_var(__uint64_t __x) % { % % return __bswap64_const(__x); % } This seems to be a pessimization for amd64, and doesn't match your claim that it now uses 2 bswap instructions on i386. It is the old i386 version. amd64 used to use a single 64-bit bswap in asm here. Last time I looked (only on i386), gcc was incapable of turning either __bswap64_const() or __bswap32_const() into 2 or 1 bswap instructions, but generated pessimial code using shifts and masks. Only clang generated the 2 or 1 bswaps. There is a PR about bswap64 being pessimal on i386. The following seems to be optimal for gcc on i386. It uses the building-up method. See the PR for other details. @ #define _KERNEL @ #define I486_CPU @ @ #include @ #include @ @ uint64_t x, y; @ @ int @ main(void) @ { @ /* @ * Follow is a better __bswap64(). It doesn't even need const and @ * var subcases. @ */ @ y = __bswap32(x) | (uint64_t)__bswap32(x >> 32) << 32; @ return (0); @ } Expanding on its comment, we probably don't even need the const subcase for consts, but can just use the above expression for consts too. However, this only works right for i386. For amd64, there are the following cases: - const case, compiler either gcc or clang: the above expression should work - variable case, gcc: needs the inline with a 64-bit bswap, unless gcc has become smarter - variable case, clang: the above expression won't work if __bswap32() uses an inline asm with a 32-bit bswap like it does now, since clang will be forced to generate 2 32-bit bswaps. To recover to single 64-bit bswap, either write it in inline asm for clang too, or maybe use the general 64-bit 'const' expression (written like it is now, not as in my version above). I think clang translates the big const expression into a single 64-bit bswap, not 2 32-bit ones like it does on i386. % % Extra blank line (old style bug duplicated). % ... % #define __bswap64(_x) \ % (__builtin_constant_p(_x) ? \ % __bswap64_const((__uint64_t)(_x)) : __bswap64_var(_x)) Can be replaced by a "building-up" expression except for complications with clang on amd64. % % #define __bswap32(_x) \ % (__builtin_constant_p(_x) ? \ % __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x)) Can be replaced by a "building-up" expression except for complications for gcc. (We already killed the asm for __bswap16_var(), so the optimization problem now is to turn the general expression for __bswap32() into a single 32-bit bswap instruction. Only clang does this automatically. % % #define __bswap16(_x) \ % (__builtin_constant_p(_x) ? \ % __bswap16_const((__uint16_t)(_x)) : __bswap16_var(_x)) Need a primitive somewhere -- no building up for this :-). % #ifdef __cplusplus % } % #endif Brackets the 'extern "C"' style bug. but looks uglier with a brace all by itself. Bruce