From owner-freebsd-toolchain@FreeBSD.ORG Thu Jul 11 15:28:35 2013 Return-Path: Delivered-To: freebsd-toolchain@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CA0589B; Thu, 11 Jul 2013 15:28:35 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 51C0B146B; Thu, 11 Jul 2013 15:28:34 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id C96DBD409AA; Fri, 12 Jul 2013 01:28:28 +1000 (EST) Date: Fri, 12 Jul 2013 01:28:26 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Chisnall Subject: Re: CURRENT: CLANG 3.3 and -stad=c++11 and -stdlib=libc++: isnan()/isninf() oddity In-Reply-To: Message-ID: <20130711232150.G85090@besplex.bde.org> References: <20130710155809.0f589c22@thor.walstatt.dyndns.org> <20130710183315.725dfde0@thor.walstatt.dyndns.org> <20130710203200.5359fd18@thor.walstatt.dyndns.org> <51DDC04B.6040209@FreeBSD.org> <20957.49978.73666.392417@khavrinen.csail.mit.edu> <20130711130043.R920@besplex.bde.org> <20130711202908.L84170@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Q6eKePKa c=1 sm=1 a=GSpoo125mhwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=mlUDQFikY8EA:10 a=muatdVjo9zT8fA-HE-QA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-toolchain@freebsd.org, Garrett Wollman , FreeBSD CURRENT , Bruce Evans , Tijl Coosemans , freebsd-standards@freebsd.org X-BeenThere: freebsd-toolchain@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Maintenance of FreeBSD's integrated toolchain List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jul 2013 15:28:35 -0000 On Thu, 11 Jul 2013, David Chisnall wrote: > On 11 Jul 2013, at 13:11, Bruce Evans wrote: > >> is also not required to be conforming C code, let alone C++ code, >> so there is only a practical requirement that it works when included >> in the C++ implementation. > > Working with the C++ implementation is the problem that we are trying to solve. > >> The compatibility that I'm talking about is with old versions of FreeBSD. >> isnan() is still in libc as a function since that was part of the FreeBSD >> ABI and too many things depended on getting it from there. It was recently >> ... > > I don't see a problem with changing the name of the function in the header and leaving the old symbol in libm for legacy code. I don't even see why old code needs the symbol. Old code should link to old compat libraries that still have it. > >>> It would also be nice to implement these macros using _Generic when compiling in C11 mode, as it will allow the compiler to produce more helpful warning messages. I would propose this implementation: >> >>> #if __has_builtin(__builtin_isnan) >> >> This won't work for me, since I develop and test msun with old compilers >> that don't support __has_builtin(). Much the same set of compilers also >> don't have enough FP builtins. > > Please look in cdefs.h, which defines __has_builtin(x) to 0 if we the compiler does not support it. It is therefore safe to use __has_builtin() in any FreeBSD header. The old compilers run on old systems that don't have that in cdefs.h (though I sometimes edit it to add compatibility cruft like that). msun sources are otherwise portable to these systems. Well, not quite. They are not fully modular and also depend on stuff in libc/include and libc/${ARCH}. I have to update or edit headers there. This hack also doesn't work with gcc in -current. gcc has __builtin_isnan but not __has_builtin(), so __has_builtin(__builtin_isnan) gives the wrong result 0. >> It also doesn't even work. clang has squillions of builtins that >> aren't really builtines so they reduce to libcalls. > > Which, again, is not a problem for code outside of libm. If libm needs different definitions of these macros then that's fine, but they should be private to libm, not installed as public headers. Yes it is. It means that nothing should use isnan() or FP_FAST_FMA* outside of libm either, since isnan() is too slow and FP_FAST_FMA* can't be trusted. Even the implementation can't reliably tell if __builtin_isnan is usuable or better than alternatives. >> The msun implementation knows that isnan() and other classification >> macros are too slow to actually use, and rarely uses them. > > Which makes any concerns that only apply to msun internals irrelevant from the perspective of discussing what goes into this header. No, the efficiency of isnan() is more important for externals, because the internals already have work-arounds. >>> #define isnan(x) __builtin_isnan(x) >>> #else >>> static __inline int >>> __isnanf(float __x) >>> { >>> return (__x != __x); >>> } >> >> Here we can do better in most cases by hard-coding this without the ifdef. > > They will generate the same code. Clang expands the builtin in the LLVM IR to a fcmp uno, so will generate the correct code even when doing fast math optimisations. On some arches the same, and not affected by -ffast-math. But this is not necessarily the fastest code, so it is a performance bug if clang akways generates the same code for the builtin. Bit tests are faster in some cases, and may be required to prevent exceptions for signaling NaNs. -ffast-math could reasonably optimize x != x to "false". It already assumes that things like overflow and NaN results can't happen, so why not optimize further by assuming that NaN inputs can't happen? >> Generic stuff doesn't seem to work right for either isnan() or >> __builtin_isnan(), though it could for at least the latter. According >> to a quick grep of strings $(which clang), __builtin_classify() is >> generic but __builtin_isnan*() isn't (the former has no type suffixes >> but the latter does, and testing shows that the latter doesn't work >> without the suffices). > > I'm not sure what you were testing: Mostly isnan() without including , and gcc. I was confused by gcc converting floats to doubles. > $ cat isnan2.c > > int test(float f, double d, long double l) > { > return __builtin_isnan(f) | > __builtin_isnan(d) | > __builtin_isnan(l); > } > $ clang isnan2.c -S -emit-llvm -o - -O1 > ... > %cmp = fcmp uno float %f, 0.000000e+00 > %cmp1 = fcmp uno double %d, 0.000000e+00 > %or4 = or i1 %cmp, %cmp1 > %cmp2 = fcmp uno x86_fp80 %l, 0xK00000000000000000000 > ... > > As you can see, it parses them as generics and generates different IR for each. I don't believe that there's a way that these would be translated back into libcalls in the back end. Yes, most cases work right. gcc converts f to double and compares the result, but that mostly works. It would be just a pessimization except te conversion gives an exception for signaling NaNs. I hope I remember correctly that the comparision doesn't give this exception. When the above is changed to use __builtin_isnanf() on the float, gcc does the right thing (no conversion). The almost-correct handling of the float is apparently not completely accidental, since for the long double gcc doesn't convert to double before comparing. With gcc, you also don't have to use __builtin_isnan*() to get the builtins. gcc doesn't warn about the undeclared isnan(), and generates the same code as the above. You can also use isnanf() on the float to avoid the conversion. isnanf() is nonstandard, so gcc warns about it. If isnan() is declared as a function as in , then gcc forgets that it is standard and generates a conversion for the long double case too. This almost works, like the conversion for the float case, since changing the precision doesn't affect the classification With clang, you only get the builtins if you use them explicitly. clang generates verbose warnings for undeclared isnan() and always calls the named function. Conversions occur according to the prototype, if any. The above was tested only on amd64. isinf() is even more interesting and broken. Now gcc doesn't convert automatically to builtins. It does the reverse, in broken ways when __builtin_isinf() is used on floats and long doubles: - it converts the float to double to create the arg. Mostly works, as above. - it doesn't convert the long double to double, but copies it to the stack pessimally through integer registers. The pessimization can be fixed by compiling with -march=core2. isinf() is called twice with a double arg and once with a long double arg. This shows that gcc doesn't really understand its own builtin, but is just passing the values with the default promotions that occur when there is no prototype in scope. When __builtin_isinff() is used on the float arg and __builtin_isinfl() is used on the long double arg, gcc passes the args correctly except for pessimizing the long double. It converts __builtin_isinff() to isinff(), etc. This is very broken, since isinff() is in the application namespace. Apart from that, it causes the following problems: - isinff(), isinf() and isinfl() can't easily be removed from libraries - in general, there is no way to know the name of a libcall functions that might be generated for builtins that don't really exist. gcc also converts __builtin_fma() to fma(). That works and is not a namespace error since fma() is a standard API that must exist as a function. clang does different things wrong for __builtin_isinf(). It gets the type stuff right, as for __builtin_isnan(). It never generates libcalls, so it doesn't need a differently named set of extern functions. But it has the major pessimization of calling extern functions for fabs(), fabsf() and fabsl(). The latter is weird, since clang optimizes explicit fabs*() calls almost perfectly on amd64 using SSE bit operations for floats and doubles and i387 hardware fabs for long doubles (bit operations also work for long doubles, but are less convenient and much slower when done wrong). I probably knew some of the previous paragraph when I wrote a local isinf() for catrig*.c. catrig*.c is unlike most of msun. It uses classification macros a lot. Efficiency investigations showed that these work OK provided they always use efficient builtins and in particular, never use the standard macros (except isnan() = itself -> builtin for gcc). isinf(x) is just (fabs(x) == INFINITY), like clang generates for __builtin_isinf(), but much better since the fabs() is explicit so clang doesn't neglect to optimize it. This works well for gcc too (3.x and 4.2.1). Bruce