Date: Sun, 14 May 2017 02:19:24 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dimitry Andric <dimitry@andric.com>
Cc: sgk@troutmask.apl.washington.edu, freebsd-hackers@freebsd.org, numerics@freebsd.org
Subject: Re: catrig[fl].c and inexact
Message-ID: <20170514020559.F1038@besplex.bde.org>
In-Reply-To: <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
    <20170513103208.M845@besplex.bde.org>
    <20170513060803.GA84399@troutmask.apl.washington.edu>
    <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
On Sat, 13 May 2017, Dimitry Andric wrote:

> On 13 May 2017, at 08:08, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>>
>> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>>> On Fri, 12 May 2017, Steve Kargl wrote:
> ...
>>> required for the standard magic.  I planned to fix all this magic using
>>> macros like raise_inexact().
>>
>> If you plan to fix the magic with raise_inexact, then please
>> test with a suite of compilers.  AFAICT, clang is optimizing
>> out the code.  I haven't written a testcase to demonstrate this
>> as I have other irons in the fire.
>
> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> and 5.0.0.  All versions of gcc produced something similar to the
> following for i386:

Yes, all compilers I tried (only gcc-3.3.3, gcc-4.2.1 and clang-3.9.0)
generate the intended code, but clang-3.9.0 also generates a -Wunused
warning about the variable that it has just used to generate the
intended code!

> # /usr/src/lib/msun/src/catrig.c:318: raise_inexact();
>        flds    tiny    # tiny
>        fadds   .LC2    #
>        fstps   120(%esp)       # junk

I don't know how to ask for the best code, which is more like

        flds    tiny
        fadds   one
        ffree   %st(0)          # or fstp %st(0) -- MD optimization

but the best code runs insignificantly faster in practice.

> and for amd64:
> [...]
> .L34:
> .LBB33:
> # /usr/src/lib/msun/src/catrig.c:318: raise_inexact();
>        movss   tiny(%rip), %xmm0       # tiny, tiny.0_28
>        addss   .LC13(%rip), %xmm0      #, _29
>        movss   %xmm0, 188(%rsp)        # _29, junk

Discarding the result is easier for amd64 (just omit the store).  The
volatile hack forces the store.

> E.g., these all look good, at least with regards to not optimizing out
> the desired addition.
>
> The only compiler I could find that does optimize everything away (at
> least in the simplified test case), is the Intel compiler:
>
> https://godbolt.org/g/g1UT2m

Urk.

Bruce