Date: Sun, 4 Sep 2016 17:48:59 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <bde@FreeBSD.org>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r305382 - in head/lib/msun: amd64 i387
Message-ID: <20160904144859.GC83214@kib.kiev.ua>
In-Reply-To: <201609041222.u84CMEdM033135@repo.freebsd.org>
References: <201609041222.u84CMEdM033135@repo.freebsd.org>
On Sun, Sep 04, 2016 at 12:22:14PM +0000, Bruce Evans wrote:
> Author: bde
> Date: Sun Sep  4 12:22:14 2016
> New Revision: 305382
> URL: https://svnweb.freebsd.org/changeset/base/305382
>
> Log:
>   Add asm versions of fmod(), fmodf() and fmodl() on amd64.  Add asm
>   versions of fmodf() and fmodl() on i387.
>
>   fmod is similar to remainder, and the C versions are 3 to 9 times
>   slower than the asm versions on x86 for both, but we had the strange
>   mixture of all 6 variants of remainder in asm and only 1 of 6
>   variants of fmod in asm.
>
> Added:
>   head/lib/msun/amd64/e_fmod.S   (contents, props changed)
>   head/lib/msun/amd64/e_fmodf.S   (contents, props changed)
>   head/lib/msun/amd64/e_fmodl.S   (contents, props changed)
>   head/lib/msun/i387/e_fmodf.S   (contents, props changed)

It seems that the wrong version of i387/e_fmodf.S was committed: it is
identical to the amd64 version.

> Added: head/lib/msun/amd64/e_fmod.S
> ==============================================================================
> --- /dev/null	00:00:00 1970	(empty, because file is newly added)
> +++ head/lib/msun/amd64/e_fmod.S	Sun Sep  4 12:22:14 2016	(r305382)
> +ENTRY(fmod)
> +	movsd	%xmm0,-8(%rsp)
> +	movsd	%xmm1,-16(%rsp)
> +	fldl	-16(%rsp)
> +	fldl	-8(%rsp)
> +1:	fprem
> +	fstsw	%ax
> +	testw	$0x400,%ax
> +	jne	1b
> +	fstpl	-8(%rsp)
> +	movsd	-8(%rsp),%xmm0
> +	fstp	%st
> +	ret
> +END(fmod)

I see that this is not a new approach in the amd64 subdirectory, to use
the x87 FPU on amd64.  Please note that it might have non-obvious
effects on performance, in particular on the speed of context switches
and on the handling of the #NM exception.

Newer Intel and possibly AMD CPUs have an optimization which allows the
FPU state save and restore code to not save and restore state which was
not changed.  In other words, for a typical amd64 binary which uses the
%xmm register file but has touched neither %st nor %ymm, only the %xmm
bits are spilled and then loaded.  Touching %st defeats the
optimization, possibly for the whole lifetime of the thread.  This
feature (XSAVEOPT) is available at least starting from the Haswell
microarchitecture; I am not sure about Ivy Bridge.
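For reference, a minimal sketch of the same fprem loop expressed as
GCC/Clang inline asm (untested; the function name fmod_x87 is mine, and
the operand constraints follow the pattern glibc uses for its x87
wrappers).  fprem reduces st(0) modulo st(1) by at most 2^63 per
iteration and sets C2 (bit 0x400 of the status word) while the
reduction is incomplete, hence the loop; the final fstp pops y and
leaves the result in st(0):

	#include <stdio.h>

	static double
	fmod_x87(double x, double y)
	{
		double res;

		/* Loop on fprem until C2 reports a complete reduction. */
		__asm__ ("1:	fprem\n\t"
		    "fnstsw	%%ax\n\t"
		    "testw	$0x400,%%ax\n\t"
		    "jne	1b\n\t"
		    "fstp	%%st(1)"	/* pop y, keep the result */
		    : "=t" (res) : "0" (x), "u" (y) : "ax", "cc");
		return (res);
	}

	int
	main(void)
	{
		printf("%g\n", fmod_x87(10.5, 3.0));	/* expect 1.5 */
		return (0);
	}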
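And if someone wants to check from userland whether the CPU reports
XSAVEOPT, a quick sketch (untested; assumes the GCC/Clang <cpuid.h>
helper __get_cpuid_count, and have_xsaveopt is a name I made up):
XSAVEOPT support is bit 0 of %eax in CPUID leaf 0xd, sub-leaf 1:

	#include <cpuid.h>
	#include <stdio.h>

	static int
	have_xsaveopt(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID.(EAX=0DH, ECX=1):EAX[0] is the XSAVEOPT bit. */
		if (__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx) == 0)
			return (0);
		return ((eax & 1) != 0);
	}

	int
	main(void)
	{
		printf("XSAVEOPT: %s\n", have_xsaveopt() ? "yes" : "no");
		return (0);
	}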