Date: Fri, 11 May 2018 11:38:29 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Bruce Evans <brde@optusnet.com.au>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r333461 - head/sys/amd64/amd64 Message-ID: <20180511103724.O887@besplex.bde.org> In-Reply-To: <20180510193816.GF6887@kib.kiev.ua> References: <201805101501.w4AF1iI0039082@repo.freebsd.org> <20180511012309.V3949@besplex.bde.org> <20180510193816.GF6887@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 10 May 2018, Konstantin Belousov wrote: > On Fri, May 11, 2018 at 03:31:46AM +1000, Bruce Evans wrote: >> On Thu, 10 May 2018, Konstantin Belousov wrote: >> >>> Log: >>> Make fpusave() and fpurestore() on amd64 ifuncs. >>> >>> From now on, linking amd64 kernel requires either lld or newer ld.bfd. >> >> This breaks building with gcc: >> >> XX cc1: warnings being treated as errors >> XX ../../../amd64/amd64/fpu.c:195: warning: 'ifunc' attribute directive ignored [-Wattributes] >> ... >> XX ./machine/fpu.h:62: warning: previous declaration of 'fpurestore' was here > Yes. Nothing unexpected. Thus it is not suitable for commit. >> After building fpu.o with clang, linking doesn't reqire a newer ld, but >> booting does -- after linking with an older ld, the boot panics with a >> page fault near fork(). > Yes. I noted this in the commit message. > > emaste will commit the check shortly which would prevent using inadequate > linker for building kernel. Now it is broken even for clang: XX make: "../../../conf/../../../conf/kern.pre.mk" line 125: amd64 kernel requires linker ifunc support This uses a LINKER_FEATURES, but LINKER_FEATURES is not defined anywhere in the sys tree. It is defined in the src amd host tree in bsd.linker.mk. Kernel makefiles have been broken by adding a lot of dependencies on files not in the sys tree, but bsd.linker.mk doesn't seem to be included. Even "make -V LINKER_FEATURES" to determine what LINKER_FEATURES is is broken (it gives the above error). After backing out r333470, "make -V LINKER_FEATURES" works and gives " build-id filter retpoline" with both clang and gcc. LINKER_FEATURES seems to be hard-coded somewhere. Only the ld that will be used by the build knows its features. The ld is hard to find. My gcc is a shell script with lots of -B paths that eventually find ld. Links should be done by ${CC} to find this ld. But the kernel uses ${LD}. When this became incompatible earlier this year after working for about 15 years despite its logical incompatibility, I worked around the problem by adding "makeoptions LD=<path to my ld>". My ld was pre-lld until yesterday. For this test, it is a symlink to the host's ld. This supports ifuncs, but LINKER_FEATURES says that it doesn't. The host (freefall) also doesn't support ifuncs according to "make -V LINKER_FEATURES" in src/bin/cat. Only bsd.pre.mk was broken in r333470. kern.mk doesn't have this check. It is a layering bug to not put such checks in kern.mk. >> ... >> This looks like a small or null pessimization. The branch in the old >> version is probably faster. The following simple test on Haswell shows >> that all reasonable versions have almost the same speed when cached: > Yes, I already noted and mjg noted that ifuncs are directed through PLT. > I remember that it was not the case when I did it the first time, but then > both compiler and linker were different. The test failed to explain why the branch-free version worked slower only when written in asm (7.25 cycles instead of 7). It was because I changed the return value from (xsave ? 1 : 0) to (xsave ? 2 : 1) to better match some asm versions. gcc -O only produces branch-free code for '1 : 0', and that is what I first tested for the C version (before that, I had forgotten that brranch-free cide might be better). gcc -O2 produces branch-free code for '2 : 1'. It generates the same code as for '1 : 0', then increments this. Neither -O nor -O2 uses cmov for some reason. The increment is 1 more instruction, but doesn't increase the time of 7 cycles. This is all with gcc-3.3.3. gcc-4.2.1 -O is "smarter" and generates "sbb %eax,%eax; add $2, %eax" to generate '2 : 1'. This also desn't change the time of 7 cycles. > I tried a quick experiment with adding -fvisibility=hidden to the kernel > compilation, but it still leaves ifunc relocations on PLT (and PLT correctly > consists only of ifuncs). > > I might look some more on it later. Anyway, there are more uses, and SMAP > plus pmap applications of ifunc are more clear. I selected amd64/fpu.c > for the initial commit to check toolchain in wild since this case is > simplest and easy to diagnose if failing, I hope. You need to find about 16 million uses per second at 4 GHz for this to give a 1% optimization ("this" = 0.25 cycles). -O2 and other optimizations give an improvement in this range, but break debugging so I turn them off. Except for clang, the -finline* flags for turning off some optimizations don't even work. A typical stack trace for clang usually has <value optimized out> for almost all values even for debuggers that should be able to find the values in registers. ifuncs might also give <function name optimized out>. Debugging is already very broken by inlining small functions -- stack traces don't show the small functions or their args, but who a caller several layers high. All for optimizations in the 1-5% range for kernels, or a small fraction of that for total time. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180511103724.O887>