Date: Fri, 16 Jun 2006 13:00:42 GMT
From: Bruce Evans <bde@zeta.org.au>
To: freebsd-bugs@FreeBSD.org
Subject: Re: kern/98460 : [kernel] [patch] fpu_clean_state() cannot be disabled for not AMD processors, those are not vulnerable to FreeBSD-SA-06:14.fpu
Message-ID: <200606161300.k5GD0gOv098426@freefall.freebsd.org>
The following reply was made to PR kern/98460; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: Rostislav Krasny <rosti.bsd@gmail.com>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/98460 : [kernel] [patch] fpu_clean_state() cannot be disabled for not AMD processors, those are not vulnerable to FreeBSD-SA-06:14.fpu
Date: Fri, 16 Jun 2006 22:50:01 +1000 (EST)

On Fri, 16 Jun 2006, Rostislav Krasny wrote:

> On Sat, 10 Jun 2006 11:26:20 +1000 (EST)
> Bruce Evans <bde@zeta.org.au> wrote:
>
>> On Fri, 9 Jun 2006, Rostislav Krasny wrote:
>>
>>> On Wed, 7 Jun 2006 12:09:10 +1000 (EST)
>>> Bruce Evans <bde@zeta.org.au> wrote:
>>>> [on avoiding some branches]
>>>
>>> Could you please explain in more detail how that can be done?
>>
>> Just do it. The easiest way is define the new function as inline.
>> This just works because the function is defined before it is used.
>>
>> [snipped]
>
> But you still check cpu_fxsr, so a branch misprediction on a good few
> CPUs is still possible. The only solution is a self-modified code with
> a direct jump. I made following userland example of such a code:

Why are we worrying about just this and not all the other branches on
cpu_fxsr, not to mention all other branches in the kernel :-)?  Note
that there's another one on cpu_fxsr, in the critical path for
npxdna(), in fpurstor().  There are also many branches and other
unnecessary overheads in the trap handling before npxdna() is called.
No one seems to be concerned about these.  I sometimes worry about
these, and prefer my original implementation of i387 DNA handling all
in assembler.  It takes 12 instructions with 1 branch where in my
version of FreeBSD Xdna takes 124 instructions with 23 branches (46
instructions with 10 branches in npxdna()).

I don't know how common branch misprediction is in npxdna() (or in
Xdna or trap() or in trap handling generally), but guess it is quite
common, and fairly common for syscalls too, since traps are not very
common and individual syscalls are not very common; thus the CPU is
likely to have better things to do with memory cache and branch cache
resources than caching traps or individual syscalls.  But if something
is so little used that it doesn't stay cached then unnecessarily using
it is unlikely to make a significant difference to efficiency.

> [Example of self-modifying code]
> I think there should be no need in mprotect() in the kernel. That
> technique could be combined with an assembly version of fpu_clean_state()
> from following article. See the '"FXRSTOR-centric" method':

I think Linux is doing this now (perhaps more with nulling out
unnecessary instructions).  Trap handlers can be patched even more
easily and efficiently by pointing their IDT entry at a
machine-dependent optimal handler, but as mentioned above FreeBSD does
almost the opposite of that (it pushes everything through trap()).

> http://security.freebsd.org/advisories/FreeBSD-SA-06:14-amd.txt
>
> That might be tricky, I know. But why one should pay a performance
> penalty because of a CPU he/she didn't buy?

Because the penalty is (?) too small to measure.  I would be interested
in any measurement that shows otherwise, and generally in any method
for measuring the cost of branches in code that should not be executed
very often.  I often do micro-benchmarks by putting sequences of
instructions in a loop, but this doesn't work right for code that is
not executed very often.  I haven't looked at performance counter info
for a long time.

Bruce
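
Illustrative sketch (not part of the original message): the idea of
pointing an IDT entry at a machine-dependent handler, rather than
testing cpu_fxsr on every call, can be approximated in portable userland
C by choosing a function pointer once at startup.  Everything named
below (cpu_fxsr_demo, restore_fxsr, restore_fnsave, restore_branching,
restore_selected, select_restore) is hypothetical and is not FreeBSD
kernel code.  An indirect call is still a branch of a different kind, so
this only shows the "decide once" structure and makes no claim about
measurable speed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's cpu_fxsr flag (set once at boot). */
static bool cpu_fxsr_demo;

/* Counters so that the example has observable behaviour. */
static int fxsr_calls;
static int fnsave_calls;

/*
 * Hypothetical per-CPU-family restore routines.  Real ones would execute
 * fxrstor or frstor; here they only count how often they were called.
 */
static void restore_fxsr(void)   { fxsr_calls++; }
static void restore_fnsave(void) { fnsave_calls++; }

/* Variant 1: test the flag on every call (one conditional branch each time). */
void restore_branching(void)
{
	if (cpu_fxsr_demo)
		restore_fxsr();
	else
		restore_fnsave();
}

/* Variant 2: the target is chosen once; callers go through the pointer. */
void (*restore_selected)(void) = restore_fnsave;

/* Record the CPU capability and pick the matching routine a single time. */
void select_restore(bool has_fxsr)
{
	cpu_fxsr_demo = has_fxsr;
	restore_selected = has_fxsr ? restore_fxsr : restore_fnsave;
}
```

A caller would run select_restore() once during initialization and then
call restore_selected() wherever restore_branching() was used before.
Both variants reach the same routine; only the point at which the
decision is made differs.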
