Date: Tue, 17 Apr 2018 04:16:35 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Brooks Davis <brooks@freebsd.org>
Cc: freebsd-arch@freebsd.org
Subject: Re: Do fuswintr/suswintr make sense?
Message-ID: <20180417031820.A6479@besplex.bde.org>
In-Reply-To: <20180416161012.GB44509@spindle.one-eyed-alien.net>
References: <20180416161012.GB44509@spindle.one-eyed-alien.net>

On Mon, 16 Apr 2018, Brooks Davis wrote:

> The fuswintr() and suswintr() are intended to be safe in interrupt
> context.  They are used in the profiling code and if they fail the code
> falls back to triggering a trap with appropriate fields in struct
> thread.  This is fine as such, but amd64, arm, i386, and powerpc have
> implementations that always fail.  arm64, mips, riscv, and sparc64 all
> add code to the trap handler to detect that this particular code has
> faulted and return to the handler before doing any processing that might
> result in a sleep.

This optimization came from 4.4BSD.  Not having it for i386 also came
from 4.4BSD.  NetBSD fixed it for i386 in 1994 or earlier (locore.s 1.41).

> Does this optimization actually make sense in 2017, particularly
> given that we're not taking advantage of it on x86 (and worse, our
> implementations of return (-1) aren't inlined so they have cache
> impacts)?

Profiling is even more in need of optimizations in 2017.  But not this
one, at least on i386.

[fs]uswintr() might be worth using if they looked like [fs]uword16() and
the latter were efficient.  (I don't see any reason why they can't be
essentially the same.  If the user memory is not mapped then they will
cause the same page fault, and they just have to fail instead of trying
to handle the page fault.)

But [fs]uword16() are now extremely inefficient on i386.  The user and
kernel memory are in different address spaces.  The functions are
implemented using a trampoline that has to map user memory, and this is
even slower than having a trampoline.  But not as slow as switching the
whole memory map on every crossing of the user-kernel boundary including
for profiling interrupts.

Since accessing 1 word at a time is too slow on i386 no matter how it is
done, the correct optimization looks more like a fancier ast() than
[fs]uswintr(): use a kernel buffer with a few hundred addresses instead
of only 1, and tell the application about the addresses in a single
operation.  Do a full switch back to user context and run a trampoline to
add 1 to many addresses there, since the addresses are expected to be on
many different pages.  (The buffer now is { td_profil_addr;
td_profil_ticks; }.  It can hold several ticks but only at 1 address.
Multiple ticks occur mainly when the system can't keep up; then the
address is clobbered but the ticks count is charged to a new address.)

Bruce
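
The batching idea above can be illustrated with a small user-space sketch.
This is not FreeBSD code and not a proposed patch: struct prof_batch,
struct prof_hist, prof_record(), prof_flush() and the PROF_BATCH size are
all hypothetical names invented for the illustration, and the shift-based
bucket mapping is a simplification rather than the scaling that profil(2)
actually uses.  It only shows the shape of the scheme: the interrupt-time
side writes to a kernel-side buffer, and all counters are bumped later in
one pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROF_BATCH 256	/* hypothetical size: "a few hundred addresses" */

/* Hypothetical per-thread batch; the existing buffer holds 1 address. */
struct prof_batch {
	size_t		count;
	uintptr_t	pc[PROF_BATCH];
};

/* Stand-in for the application's profiling histogram (assumed layout). */
struct prof_hist {
	uint16_t	*counters;
	size_t		ncounters;
	uintptr_t	base;	/* lowest profiled pc */
	unsigned	shift;	/* log2(bytes of text per counter) */
};

/*
 * Interrupt-time side: touches only the batch buffer, never user memory.
 * Returns 1 once the buffer is full, meaning a flush should be requested.
 * A sample arriving while the buffer is already full is dropped.
 */
static int
prof_record(struct prof_batch *b, uintptr_t pc)
{
	if (b->count < PROF_BATCH)
		b->pc[b->count++] = pc;
	return (b->count == PROF_BATCH);
}

/*
 * User-context side: apply every buffered tick in a single pass.
 * Returns the number of ticks that landed inside the histogram.
 */
static size_t
prof_flush(struct prof_batch *b, struct prof_hist *h)
{
	size_t applied = 0;

	assert(b->count <= PROF_BATCH);
	for (size_t i = 0; i < b->count; i++) {
		uintptr_t pc = b->pc[i];
		size_t idx;

		if (pc < h->base)
			continue;	/* below the profiled range */
		idx = (size_t)((pc - h->base) >> h->shift);
		if (idx >= h->ncounters)
			continue;	/* above the profiled range */
		h->counters[idx]++;
		applied++;
	}
	b->count = 0;
	return (applied);
}
```

In this sketch prof_record() would be called once per profiling tick and
prof_flush() once per return to user context (or when prof_record()
reports a full buffer), so the cost of reaching user memory is paid once
per batch rather than once per tick.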