Date: Sun, 23 Feb 2014 23:14:54 -0500
From: Mark Johnston <markj@freebsd.org>
To: freebsd-dtrace@freebsd.org
Subject: [patch] fasttrap process scratch space
Message-ID: <20140224041454.GB2720@raichu>
Hello,

For those not familiar with the MD parts of fasttrap, one of the things it has to do is ensure that any userland instruction that it replaces with a breakpoint gets executed in the traced process' context. For several common classes of instructions, fasttrap will emulate the instruction in the breakpoint handler; when it can't do that, it copies the instruction out to some scratch space in the process' address space and sets the PC of the interrupted thread to the address of that copy. The copied instruction is followed by a jump back to the instruction after the breakpoint. There's a helpful block comment titled "Generic Instruction Tracing" around line 1585 of the x86 fasttrap_isa.c which describes the details of this.

This functionality currently doesn't work on FreeBSD, mainly because we don't necessarily have any (per-thread) scratch space available for use in the process' address space. In illumos/Solaris, a small (< 64 byte) block is reserved in each thread's TLS for use by DTrace. It turns out that doing the same thing on FreeBSD is quite easy:

http://people.freebsd.org/~markj/patches/fasttrap_scratch_hacky.diff

Specifically, we need to ensure that TLS (allocated by the runtime linker) is executable and that we properly extract the offset to the scratch space from the FS segment register. I think this is somewhat hacky though, as it creates a dependency on libthr and rtld internals.

A second approach is to have fasttrap dynamically allocate scratch space within the process' address space using vm_map_insert(9). My understanding is that Apple's DTrace implementation does this, and I've implemented this approach for FreeBSD here (without referencing Apple's code):

http://people.freebsd.org/~markj/patches/fasttrap-scratch-space/fasttrap-scratch-space-1.diff

The idea is to map pages of executable memory into the user process as needed, and carve them into scratch space chunks for use by individual threads.
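As a rough illustration of that carving scheme (this is not code from the patch), here is a minimal userland C model. All of the names (scr_proc, scr_thread, scr_chunk, scr_map_page, scr_addr, scr_thread_exit), the 4096-byte page size and the 64-byte chunk size are assumptions made for the example. aligned_alloc() merely stands in for vm_map_insert(9), and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SCR_PAGE_SIZE	4096	/* assumed page size */
#define SCR_CHUNK_SIZE	64	/* assumed per-thread chunk size */

/* Descriptor for one scratch chunk (hypothetical). */
struct scr_chunk {
	uintptr_t addr;			/* address of the chunk */
	struct scr_chunk *next;		/* free-list link */
};

/* Per-process state: free chunks from already-mapped pages. */
struct scr_proc {
	struct scr_chunk *free_list;
	size_t pages_mapped;
};

/* Per-thread state: the chunk this thread owns, or NULL. */
struct scr_thread {
	struct scr_chunk *scratch;
};

/*
 * Stand-in for mapping a new page into the process: allocate one page
 * and carve it into chunks, all of which go on the free list.
 */
int
scr_map_page(struct scr_proc *p)
{
	size_t i, n = SCR_PAGE_SIZE / SCR_CHUNK_SIZE;
	char *page = aligned_alloc(SCR_PAGE_SIZE, SCR_PAGE_SIZE);
	struct scr_chunk *cs = calloc(n, sizeof(*cs));

	if (page == NULL || cs == NULL) {
		free(page);
		free(cs);
		return (-1);
	}
	for (i = 0; i < n; i++) {
		cs[i].addr = (uintptr_t)(page + i * SCR_CHUNK_SIZE);
		cs[i].next = p->free_list;
		p->free_list = &cs[i];
	}
	p->pages_mapped++;
	return (0);
}

/*
 * Return the thread's scratch address, or 0 on failure. Order of
 * preference: the chunk the thread already owns, then a free chunk
 * from an existing page, then a chunk from a newly mapped page.
 */
uintptr_t
scr_addr(struct scr_proc *p, struct scr_thread *t)
{
	if (t->scratch == NULL) {
		if (p->free_list == NULL && scr_map_page(p) != 0)
			return (0);
		t->scratch = p->free_list;
		p->free_list = t->scratch->next;
		t->scratch->next = NULL;
	}
	return (t->scratch->addr);
}

/* On thread exit, return the chunk to the process free list. */
void
scr_thread_exit(struct scr_proc *p, struct scr_thread *t)
{
	if (t->scratch == NULL)
		return;
	assert(t->scratch->next == NULL);
	t->scratch->next = p->free_list;
	p->free_list = t->scratch;
	t->scratch = NULL;
}
```

In this model pages are never released, so a chunk freed by one thread is simply handed to the next thread that asks for scratch space.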
If a thread in fasttrap_pid_probe() needs scratch space, it calls a new function, fasttrap_scraddr(). If the thread already has scratch space allocated to it, that space is used. Otherwise, if any free scratch space chunks are available in an already-mapped page, one of them is allocated to the thread and used. Otherwise, a new page is mapped using vm_map_insert(9).

Threads hold onto their scratch space until they exit. That is, scratch space is never unmapped from the process, even if the controlling dtrace(1) process detaches. I added a handler for the thread_dtor event which returns any scratch space held by the exiting thread to the free list for that process. Per-process scratch space state is held in the fasttrap process handle (fasttrap_proc_t), since that turns out to be much easier than keeping it in the struct proc.

Does anyone have any thoughts or comments on the approach or the patch? Any review or testing would be very much appreciated. For testing purposes, it's helpful to know that tracing memcpy() on amd64 will result in use of this scratch space code, as it starts with a "mov %rdi,%rax" on my machine at least. My main test case has been to run something like

# dtrace -n 'pid$target:libc.so.7::entry {@[probefunc] = count()}' -p $(pgrep firefox)

Attempting to trace all functions still results in firefox dying with SIGTRAP, but we're getting there. :)

Thanks,
--
-Mark