Date: Tue, 25 Feb 2014 09:46:54 -0800
From: Adam Leventhal <ahl@delphix.com>
To: Mark Johnston <markj@freebsd.org>
Cc: freebsd-dtrace@freebsd.org
Subject: Re: [patch] fasttrap process scratch space
Message-ID: <CALWP=0b=wJEq_u=1tXou4%2BNpyNamsA6HEHt6zSwC_610XchDEA@mail.gmail.com>
In-Reply-To: <20140225015903.GB64934@raichu>
References: <20140224041454.GB2720@raichu> <CALWP=0bnihYO1%2B4Zy9mtuNBtETh0WTBqj=jFP4z2JgqDDnDsMg@mail.gmail.com> <20140225015903.GB64934@raichu>

Hey Mark,

> That makes sense. Obviously, the same is true on FreeBSD, but what
> mostly worried me is the lack of any way to determine whether the
> traced process does in fact have scratch space available in its TLS.
> What if the executable is statically-linked with a libthr without my
> change?

In Solaris we decided not to support statically linked binaries (in
Solaris 10 I think), so didn't need to worry about this case. (For
background, we wanted to have the flexibility to change the syscall
interfaces.)

> What if I'm attempting to trace a Linux binary running in
> the compatibility layer?

In Solaris "Branded Zones" we still had libc living in a linkmap in
the Linux binary.

http://dtrace.org/blogs/ahl/2005/12/13/dtrace-for-linux/

> In these cases, the program will just crash
> once it tries to execute instructions in non-executable memory (or it'll
> corrupt the thread control block), and I don't see any way to detect or
> prevent that in fasttrap.

You could consider "blessing" dynamically linked processes in some
communication between rtld/libthr and the kernel.

> FreeBSD's DTrace implementation also tries to be somewhat
> compartmentalized so that it's possible to remove or add DTrace support
> without too much work. To my knowledge, it's all currently implemented
> using kernel modules and some userland executables and libraries.
> Requiring libthr and rtld support would take us in the opposite
> direction.

I understand and appreciate that design goal -- it makes sense.
Fasttrap has the most "sprawl" of any of the DTrace components...

> That's true. It seemed to me that having to map 4 KB for every 64
> threads in the process is not too much overhead, but it'd certainly be
> preferable to avoid it. In your much wider experience with userland
> DTrace, do you know of use cases where this might be likely to cause
> problems?

Agreed that the actual memory overhead is insignificant.
My concern about the actual mappings would be affecting the placement
of other mappings and how that might impact a program's execution in
terms of chasing away a bug.

>> Hope that's helpful.
>
> It is, I appreciate the explanation. I didn't mean to imply that the
> solution used in Solaris is inferior to the approach I followed; I just
> felt that it's not so well-suited to FreeBSD.

And I didn't infer as much. I agree that each has different goals and
design constraints. When presented with the same options it's
understandable that FreeBSD might choose a different approach. I
wanted to give my perspective on those options and how we had weighed
them.

Adam

> Thanks!
> -Mark
>
>>
>> Adam
>>
>> On Sun, Feb 23, 2014 at 8:14 PM, Mark Johnston <markj@freebsd.org> wrote:
>> > Hello,
>> >
>> > For those not familiar with MD parts of fasttrap, one of the things it
>> > has to do is ensure that any userland instruction that it replaces with
>> > a breakpoint gets executed in the traced process' context. For several
>> > common classes of instructions, fasttrap will emulate the instruction in
>> > the breakpoint handler; when it can't do that, it copies the instruction
>> > out to some scratch space in the process' address space and sets the PC
>> > of the interrupted thread to the address of that instruction, which is
>> > followed by a jump to the instruction following the breakpoint. There's
>> > a helpful block comment titled "Generic Instruction Tracing" around line
>> > 1585 of the x86 fasttrap_isa.c which describes the details of this.
>> >
>> > This functionality currently doesn't work on FreeBSD, mainly because we
>> > don't necessarily have any (per-thread) scratch space available for use
>> > in the process' address space. In illumos/Solaris, a small (< 64 byte)
>> > block is reserved in each thread's TLS for use by DTrace.
>> > It turns out that doing the same thing on FreeBSD is quite easy:
>> >
>> > http://people.freebsd.org/~markj/patches/fasttrap_scratch_hacky.diff
>> >
>> > Specifically, we need to ensure that TLS (allocated by the runtime
>> > linker) is executable and that we properly extract the offset to the
>> > scratch space from the FS segment register. I think this is somewhat
>> > hacky though, as it creates a dependency on libthr and rtld internals.
>> >
>> > A second approach is to have fasttrap dynamically allocate scratch space
>> > within the process' address space using vm_map_insert(9). My
>> > understanding is that Apple's DTrace implementation does this, and I've
>> > implemented this approach for FreeBSD here (which was done without
>> > referencing Apple code):
>> >
>> > http://people.freebsd.org/~markj/patches/fasttrap-scratch-space/fasttrap-scratch-space-1.diff
>> >
>> > The idea is to map pages of executable memory into the user process as
>> > needed, and carve them into scratch space chunks for use by individual
>> > threads. If a thread in fasttrap_pid_probe() needs scratch space, it
>> > calls a new function, fasttrap_scraddr(). If the thread already has
>> > scratch space allocated to it, it's used. Otherwise, if any free scratch
>> > space chunks are available in an already-mapped page, one of them is
>> > allocated to the thread and used. Otherwise, a new page is mapped using
>> > vm_map_insert(9).
>> >
>> > Threads hold onto their scratch space until they exit. That is, scratch
>> > space is never unmapped from the process, even if the controlling
>> > dtrace(1) process detaches. I added a handler for thread_dtor event
>> > which re-adds any scratch space held by the thread to the free list for
>> > that process. Per-process scratch space state is held in the fasttrap
>> > process handle (fasttrap_proc_t), since that turns out to be much easier
>> > than keeping it in the struct proc.
>> >
>> > Does anyone have any thoughts or comments on the approach or the patch?
>> > Any review or testing would be very much appreciated.
>> >
>> > For testing purposes, it's helpful to know that tracing memcpy() on
>> > amd64 will result in use of this scratch space code, as it starts with a
>> > "mov %rdi,%rax" on my machine at least. My main test case has been to
>> > run something like
>> >
>> > # dtrace -n 'pid$target:libc.so.7::entry {@[probefunc] = count()}' -p $(pgrep firefox)
>> >
>> > Attempting to trace all functions still results in firefox dying with
>> > SIGTRAP, but we're getting there. :)
>> >
>> > Thanks,
>> > --
>> > -Mark
>> > _______________________________________________
>> > freebsd-dtrace@freebsd.org mailing list
>> > https://lists.freebsd.org/mailman/listinfo/freebsd-dtrace
>> > To unsubscribe, send any mail to "freebsd-dtrace-unsubscribe@freebsd.org"
>>
>>
>>
>> --
>> Adam Leventhal
>> CTO, Delphix
>> http://blog.delphix.com/ahl
>
> --
> -Mark

--
Adam Leventhal
CTO, Delphix
http://blog.delphix.com/ahl
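
For readers following the thread, the second approach in Mark's quoted message (map a page on demand, carve it into per-thread chunks, return chunks to a per-process free list when a thread exits) can be sketched in userland C roughly as below. This is only an illustration and is not taken from the patch. Every name here (scr_proc, scr_thread, scr_addr, scr_thread_dtor) is hypothetical, malloc() stands in for vm_map_insert(9), and the 64-byte chunk size is an assumption inferred from the "4 KB for every 64 threads" figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SCR_PAGE_SIZE  4096  /* one mapped page */
#define SCR_CHUNK_SIZE 64    /* assumed per-thread scratch size */
#define SCR_PER_PAGE   (SCR_PAGE_SIZE / SCR_CHUNK_SIZE)

/* One scratch chunk; hypothetical bookkeeping, not the patch's layout. */
struct scr_chunk {
	struct scr_chunk *next;  /* free-list link */
	char *addr;              /* start of the chunk inside its page */
};

/* Per-process state (the message keeps this in fasttrap_proc_t). */
struct scr_proc {
	struct scr_chunk *free;  /* chunks not owned by any thread */
	size_t npages;           /* pages mapped so far */
};

/* Per-thread state: the chunk this thread holds until it exits. */
struct scr_thread {
	struct scr_chunk *scratch;
};

/* Map one more page and push all of its chunks onto the free list.
 * malloc() is a stand-in for vm_map_insert(9); nothing is ever
 * unmapped, matching the behaviour described in the message. */
static int
scr_map_page(struct scr_proc *p)
{
	char *page = malloc(SCR_PAGE_SIZE);
	struct scr_chunk *chunks = calloc(SCR_PER_PAGE, sizeof(*chunks));
	size_t i;

	if (page == NULL || chunks == NULL) {
		free(page);
		free(chunks);
		return (-1);
	}
	for (i = 0; i < SCR_PER_PAGE; i++) {
		chunks[i].addr = page + i * SCR_CHUNK_SIZE;
		chunks[i].next = p->free;
		p->free = &chunks[i];
	}
	p->npages++;
	return (0);
}

/* Return the thread's scratch address, allocating one if needed. */
char *
scr_addr(struct scr_proc *p, struct scr_thread *t)
{
	if (t->scratch != NULL)           /* already has scratch space */
		return (t->scratch->addr);
	if (p->free == NULL && scr_map_page(p) != 0)
		return (NULL);            /* could not map a new page */
	assert(p->free != NULL);
	t->scratch = p->free;             /* take a free chunk */
	p->free = t->scratch->next;
	t->scratch->next = NULL;
	return (t->scratch->addr);
}

/* Thread exit: hand the chunk back to the process free list. */
void
scr_thread_dtor(struct scr_proc *p, struct scr_thread *t)
{
	if (t->scratch == NULL)
		return;
	t->scratch->next = p->free;
	p->free = t->scratch;
	t->scratch = NULL;
}
```

In this sketch the first 64 threads that ask for scratch space share one page, and the 65th triggers a second mapping. A real kernel implementation would also need locking around the free list and executable page protections, both of which are omitted here.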