Date: Sat, 21 Nov 2015 20:41:13 -0800
From: Artem Belevich <art@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: freebsd-arch@freebsd.org
Subject: Re: zero-cost SDT probes
Message-ID: <CAFqOu6h2_4=m81yn=6xXxF58hgM8ydN35AgFRT1_1jhUFmnuog@mail.gmail.com>
In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>
References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>

On Sat, Nov 21, 2015 at 6:45 PM, Mark Johnston <markj@freebsd.org> wrote:

> Hi,
>
> For the past while I've been experimenting with various ways to
> implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT
> probe site expands to this:
>
>     if (func_ptr != NULL)
>         func_ptr(<probe args>);
>

I wonder how much of an overhead that currently adds. Do you have any
benchmark numbers comparing performance of no SDT, current SDT
implementation and "zero-cost" one.

--Artem

> When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise
> it's NULL. With zero-cost probes, the SDT_PROBE macros expand to
>
>     func(<probe args>);
>
> When the kernel is running, each probe site has been overwritten with
> NOPs. When a probe is enabled, one of the NOPs is overwritten with a
> breakpoint, and the handler uses the PC to figure out which probe fired.
> This approach has the benefit of incurring less overhead when the probe
> is not enabled; it's more complicated to implement though, which is why
> this hasn't already been done.
>
> I have a working implementation of this for amd64 and i386[1]. Before
> adding support for the other arches, I'd like to get some idea as to
> whether the approach described below is sound and acceptable.
>
> The main difficulty is in figuring out where the probe sites actually
> are once the kernel is running. In my patch, a probe site is a call to
> an externally-defined function which is defined in an
> automatically-generated C file. At link time, we first perform a partial
> link of all the kernel's object files. Then, a script uses the relocations
> against the still-undefined probe functions to generate
> 1) stub functions for the probes, so that the kernel can actually be
>    linked, and
> 2) a linker set containing the offsets of each probe site relative to
>    the beginning of the text section.
> The result is linked with the partially-linked kernel to generate the
> final kernel file.
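
To make the "handler uses the PC to figure out which probe fired" step concrete, here is a minimal sketch of one way such a lookup could work. It is not taken from the patch: the struct, its fields, and probe_lookup() are all hypothetical names, and it assumes the probe sites are kept in a table sorted by address and that the caller has already adjusted the trap PC to point at the patched instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per patched probe site. */
struct probe_site {
	uintptr_t	addr;	/* address of the patched instruction */
	int		id;	/* hypothetical probe identifier */
};

/*
 * Binary search over a table sorted by ascending address.
 * Returns the probe id for the site at pc, or -1 if pc is not a
 * known probe site (i.e. the breakpoint belongs to someone else).
 */
static int
probe_lookup(const struct probe_site *tab, size_t n, uintptr_t pc)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr == pc)
			return (tab[mid].id);
		if (tab[mid].addr < pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (-1);
}
```

A sorted table keeps the lookup logarithmic in the number of probe sites; a hash table keyed on the address would be another reasonable choice.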
>
> During boot, we iterate over the linker set, using the offsets plus the
> address of btext to overwrite probe sites with NOPs. SDT probes in kernel
> modules are handled differently (and more simply): the kernel linker just
> has special handling for relocations against symbols named __dtrace_sdt_*;
> this is how illumos/Solaris implements all of this.
>
> My uncertainty revolves around the use of relocations in the
> partially-linked kernel to determine the address of probe sites in the
> running kernel. With the GNU ld in base, this happens to work because
> the final link doesn't modify the text section. Is this something I can
> rely upon? Will this assumption be false with the advent of lld and LTO?
> Are there other, cleaner ways to implement what I described above?
>
> Thanks,
> -Mark
>
> [1] https://people.freebsd.org/~markj/patches/sdt-zerocost/
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>
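
The boot-time step described above (walk the linker set, add each offset to the start of the text section, overwrite the site with NOPs) might look roughly like the following. This is a userland sketch under stated assumptions, not code from the patch: patch_probe_sites(), the "text" buffer standing in for btext, and the "offsets" array standing in for the linker set are hypothetical, and it assumes x86, where a probe site is a 5-byte near call and each offset points at the first byte of that call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	NOP_BYTE	0x90	/* x86 one-byte NOP */
#define	CALL_INSN_LEN	5	/* x86 near call: opcode 0xe8 + rel32 */

/*
 * Overwrite each probe site with NOPs. "text" stands in for the start
 * of the kernel text section and "offsets" for the contents of the
 * linker set; both are hypothetical stand-ins. Offsets are assumed to
 * point at the first byte of the call instruction.
 */
static void
patch_probe_sites(uint8_t *text, size_t textlen, const size_t *offsets,
    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Skip offsets whose call would run past the buffer. */
		if (offsets[i] > textlen ||
		    textlen - offsets[i] < CALL_INSN_LEN)
			continue;
		memset(text + offsets[i], NOP_BYTE, CALL_INSN_LEN);
	}
}
```

A real kernel would also have to deal with write-protected text, which this sketch ignores. The sketch also makes the dependency in Mark's question visible: the offsets are only meaningful if the final link leaves the text section laid out exactly as it was in the partial link.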