Date:      Sat, 21 Nov 2015 20:41:13 -0800
From:      Artem Belevich <art@freebsd.org>
To:        Mark Johnston <markj@freebsd.org>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: zero-cost SDT probes
Message-ID:  <CAFqOu6h2_4=m81yn=6xXxF58hgM8ydN35AgFRT1_1jhUFmnuog@mail.gmail.com>
In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>
References:  <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>

On Sat, Nov 21, 2015 at 6:45 PM, Mark Johnston <markj@freebsd.org> wrote:

> Hi,
>
> For the past while I've been experimenting with various ways to
> implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT
> probe site expands to this:
>
> if (func_ptr != NULL)
>         func_ptr(<probe args>);
>
>
I wonder how much overhead that currently adds. Do you have any
benchmark numbers comparing the performance of no SDT, the current SDT
implementation, and the "zero-cost" one?

--Artem

> When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise
> it's NULL. With zero-cost probes, the SDT_PROBE macros expand to
>
> func(<probe args>);
>
> When the kernel is running, each probe site has been overwritten with
> NOPs. When a probe is enabled, one of the NOPs is overwritten with a
> breakpoint, and the handler uses the PC to figure out which probe fired.
> This approach has the benefit of incurring less overhead when the probe
> is not enabled; it's more complicated to implement though, which is why
> this hasn't already been done.
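
[Editorial sketch, not part of the original message or of Mark's patch.] The
NOP/breakpoint scheme described above can be modelled in user space on a plain
byte buffer. The struct and function names here are hypothetical, and the
encodings assume x86 (a 5-byte "call rel32", 0x90 for NOP, 0xCC for int3).
Nothing is executed; the buffer only stands in for kernel text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_LEN 5    /* x86 "call rel32": opcode 0xE8 + 4-byte displacement */
#define OP_NOP   0x90 /* single-byte x86 NOP */
#define OP_INT3  0xCC /* x86 breakpoint instruction */

/* Hypothetical record for one probe site: its offset into the text buffer. */
struct probe_site {
	size_t offset;
	const char *name;
};

/* Boot-time step: replace the call at a probe site with NOPs. */
static void
site_nop_out(uint8_t *text, const struct probe_site *ps)
{
	assert(text != NULL && ps != NULL);
	memset(text + ps->offset, OP_NOP, CALL_LEN);
}

/* Enabling a probe: overwrite the first NOP with a breakpoint. */
static void
site_enable(uint8_t *text, const struct probe_site *ps)
{
	text[ps->offset] = OP_INT3;
}

/* Disabling a probe: put the NOP back. */
static void
site_disable(uint8_t *text, const struct probe_site *ps)
{
	text[ps->offset] = OP_NOP;
}

/*
 * Breakpoint handler side: map the offset of the trapping instruction back
 * to a probe. A real handler would derive this offset from the trap PC.
 */
static const struct probe_site *
site_lookup(const struct probe_site *sites, size_t nsites, size_t trap_offset)
{
	for (size_t i = 0; i < nsites; i++) {
		if (sites[i].offset == trap_offset)
			return (&sites[i]);
	}
	return (NULL);
}
```

A disabled site costs only the NOPs; the lookup runs only after a breakpoint
has fired, so its cost is paid by enabled probes alone.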
>
> I have a working implementation of this for amd64 and i386[1]. Before
> adding support for the other arches, I'd like to get some idea as to
> whether the approach described below is sound and acceptable.
>
> The main difficulty is in figuring out where the probe sites actually
> are once the kernel is running. In my patch, a probe site is a call to
> an externally-defined function which is defined in an
> automatically-generated C file. At link time, we first perform a partial
> link of all the kernel's object files. Then, a script uses the relocations
> against the still-undefined probe functions to generate
> 1) stub functions for the probes, so that the kernel can actually be
>    linked, and
> 2) a linker set containing the offsets of each probe site relative to
>    the beginning of the text section.
> The result is linked with the partially-linked kernel to generate the
> final kernel file.
>
> During boot, we iterate over the linker set, using the offsets plus the
> address of btext to overwrite probe sites with NOPs. SDT probes in kernel
> modules are handled differently (and more simply): the kernel linker just
> has special handling for relocations against symbols named __dtrace_sdt_*;
> this is how illumos/Solaris implements all of this.
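
[Editorial sketch, not part of the original message or of Mark's patch.] The
boot-time pass described above, reduced to its core: walk a table of offsets,
add each to the base of the text section, and overwrite the site with NOPs. A
plain array stands in for the generated linker set, and text_base stands in for
the address of btext. All names, the 5-byte site length, and the bounds check
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 5    /* assumed length of the call instruction at each site */
#define OP_NOP   0x90 /* single-byte x86 NOP */

/* Stand-in for the generated linker set: offsets from the start of text. */
static const size_t probe_offsets[] = { 3, 17, 40 };
#define NPROBES (sizeof(probe_offsets) / sizeof(probe_offsets[0]))

/*
 * Overwrite every probe site with NOPs and return how many were patched.
 * Sites that would run past the end of the text buffer are skipped.
 */
static size_t
patch_probe_sites(uint8_t *text_base, size_t text_len, const size_t *offsets,
    size_t n)
{
	size_t patched = 0;

	assert(text_base != NULL && offsets != NULL);
	for (size_t i = 0; i < n; i++) {
		if (offsets[i] > text_len || text_len - offsets[i] < SITE_LEN)
			continue;
		memset(text_base + offsets[i], OP_NOP, SITE_LEN);
		patched++;
	}
	return (patched);
}
```

The sketch also shows why the question below matters: the offsets are only
valid if the final link leaves the text section laid out exactly as it was
when the offsets were recorded.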
>
> My uncertainty revolves around the use of relocations in the
> partially-linked kernel to determine the address of probe sites in the
> running kernel. With the GNU ld in base, this happens to work because
> the final link doesn't modify the text section. Is this something I can
> rely upon? Will this assumption be false with the advent of lld and LTO?
> Are there other, cleaner ways to implement what I described above?
>
> Thanks,
> -Mark
>
> [1] https://people.freebsd.org/~markj/patches/sdt-zerocost/
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>