Date:      Wed, 25 Nov 2015 15:25:24 -0800
From:      Mark Johnston <markj@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: zero-cost SDT probes
Message-ID:  <20151125232524.GB67865@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20151125131533.GB3448@kib.kiev.ua>
References:  <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151123113511.GX58629@kib.kiev.ua> <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com> <20151125131533.GB3448@kib.kiev.ua>

On Wed, Nov 25, 2015 at 03:15:33PM +0200, Konstantin Belousov wrote:
> On Tue, Nov 24, 2015 at 04:11:36PM -0800, Mark Johnston wrote:
> > If I understood correctly, each probe site would require a separate page
> > in KVA to be able to enable and disable individual probes in the manner
> > that I described in a previous reply. Today, a kernel with lock inlining
> > has thousands of probe sites; wouldn't the requirement of allocating KVA
> > for each of them be prohibitive on 32-bit architectures?
> 
> Several variations of the approach allow each probe site to be controlled
> individually, while still avoiding jumps and reducing the cache consumption.
> And, of course, the biggest advantage is avoiding the need to change the
> text at runtime.
> 
> E.g., you could have a byte allocated somewhere for each probe, with usual
> boolean values true/false for enabled/disabled state.  Also, somewhere,
> you have two KVA pages allocated, say, starting at address p, the first
> page is mapped, the second page is not.  The pages are shared between all
> probes.  Then, the following code sequence would trigger a page fault
> only for an enabled probe:
> 	movzbl	this_probe_enable_byte, %eax
> 	movl	(p + PAGE_SIZE - 4)(%eax), %eax
> This approach is quite portable and can be expressed in C.
> 
> If the expected count of probes is in the thousands, as you mentioned, then
> you would pay only for several KB of memory for enable control bytes.
> 
> Another variant is possible with the use of the INTO instruction, which
> has relatively low latency when not trapping, according to the Agner
> Fog tables.

I see. I think this could be made to work, but there's still the
complication of passing arguments to the probe. Copying them into some
block in curthread is one way to do this, but it seems more expensive
than passing them in registers per the standard calling convention, at
least on amd64.
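
To make the discussion concrete, here is a rough userspace model in C of
the enable-byte scheme quoted above, with the argument copy added. It is
only a sketch under assumptions: every name (probe_init, probe_site,
probe_args, and so on) is hypothetical, a thread-local array stands in
for a block in curthread, and a SIGSEGV/SIGBUS handler stands in for the
kernel trap handler. It also uses a one-byte load at the page boundary
rather than the 4-byte load in the quoted assembly; the effect is the
same, since the access touches the unmapped page only when the enable
byte is 1.

```c
#define _DEFAULT_SOURCE 1	/* for MAP_ANON/sigsetjmp under strict glibc modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* All names here are hypothetical; this models the idea in userspace. */
static unsigned char *probe_pages;	/* "p": page 0 mapped, page 1 not */
static size_t probe_pagesz;
static sigjmp_buf probe_jmp;
static volatile sig_atomic_t probe_hits;	/* number of probe firings */

/* Stand-in for a per-thread argument block (curthread in the kernel). */
static _Thread_local uintptr_t probe_args[5];

/* Stand-in for the trap handler: count the firing and resume the site. */
static void
probe_fault(int sig)
{
	(void)sig;
	probe_hits++;
	siglongjmp(probe_jmp, 1);
}

/* Map two adjacent pages, revoke access to the second, hook the fault. */
static int
probe_init(void)
{
	struct sigaction sa;

	probe_pagesz = (size_t)sysconf(_SC_PAGESIZE);
	probe_pages = mmap(NULL, 2 * probe_pagesz, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (probe_pages == MAP_FAILED)
		return (-1);
	if (mprotect(probe_pages + probe_pagesz, probe_pagesz, PROT_NONE) != 0)
		return (-1);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_fault;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		return (-1);
	/* Some systems report this kind of fault as SIGBUS. */
	return (sigaction(SIGBUS, &sa, NULL));
}

/*
 * One probe site.  Returns 1 if the probe fired, 0 otherwise.  The
 * sigsetjmp() exists only so this model can recover from the fault; in
 * the kernel the trap handler would run the probe and return.
 */
static int
probe_site(const volatile unsigned char *enable, uintptr_t arg0,
    uintptr_t arg1)
{
	if (sigsetjmp(probe_jmp, 1) != 0)
		return (1);
	/* The argument copy is paid even while the probe is disabled. */
	probe_args[0] = arg0;
	probe_args[1] = arg1;
	/* enable == 0: last byte of page 0.  enable == 1: first byte of page 1. */
	(void)*(volatile unsigned char *)
	    (probe_pages + probe_pagesz - 1 + *enable);
	return (0);
}
```

After a successful probe_init(), calling probe_site() with an enable
byte of 0 should return 0 without faulting, and with an enable byte of 1
it should return 1 with the two arguments left in probe_args[]. Note
that the two stores into probe_args[] happen on every pass through the
site, which is the extra cost I'm worried about relative to a plain
function call.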


