Sender: Mark Johnston <markjdb@gmail.com>
Date: Tue, 24 Nov 2015 16:11:36 -0800
From: Mark Johnston <markj@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-arch@FreeBSD.org
Subject: Re: zero-cost SDT probes
Message-ID: <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com>
References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>
 <20151123113511.GX58629@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151123113511.GX58629@kib.kiev.ua>
User-Agent: Mutt/1.5.23 (2014-03-12)

On Mon, Nov 23, 2015 at 01:35:11PM +0200, Konstantin Belousov wrote:
> On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote:
> > Hi,
> > 
> > For the past while I've been experimenting with various ways to
> > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT
> > probe site expands to this:
> > 
> > if (func_ptr != NULL)
> > 	func_ptr(<probe args>);
> > 
> > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise
> > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to
> > 
> > func(<probe args>);
> > 
> > When the kernel is running, each probe site has been overwritten with
> > NOPs. When a probe is enabled, one of the NOPs is overwritten with a
> > breakpoint, and the handler uses the PC to figure out which probe fired.
> > This approach has the benefit of incurring less overhead when the probe
> > is not enabled; it's more complicated to implement though, which is why
> > this hasn't already been done.
> > 
> > I have a working implementation of this for amd64 and i386[1]. Before
> > adding support for the other arches, I'd like to get some idea as to
> > whether the approach described below is sound and acceptable.
> > 
> > The main difficulty is in figuring out where the probe sites actually
> > are once the kernel is running. In my patch, a probe site is a call to
> > an externally-defined function which is defined in an
> > automatically-generated C file. At link time, we first perform a partial
> > link of all the kernel's object files. Then, a script uses the relocations
> > against the still-undefined probe functions to generate
> > 1) stub functions for the probes, so that the kernel can actually be
> >    linked, and
> > 2) a linker set containing the offsets of each probe site relative to
> >    the beginning of the text section.
> > The result is linked with the partially-linked kernel to generate the
> > final kernel file.
> > 
> > During boot, we iterate over the linker set, using the offsets plus the
> > address of btext to overwrite probe sites with NOPs. SDT probes in kernel
> > modules are handled differently (and more simply): the kernel linker just
> > has special handling for relocations against symbols named __dtrace_sdt_*;
> > this is how illumos/Solaris implements all of this.
> > 
> > My uncertainty revolves around the use of relocations in the
> > partially-linked kernel to determine the address of probe sites in the
> > running kernel. With the GNU ld in base, this happens to work because
> > the final link doesn't modify the text section. Is this something I can
> > rely upon? Will this assumption be false with the advent of lld and LTO?
> > Are there other, cleaner ways to implement what I described above?
> 
> You could instead consider using a cheap instruction which is
> conditionally converted into a trap. E.g., you could allocate a global
> page frame in KVA and, for normal operation, keep the page mapped and
> backed by a scratch page. The probe would be a volatile read from
> the page.
> 
> When probes are activated, the page is unmapped, which converts the read
> into the page fault. This is similar to the write barriers implemented
> in some garbage collectors.
> 
> There are two issues with this scheme:
> - The cost of probe is relatively large, even if the low level trap
> handler is further modified to recognize the probes by special
> address access.
> - The arguments passed to the probes should be put into some predefined
> place, e.g. somewhere in the *curthread, since the trap handler cannot fetch
> them using the ABI conventions.
> 
> As I mentioned above, this scheme is used by several implementations of
> language runtimes, but there gc pauses are rare, and the slightly larger
> cost of stopping the mutator is justified by even a negligible
> cost reduction for the normal flow. I am not sure if this approach is worth
> the complications and overhead for probes.

If I understood correctly, each probe site would require a separate page
in KVA to be able to enable and disable individual probes in the manner
that I described in a previous reply. Today, a kernel with lock inlining
has thousands of probe sites; wouldn't the requirement of allocating KVA
for each of them be prohibitive on 32-bit architectures?