Date: Sun, 22 Nov 2015 17:44:46 +0100
From: Jilles Tjoelker
To: Mark Johnston
Cc: freebsd-arch@FreeBSD.org
Subject: Re: zero-cost SDT probes
Message-ID: <20151122164446.GA22980@stack.nl>
In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>

On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote:
> For the past while I've been experimenting with various ways to
> implement "zero-cost" SDT DTrace probes.
> Basically, at the moment an SDT
> probe site expands to this:
>
> 	if (func_ptr != NULL)
> 		func_ptr();
>
> When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise
> it's NULL. With zero-cost probes, the SDT_PROBE macros expand to
>
> 	func();
>
> When the kernel is running, each probe site has been overwritten with
> NOPs. When a probe is enabled, one of the NOPs is overwritten with a
> breakpoint, and the handler uses the PC to figure out which probe fired.
> This approach has the benefit of incurring less overhead when the probe
> is not enabled; it's more complicated to implement though, which is why
> this hasn't already been done.
>
> I have a working implementation of this for amd64 and i386[1]. Before
> adding support for the other arches, I'd like to get some idea as to
> whether the approach described below is sound and acceptable.

I have not run any benchmarks but I expect that this removes only a
small part of the overhead of disabled probes. Saving and restoring
caller-save registers and setting up parameters certainly increases code
size and I-cache use. On the other hand, a branch that is always or
never taken will generally cost at most 2 cycles.

Avoiding this overhead would require not generating an ABI function call
but a point where the probe parameters can be calculated from the
registers and stack frame (like how a debugger prints local variables,
but with a guarantee that "optimized out" will not happen). This
requires compiler changes, though, and DTrace has generally not used
DWARF-like debug information.

For a fairer comparison, the five NOPs should be changed to one or two
longer NOPs, since many CPUs decode at most 3 or 4 instructions per
cycle. Some examples of longer NOPs are in
contrib/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
X86AsmBackend::writeNopData(). The two-byte NOP 0x66, 0x90 works on any
x86 CPU.

-- 
Jilles Tjoelker
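
To illustrate the suggestion about longer NOPs, here is a minimal
userland sketch of filling a probe site with as few NOP instructions as
possible. It is not taken from the proposed patch or from LLVM; the
names long_nops and fill_nops and the have_long_nop flag are
hypothetical. The byte sequences are the commonly recommended x86
multi-byte NOP encodings; the 0x0f 0x1f forms are assumed to be usable
only when the caller knows the CPU supports them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Longest single-instruction NOP in the table below (hypothetical name). */
#define MAX_LONG_NOP 9

/*
 * Recommended x86 multi-byte NOP encodings; row n-1 holds the n-byte form.
 * The 0x0f 0x1f forms are not implemented by every x86 CPU, whereas
 * 0x90 and 0x66 0x90 are.
 */
static const uint8_t long_nops[MAX_LONG_NOP][MAX_LONG_NOP] = {
	{ 0x90 },
	{ 0x66, 0x90 },
	{ 0x0f, 0x1f, 0x00 },
	{ 0x0f, 0x1f, 0x40, 0x00 },
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/*
 * Fill buf[0..len) with NOPs, always picking the longest permitted
 * encoding that still fits, and return the number of instructions
 * emitted.  With have_long_nop == 0 only 0x66 0x90 and 0x90 are used.
 */
static size_t
fill_nops(uint8_t *buf, size_t len, int have_long_nop)
{
	size_t max = have_long_nop ? MAX_LONG_NOP : 2;
	size_t count = 0;

	assert(buf != NULL);
	while (len > 0) {
		size_t n = len < max ? len : max;

		memcpy(buf, long_nops[n - 1], n);
		buf += n;
		len -= n;
		count++;
	}
	return (count);
}
```

For a five-byte site, fill_nops(site, 5, 1) writes a single instruction
(0x0f 0x1f 0x44 0x00 0x00), while fill_nops(site, 5, 0) writes three
(0x66 0x90, 0x66 0x90, 0x90). Either way the decoder sees fewer
instructions than with five one-byte NOPs.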