From owner-freebsd-arch@freebsd.org Sun Nov 22 02:44:34 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A6374A3598F for ; Sun, 22 Nov 2015 02:44:34 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-vk0-x231.google.com (mail-vk0-x231.google.com [IPv6:2607:f8b0:400c:c05::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 612431999 for ; Sun, 22 Nov 2015 02:44:34 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by vkfr145 with SMTP id r145so19407668vkf.0 for ; Sat, 21 Nov 2015 18:44:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=Xgcntk3AjxLgbT0bGRs3UAw06gU/I78V11PZwUyAwac=; b=tjdwMeDq2COXkZrrq5PVeIYzfWbKV9dWuXeCkZsKqMT0FDPNm7UDwH+XIs+o6ATkeM moHeL3JGhaTstHK0a3OPJRMXNT7BZ3GF9dx2sOlyof7pgc5JqJDWHBivVlCg0CLSBn6V Wf0Gyhfr3WiJPAINcUTFtJfjzgcF7TGSEPXx7LuGGoOBDpx1RVgMMmpYBCQ+h0ek1ol5 yX0T62DYezJ09eWeEbnEH95rP2D+T4O+DufiXBC+G3EgDFvStiKOfpkEWd1VA6FhVqaH ghsa+6mymfMRrCbqMSuGBV6CkUlugO+4wZkmtOBAemCPlLjmW3BCtSF/vsyCa5Px7+kT 7SdA== X-Received: by 10.31.47.88 with SMTP id v85mr9438362vkv.118.1448160273473; Sat, 21 Nov 2015 18:44:33 -0800 (PST) Received: from wkstn-mjohnston.west.isilon.com (c-67-182-131-225.hsd1.wa.comcast.net. 
[67.182.131.225]) by smtp.gmail.com with ESMTPSA id v145sm5514650vkv.6.2015.11.21.18.44.32 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 21 Nov 2015 18:44:33 -0800 (PST) Sender: Mark Johnston Date: Sat, 21 Nov 2015 18:45:42 -0800 From: Mark Johnston To: freebsd-arch@FreeBSD.org Subject: zero-cost SDT probes Message-ID: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 02:44:34 -0000 Hi, For the past while I've been experimenting with various ways to implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT probe site expands to this: if (func_ptr != NULL) func_ptr(); When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise it's NULL. With zero-cost probes, the SDT_PROBE macros expand to func(); When the kernel is running, each probe site has been overwritten with NOPs. When a probe is enabled, one of the NOPs is overwritten with a breakpoint, and the handler uses the PC to figure out which probe fired. This approach has the benefit of incurring less overhead when the probe is not enabled; it's more complicated to implement though, which is why this hasn't already been done. I have a working implementation of this for amd64 and i386[1]. Before adding support for the other arches, I'd like to get some idea as to whether the approach described below is sound and acceptable. The main difficulty is in figuring out where the probe sites actually are once the kernel is running. In my patch, a probe site is a call to an externally-defined function which is defined in an automatically-generated C file. 
At link time, we first perform a partial link of all the kernel's object files. Then, a script uses the relocations against the still-undefined probe functions to generate 1) stub functions for the probes, so that the kernel can actually be linked, and 2) a linker set containing the offsets of each probe site relative to the beginning of the text section. The result is linked with the partially-linked kernel to generate the final kernel file. During boot, we iterate over the linker set, using the offsets plus the address of btext to overwrite probe sites with NOPs. SDT probes in kernel modules are handled differently (and more simply): the kernel linker just has special handling for relocations against symbols named __dtrace_sdt_*; this is how illumos/Solaris implements all of this. My uncertainty revolves around the use of relocations in the partially-linked kernel to determine the address of probe sites in the running kernel. With the GNU ld in base, this happens to work because the final link doesn't modify the text section. Is this something I can rely upon? Will this assumption be false with the advent of lld and LTO? Are there other, cleaner ways to implement what I described above? 
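To make the boot-time step concrete, here is a rough userspace sketch of it, not the code from the patch. It assumes each probe site is a 5-byte x86 `call rel32`; the names `struct probe_site` and `patch_probe_sites()` are hypothetical, and the `text` buffer stands in for the region starting at btext. The opcode check illustrates the worry above: if a later link step moved the text, a stale offset would no longer land on a call instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_LEN 5      /* x86 "call rel32": opcode 0xE8 + 4-byte displacement */
#define OP_CALL  0xE8
#define OP_NOP   0x90

/*
 * Hypothetical stand-in for one entry of the generated linker set: the
 * offset of a probe call site from the start of the text section.
 */
struct probe_site {
	size_t offset;
};

/*
 * Overwrite each recorded call site with single-byte NOPs.  Returns the
 * number of sites patched, or -1 as soon as an offset is out of range or
 * does not point at a call opcode (sites patched before that stay patched).
 */
int
patch_probe_sites(uint8_t *text, size_t text_len,
    const struct probe_site *sites, size_t nsites)
{
	int patched = 0;

	assert(text != NULL);
	for (size_t i = 0; i < nsites; i++) {
		size_t off = sites[i].offset;

		if (off > text_len || text_len - off < CALL_LEN)
			return (-1);
		if (text[off] != OP_CALL)
			return (-1);
		memset(text + off, OP_NOP, CALL_LEN);
		patched++;
	}
	return (patched);
}
```

A real kernel would also have to make the text writable while patching and do this before other CPUs can execute the sites; the sketch leaves that out.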
Thanks, -Mark [1] https://people.freebsd.org/~markj/patches/sdt-zerocost/ From owner-freebsd-arch@freebsd.org Sun Nov 22 04:41:16 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C4DCEA32BDF for ; Sun, 22 Nov 2015 04:41:16 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-lb0-x235.google.com (mail-lb0-x235.google.com [IPv6:2a00:1450:4010:c04::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4E3AD12EA; Sun, 22 Nov 2015 04:41:16 +0000 (UTC) (envelope-from artemb@gmail.com) Received: by lbbkw15 with SMTP id kw15so80739460lbb.0; Sat, 21 Nov 2015 20:41:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=EEVJy4aG/EYy3oWYchSf1yIpzFZx38Vqnzn09IUxH0I=; b=KKCoc0W51olntd6bGBE8BMqTeXgZnIOginrFg+rQFsCKC59H+29GxjqWoqAhNEjlkE Vnf6OBCxLfcfPqmMm0329tP9z1i6IRpMoC9gwqs465falbKxvuftsh+AaNH4FP3b1SHv 9S7w8iWxE8nWcHoSagYKimyMn5MzcYbIz49VMpSVZj1V6QvdDvdDXRnsjUlK6wDQoudF C1R/jaA/7pVNyYCOkiVDJYntChnP6hPFHyVfTH43GGCExNRNSdRfTs80ECQlsZ0uLsyk kAkQgrnf/6U1SnmUX19Gq/jVDh/kYJyPqC7NNtn/qthFOjzwDBJ2jzf3iBPkdtUuW/kS f7hw== MIME-Version: 1.0 X-Received: by 10.112.198.106 with SMTP id jb10mr8659142lbc.111.1448167273712; Sat, 21 Nov 2015 20:41:13 -0800 (PST) Sender: artemb@gmail.com Received: by 10.25.207.1 with HTTP; Sat, 21 Nov 2015 20:41:13 -0800 (PST) In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> Date: Sat, 21 Nov 2015 20:41:13 -0800 X-Google-Sender-Auth: JwsxpNg6fjtWZSIxnI96fXyTOug Message-ID: Subject: Re: zero-cost SDT probes From: Artem Belevich To: 
Mark Johnston Cc: freebsd-arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 04:41:16 -0000 On Sat, Nov 21, 2015 at 6:45 PM, Mark Johnston wrote: > Hi, > > For the past while I've been experimenting with various ways to > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > probe site expands to this: > > if (func_ptr != NULL) > func_ptr(); > > I wonder how much overhead that currently adds. Do you have any benchmark numbers comparing the performance of no SDT, the current SDT implementation, and the "zero-cost" one? --Artem > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > > func(); > > When the kernel is running, each probe site has been overwritten with > NOPs. When a probe is enabled, one of the NOPs is overwritten with a > breakpoint, and the handler uses the PC to figure out which probe fired. > This approach has the benefit of incurring less overhead when the probe > is not enabled; it's more complicated to implement though, which is why > this hasn't already been done. > > I have a working implementation of this for amd64 and i386[1]. Before > adding support for the other arches, I'd like to get some idea as to > whether the approach described below is sound and acceptable. > > The main difficulty is in figuring out where the probe sites actually > are once the kernel is running. In my patch, a probe site is a call to > an externally-defined function which is defined in an > automatically-generated C file. At link time, we first perform a partial > link of all the kernel's object files. 
Then, a script uses the relocations > against the still-undefined probe functions to generate > 1) stub functions for the probes, so that the kernel can actually be > linked, and > 2) a linker set containing the offsets of each probe site relative to > the beginning of the text section. > The result is linked with the partially-linked kernel to generate the > final kernel file. > > During boot, we iterate over the linker set, using the offsets plus the > address of btext to overwrite probe sites with NOPs. SDT probes in kernel > modules are handled differently (and more simply): the kernel linker just > has special handling for relocations against symbols named __dtrace_sdt_*; > this is how illumos/Solaris implements all of this. > > My uncertainty revolves around the use of relocations in the > partially-linked kernel to determine the address of probe sites in the > running kernel. With the GNU ld in base, this happens to work because > the final link doesn't modify the text section. Is this something I can > rely upon? Will this assumption be false with the advent of lld and LTO? > Are there other, cleaner ways to implement what I described above? 
> > Thanks, > -Mark > > [1] https://people.freebsd.org/~markj/patches/sdt-zerocost/ > _______________________________________________ > freebsd-arch@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@freebsd.org Sun Nov 22 06:29:50 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6CBE5A35C2E for ; Sun, 22 Nov 2015 06:29:50 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1on0147.outbound.protection.outlook.com [157.56.110.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CB17715F0; Sun, 22 Nov 2015 06:29:48 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from SN1PR05CA0041.namprd05.prod.outlook.com (10.163.68.179) by DM2PR0501MB1392.namprd05.prod.outlook.com (10.161.224.139) with Microsoft SMTP Server (TLS) id 15.1.325.17; Sun, 22 Nov 2015 06:29:40 +0000 Received: from BN1BFFO11FD001.protection.gbl (2a01:111:f400:7c10::1:197) by SN1PR05CA0041.outlook.office365.com (2a01:111:e400:5197::51) with Microsoft SMTP Server (TLS) id 15.1.331.20 via Frontend Transport; Sun, 22 Nov 2015 06:29:40 +0000 Authentication-Results: spf=softfail (sender IP is 66.129.239.18) smtp.mailfrom=juniper.net; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=none action=none header.from=juniper.net; Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.18 as permitted sender) Received: from p-emfe01b-sac.jnpr.net (66.129.239.18) by BN1BFFO11FD001.mail.protection.outlook.com (10.58.144.64) with Microsoft 
SMTP Server (TLS) id 15.1.331.11 via Frontend Transport; Sun, 22 Nov 2015 06:29:39 +0000 Received: from magenta.juniper.net (172.17.27.123) by p-emfe01b-sac.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.123.3; Sat, 21 Nov 2015 22:29:38 -0800 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id tAM6TbD83061; Sat, 21 Nov 2015 22:29:37 -0800 (PST) (envelope-from sjg@juniper.net) Received: from chaos (localhost [IPv6:::1]) by chaos.jnpr.net (Postfix) with ESMTP id 80EDC580A9; Sat, 21 Nov 2015 22:29:37 -0800 (PST) To: Mark Johnston CC: , Subject: Re: zero-cost SDT probes In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> Comments: In-reply-to: Mark Johnston message dated "Sat, 21 Nov 2015 18:45:42 -0800." From: "Simon J. Gerraty" X-Mailer: MH-E 8.6; nmh 1.6; GNU Emacs 24.5.1 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <4502.1448173777.1@chaos> Date: Sat, 21 Nov 2015 22:29:37 -0800 Message-ID: <2753.1448173777@chaos> X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 06:29:50 -0000 Mark Johnston wrote: > For the past while I've been experimenting with various ways to > 
implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > probe site expands to this: Would it be feasible to compile the probes into the kernel as active calls to a registrar function? That would eliminate all the complexity of finding PC's though you'd probably need to pass extra args to convey the point of the probe? It would hurt boot time a little too - each probe point would make a call to register itself (and get overwritten with nops as a reward) but very simple? From owner-freebsd-arch@freebsd.org Sun Nov 22 10:50:37 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0048CA357EC for ; Sun, 22 Nov 2015 10:50:37 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B38821949; Sun, 22 Nov 2015 10:50:36 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1a0SE3-000JyS-Vh; Sun, 22 Nov 2015 13:50:32 +0300 Date: Sun, 22 Nov 2015 13:50:31 +0300 From: Slawa Olhovchenkov To: Mark Johnston Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151122105031.GP48728@zxy.spb.ru> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: 
, List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 10:50:37 -0000 On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote: > Hi, > > For the past while I've been experimenting with various ways to > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > probe site expands to this: > > if (func_ptr != NULL) > func_ptr(); > > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > > func(); I have been measuring the overhead of DTrace probes in userspace. The total execution time of a program without any probes is 3% less than that of the same program with probes compiled in but not enabled. With probes enabled (conditional probes too), each probe adds about 0.7us. I placed the DTrace probe inside the inner loop, which is the worst case. ===== #include #include "probes.h" long primes[1000000] = { 3 }; long primecount = 1; int main(int argc, char **argv) { long divisor = 0; long currentprime = 5; long isprime = 1; while (currentprime < 1000000) { isprime = 1; PRIMES_PRIMECALC_START(currentprime); for(divisor=0;divisor Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C809FA32196 for ; Sun, 22 Nov 2015 12:12:57 +0000 (UTC) (envelope-from oliver.pinter@hardenedbsd.org) Received: from mail-wm0-x22c.google.com (mail-wm0-x22c.google.com [IPv6:2a00:1450:400c:c09::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5AD5A180F for ; Sun, 22 Nov 2015 12:12:57 +0000 (UTC) (envelope-from oliver.pinter@hardenedbsd.org) Received: by wmvv187 with SMTP id v187so126407267wmv.1 for ; Sun, 22 Nov 2015 04:12:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hardenedbsd-org.20150623.gappssmtp.com; 
s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=xfcW7txb01iZxeBSPdDbLGZbjtEKSoSMwiKdeIu6dkE=; b=KPImeEYfVfqazGGWlvQaMer5UWp3sAz6RnPT59Ruj6hW/ufK5U38NDSXuwN74tYnB6 0R+SjeiAvFHY0bGYQw+B/DbkSwhbU+RdqMC5o9pQChQyQxY3ul/GCwWJpcuBS7p9OtcO om01uzz9FR3kHKM4ovEM61KMK6HSliuPLwcru3+uIh2zwUaFPkECDK6t1rST1lZkytjx VRK8vlI67wEFFZMBKuRsCnm3o83coHm5OYH6PN7dHslblAsKyF6BuzVGxD6FRpsXj7O3 z9bTlVFXgTpKds52yOyUQ9odBqCobZz/CUdy+XovoesJkQ9oibRYrYJfeOmiD3H1B1BL 0T9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=xfcW7txb01iZxeBSPdDbLGZbjtEKSoSMwiKdeIu6dkE=; b=QfGIsP8nuelxyI/+ksusopdOj7M1p67x59pkJUV5agHGxlg1/2eTNzfRHrq/ExJm0w JjMTu2havr0Bf0wUZEJ6kqA9QQLF07yZUTS74VlbiKHWTwcIjsimZSUpo4PP4eTAhKjZ UdqDNxUu/tPP8De7C09Dj3MstVeG17cAgmX+VMMV88x0SKwLVw9p5tXQHP+Itosk16X9 FZviNVwIShy5RNouQqgd2VsWGgxpTG1Ka5hZq548sxzL0uipQcGrh3AX/xjsW6k91NBJ UDwU4CVXjWSEuvTfIXsGbpfYl4Rj3wXQOe7bt6oVTLxf1YM8kB4hgiAYaZFDSNkcX4K8 U2ZA== X-Gm-Message-State: ALoCoQnamqAMpM3TQhLIcwjr8Fj/uU5am8d0YjDNizfO6C5BBR7B7XjGktXco1oTe2DMA/OqyqBL MIME-Version: 1.0 X-Received: by 10.28.171.134 with SMTP id u128mr11009403wme.22.1448194374759; Sun, 22 Nov 2015 04:12:54 -0800 (PST) Received: by 10.194.243.6 with HTTP; Sun, 22 Nov 2015 04:12:54 -0800 (PST) In-Reply-To: <2753.1448173777@chaos> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <2753.1448173777@chaos> Date: Sun, 22 Nov 2015 13:12:54 +0100 Message-ID: Subject: Re: zero-cost SDT probes From: Oliver Pinter To: "Simon J. 
Gerraty" Cc: Mark Johnston , freebsd-arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 12:12:57 -0000 On 11/22/15, Simon J. Gerraty wrote: > Mark Johnston wrote: >> For the past while I've been experimenting with various ways to >> implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT >> probe site expands to this: > > Would it be feasible to compile the probes into the kernel > as active calls to a registrar function? > That would eliminate all the complexity of finding PC's > though you'd probably need to pass extra args to convey the point of the > probe? > > It would hurt boot time a little too - each probe point would make a > call to register itself (and get overwritten with nops as a reward) but > very simple? In opBSD I already have a similar mechanism for SMAP: https://github.com/opntr/opBSD/commits/op/gsoc2014/master 
From owner-freebsd-arch@freebsd.org Sun Nov 22 16:44:50 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C3D2EA35565 for ; Sun, 22 Nov 2015 16:44:50 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 8EFBD177D; Sun, 22 Nov 2015 16:44:50 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 873423592FA; Sun, 22 Nov 2015 17:44:46 +0100 (CET) Received: by snail.stack.nl (Postfix, from userid 1677) id 3F1D528494; Sun, 22 Nov 2015 17:44:46 +0100 (CET) Date: Sun, 22 Nov 2015 17:44:46 +0100 From: Jilles Tjoelker To: Mark Johnston Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151122164446.GA22980@stack.nl> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 16:44:50 -0000 On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote: > For the past while 
I've been experimenting with various ways to > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > probe site expands to this: > if (func_ptr != NULL) > func_ptr(); > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > func(); > When the kernel is running, each probe site has been overwritten with > NOPs. When a probe is enabled, one of the NOPs is overwritten with a > breakpoint, and the handler uses the PC to figure out which probe fired. > This approach has the benefit of incurring less overhead when the probe > is not enabled; it's more complicated to implement though, which is why > this hasn't already been done. > I have a working implementation of this for amd64 and i386[1]. Before > adding support for the other arches, I'd like to get some idea as to > whether the approach described below is sound and acceptable. I have not run any benchmarks but I expect that this removes only a small part of the overhead of disabled probes. Saving and restoring caller-save registers and setting up parameters certainly increases code size and I-cache use. On the other hand, a branch that is always or never taken will generally cost at most 2 cycles. Avoiding this overhead would require not generating an ABI function call but a point where the probe parameters can be calculated from the registers and stack frame (like how a debugger prints local variables, but with a guarantee that "optimized out" will not happen). This requires compiler changes, though, and DTrace has generally not used DWARF-like debug information. For a fairer comparison, the five NOPs should be changed to one or two longer NOPs, since many CPUs decode at most 3 or 4 instructions per cycle. Some examples of longer NOPs are in contrib/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp X86AsmBackend::writeNopData(). The two-byte NOP 0x66, 0x90 works on any x86 CPU. 
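As an illustration of the longer-NOP suggestion, here is a small sketch that pads a patched site with as few instructions as possible. `fill_nops()` and its `portable` flag are hypothetical names, not taken from the patch or from LLVM. The table holds the commonly documented encodings up to five bytes; the `0x0f 0x1f` forms are, as far as I know, not accepted by some older CPUs, which is why the sketch can fall back to only `0x90` and `0x66 0x90`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86 NOP encodings, indexed by (length - 1). */
static const uint8_t nops[5][5] = {
	{ 0x90 },                          /* nop */
	{ 0x66, 0x90 },                    /* nop with operand-size prefix */
	{ 0x0f, 0x1f, 0x00 },              /* nopl (%rax) */
	{ 0x0f, 0x1f, 0x40, 0x00 },        /* nopl 0x0(%rax) */
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },  /* nopl 0x0(%rax,%rax,1) */
};

/*
 * Fill len bytes at dst with NOPs, greedily using the longest encoding
 * allowed.  With portable != 0 only the 1- and 2-byte forms are used.
 * Returns the number of NOP instructions written.
 */
size_t
fill_nops(uint8_t *dst, size_t len, int portable)
{
	size_t max = portable ? 2 : 5;
	size_t count = 0;

	assert(dst != NULL || len == 0);
	while (len > 0) {
		size_t n = len < max ? len : max;

		memcpy(dst, nops[n - 1], n);
		dst += n;
		len -= n;
		count++;
	}
	return (count);
}
```

For a 5-byte probe site this writes one instruction in the default mode and three (two 2-byte NOPs plus one 1-byte NOP) in the portable mode, compared with five single-byte NOPs.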
-- Jilles Tjoelker From owner-freebsd-arch@freebsd.org Sun Nov 22 23:59:11 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5FEF0A352AA for ; Sun, 22 Nov 2015 23:59:11 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-pa0-x232.google.com (mail-pa0-x232.google.com [IPv6:2607:f8b0:400e:c03::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2990E18BC for ; Sun, 22 Nov 2015 23:59:11 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by pabfh17 with SMTP id fh17so178088025pab.0 for ; Sun, 22 Nov 2015 15:59:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=u8kkYdukwBBxJi4rNqvHc/cdN7mqEhIeD1BI5VhIFvI=; b=RBQOXmsJz/YGCOBm19MsQD2GrMcc8MQuQgbSv2HJBtJVTvJzqbfb5A9iFGnV+SFHGP CO4BDPnd/DgZ1PSkzItVEY6DjlyBWfHr3jnBClyAMRA4FbsRdY2m+7bXVFkpy7LBSX9c UrXVanRvqM/ogwieE4FMPVRg8XcYs0vJ8C0f1XxmzQbLHhgZccQ4q7owpWUkTEk3evPA 4i8cPIULsaXhcWj16dRmoHze6p9JFPu77CsK1TYAxPBxGcmBsO6+6f6SabnGxo429swC a7pP8rGMl+IXcSBI1nts3BEWDId1CNKwVHA8La3bhUkCJh66Rz6IwVMk8C8nOWhujAYK qtEg== X-Received: by 10.66.122.39 with SMTP id lp7mr32974885pab.74.1448236748390; Sun, 22 Nov 2015 15:59:08 -0800 (PST) Received: from raichu ([104.232.114.184]) by smtp.gmail.com with ESMTPSA id rm10sm7785083pbc.96.2015.11.22.15.59.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 22 Nov 2015 15:59:07 -0800 (PST) Sender: Mark Johnston Date: Sun, 22 Nov 2015 15:59:03 -0800 From: Mark Johnston To: Jilles Tjoelker Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151122235903.GA5647@raichu> 
References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151122164446.GA22980@stack.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151122164446.GA22980@stack.nl> User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Nov 2015 23:59:11 -0000 On Sun, Nov 22, 2015 at 05:44:46PM +0100, Jilles Tjoelker wrote: > On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote: > > For the past while I've been experimenting with various ways to > > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > > probe site expands to this: > > > if (func_ptr != NULL) > > func_ptr(); > > > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > > > func(); > > > When the kernel is running, each probe site has been overwritten with > > NOPs. When a probe is enabled, one of the NOPs is overwritten with a > > breakpoint, and the handler uses the PC to figure out which probe fired. > > This approach has the benefit of incurring less overhead when the probe > > is not enabled; it's more complicated to implement though, which is why > > this hasn't already been done. > > > I have a working implementation of this for amd64 and i386[1]. Before > > adding support for the other arches, I'd like to get some idea as to > > whether the approach described below is sound and acceptable. > > I have not run any benchmarks but I expect that this removes only a > small part of the overhead of disabled probes. Saving and restoring > caller-save registers and setting up parameters certainly increases code > size and I-cache use. 
On the other hand, a branch that is always or
> never taken will generally cost at most 2 cycles.

I've done some microbenchmarks using the lockstat probes on a Xeon E5-2630 with SMT disabled. They just read the TSC and acquire/release a lock in a loop, so there's no contention. In general I see at most a small difference between the old and new SDT implementations and a kernel with KDTRACE_HOOKS off altogether. For example, in my test a mtx lock/unlock pair takes 52 cycles on average without probes; with probes, it's 54 cycles with both SDT implementations. rw read locks are 77 cycles without probes, 79 with. rw write locks and sx exclusive locks don't appear to show any differences, and sx shared locks show the same timings without KDTRACE_HOOKS and with the new SDT implementation; the current implementation adds a cycle per acquire/release pair.

None of this takes into account the cache effects of these probes. One advantage of the proposed implementation is that we eliminate the data access required to test if the probe is enabled in the first place. I'm also a bit uncertain about the I-cache impact. My understanding is that a fetch of an instruction will load the entire cache line containing that instruction. So unless the argument-marshalling instructions for a probe site span at least one cache line, won't they all be loaded anyway?

Consider the disassemblies for __mtx_lock_flags() here:

https://people.freebsd.org/~markj/__mtx_lock_flags_disas.txt

Based on what I said above and assuming a 64-byte cache line size, I'd expect all instructions between 0xffffffff806d1328 and 0xffffffff806d134e to be loaded regardless of whether or not the branch is taken. Is that not the case?

I'll also add that with this change the size of the kernel text shrinks a fair bit: from 8425096 bytes to 7983496 bytes with a custom MINIMAL-like kernel with lock inlining.
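
As a rough check of the cache-line reasoning above, the following C sketch computes how many cache lines an address range touches. The 64-byte line size is the assumption stated above, and the helper names (cache_line_index, cache_lines_spanned) are made up for illustration. With the two addresses quoted above it reports two lines, since 0x...1328 lies in the line starting at 0x...1300 and 0x...134e in the line starting at 0x...1340.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u	/* assumed line size, as in the text above */

/* Index of the cache line that contains addr. */
static uint64_t
cache_line_index(uint64_t addr)
{
	return (addr / CACHE_LINE_SIZE);
}

/* Number of cache lines touched by the inclusive range [start, end]. */
static uint64_t
cache_lines_spanned(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return (cache_line_index(end) - cache_line_index(start) + 1);
}
```

Calling cache_lines_spanned(0xffffffff806d1328ULL, 0xffffffff806d134eULL) gives the count for the range in the disassembly. Whether the second line is already resident when the branch is not taken depends on the CPU's fetch and prefetch behaviour, which this arithmetic does not model.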
Finally, I should have noted in my first post that this work has other motivations beyond possible performance improvements. In particular, recording call sites allows us to finally fill in the function component of SDT probes automatically. For example, with this work it becomes possible to enable the udp:::receive probe in udp6_receive(), but not the one in udp_receive(). Generally, DTrace probes that correspond to a specific instruction are said to be "anchored"; DTrace implements various bytecode operations differently depending on whether the probe is anchored, and SDT probes are expected to be, but with the current implementation they're not. As a result, some operations, such as stack(), do not work correctly with SDT probes. r288363 is a workaround for this problem; the change I proposed is a real solution. This is also a step towards fixing lockstat(1)'s caller identification when locks are not inlined. > > Avoiding this overhead would require not generating an ABI function call > but a point where the probe parameters can be calculated from the > registers and stack frame (like how a debugger prints local variables, > but with a guarantee that "optimized out" will not happen). This > requires compiler changes, though, and DTrace has generally not used > DWARF-like debug information. Integrating DWARF information into libdtrace has been something I've been slowly working on, with the goal of being able to place probes on arbitrary instructions instead of just function boundaries. But as you point out, compiler support is needed for any of this to be reliably useful for SDT. > > For a fairer comparison, the five NOPs should be changed to one or two > longer NOPs, since many CPUs decode at most 3 or 4 instructions per > cycle. Some examples of longer NOPs are in > contrib/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp > X86AsmBackend::writeNopData(). The two-byte NOP 0x66, 0x90 works on any > x86 CPU. I'll try that, thanks. 
On amd64 at least, I think we'd have to use two NOPs: a single-byte NOP that can be overwritten when the probe is enabled, and then a four-byte NOP. From owner-freebsd-arch@freebsd.org Mon Nov 23 00:15:13 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C0367A3571E for ; Mon, 23 Nov 2015 00:15:13 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-pa0-x236.google.com (mail-pa0-x236.google.com [IPv6:2607:f8b0:400e:c03::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 929E51F79 for ; Mon, 23 Nov 2015 00:15:13 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by pacej9 with SMTP id ej9so173043751pac.2 for ; Sun, 22 Nov 2015 16:15:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=E8GGbVh+L7CfH4pLlm38YTFAVMzBfSyrvvhKk4GN2Hk=; b=cYjufNk3O4wJ/nNulORVqbAGyHxEk6676a5I9sFjg99sGZtk3pfkc1chUJr+0BXrOW XYS9pWwgVz1qPYPIoGY4ASMbd3cpXj0x+EwpD+UDfl2vWVWPRZ+aMHv0notN6K4Gd37s EQm2JtNwvig4X1glQhNcnnc1IITukLYEncs0D6aMx3/wTXCRDg2csTKn5UzljpBBR2fY +9LStEmMvicKHxX0kl1QTNnSlPqD8D/YfUiGVqNWNGLQUVcCukpOnUPSHKqdABrtQslU 669Nm3aNqKpGofdxwFQCcvlggnj98vwRSttN3b1JmxUldfem9LWfMuSFdVf8GFyDqNaE TAew== X-Received: by 10.68.224.106 with SMTP id rb10mr22513812pbc.17.1448237713226; Sun, 22 Nov 2015 16:15:13 -0800 (PST) Received: from raichu ([104.232.114.184]) by smtp.gmail.com with ESMTPSA id c1sm7921012pas.1.2015.11.22.16.15.12 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 22 Nov 2015 16:15:12 -0800 (PST) Sender: Mark Johnston Date: Sun, 22 Nov 2015 16:15:11 -0800 From: Mark Johnston To: 
"Simon J. Gerraty" Cc: freebsd-arch@freebsd.org Subject: Re: zero-cost SDT probes Message-ID: <20151123001511.GB5647@raichu> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <2753.1448173777@chaos> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2753.1448173777@chaos> User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 00:15:13 -0000 On Sat, Nov 21, 2015 at 10:29:37PM -0800, Simon J. Gerraty wrote: > Mark Johnston wrote: > > For the past while I've been experimenting with various ways to > > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > > probe site expands to this: > > Would it be feasible to compile the probes into the kernel > as active calls to a registrar function? > That would eliminate all the complexity of finding PC's > though you'd probably need to pass extra args to convey the point of the > probe? > > It would hurt boot time a little too - each probe point would make a > call to register itself (and get overwritten with nops as a reward) but > very simple? I considered such an approach but didn't pursue it for a few reasons: - We'd have to pass a unique probe site identifier as an argument, which requires at least one extra instruction at the probe site. - If the probe site is a tail call, how can the registrar find the correct caller? - If a probe site isn't patched until multiple CPUs have started, how do we safely overwrite the call site in the face of the possibility that another thread is executing the call at the same time? When it comes to enabling or disabling a probe, we only need to write a single byte, but overwriting multiple bytes seems unsafe. 
I think the last point could possibly be addressed by overwriting the first byte of the call with a breakpoint before overwriting the rest of the call site with NOPs, using the breakpoint handler to fix up any threads that reached the probe site as it was being modified. But this detracts a bit from the simplicity of the approach. From owner-freebsd-arch@freebsd.org Mon Nov 23 01:10:37 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D68C9A3527A for ; Mon, 23 Nov 2015 01:10:37 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-io0-x233.google.com (mail-io0-x233.google.com [IPv6:2607:f8b0:4001:c06::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A228F1611; Mon, 23 Nov 2015 01:10:37 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: by iouu10 with SMTP id u10so176052604iou.0; Sun, 22 Nov 2015 17:10:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=VjglKUsjv5OxswawE8V1XRh/e80nmsv5K8vvTSD8zPg=; b=k44tCh1X/sgG6oKdf9hzgBPT21eoK1flTsR+7X/3Z5bCy9Nacdv7gc2I3/4D/3ufvo 5cnvTc81lfhQmHhtuNacCCofvtud7ub9j1ND9WTlCU1APIWk+OF/c4FgJ0wqJPonL2X1 +g6aeqHRlh5aklHlbVIc3yk7hJlUcLm5TU1l1F0XTIwpdBZRT5GtvM9XNm4woCQgVxX/ WeUwFZ1XMMBS2S7wy9xPGAyLka0U6zYYgubVAdPeX63T5XnGndYnO+tZBpUcmcPug6iD Uixa9bt6o1OSfnvbdOuiM04oFZf4AxABk0uhCdQcnPlwTLmvTMbJYR9GR5D7OyeJ8GS7 VR8Q== MIME-Version: 1.0 X-Received: by 10.107.159.199 with SMTP id i190mr21631986ioe.29.1448241037049; Sun, 22 Nov 2015 17:10:37 -0800 (PST) Received: by 10.107.170.102 with HTTP; Sun, 22 Nov 2015 17:10:36 -0800 (PST) In-Reply-To: <20151122164446.GA22980@stack.nl> References: 
<20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151122164446.GA22980@stack.nl> Date: Sun, 22 Nov 2015 20:10:36 -0500 Message-ID: Subject: Re: zero-cost SDT probes From: Ryan Stone To: Jilles Tjoelker Cc: Mark Johnston , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 01:10:37 -0000

On Sun, Nov 22, 2015 at 11:44 AM, Jilles Tjoelker wrote:

> I have not run any benchmarks but I expect that this removes only a
> small part of the overhead of disabled probes. Saving and restoring
> caller-save registers and setting up parameters certainly increases code
> size and I-cache use. On the other hand, a branch that is always or
> never taken will generally cost at most 2 cycles.
>

The original Solaris implementation side-stepped this by trying to place SDT probes next to existing function calls to minimize this overhead. I don't think that we in FreeBSD have been nearly as careful about this. It would be a good project for somebody to go through the existing SDT probes and see if they could be relocated slightly to produce the same semantics but less overhead.

> Avoiding this overhead would require not generating an ABI function call
> but a point where the probe parameters can be calculated from the
> registers and stack frame (like how a debugger prints local variables,
> but with a guarantee that "optimized out" will not happen). This
> requires compiler changes, though, and DTrace has generally not used
> DWARF-like debug information.
>

Compiler support would be nice but is obviously a lot more complicated.
I've long thought that a DTrace probe that expanded to something like the following would be ideal: jmp skip_dtrace # load arguments int 3 skip_dtrace: But in order to implement something like that, you'd need support from both the compiler and the linker. From owner-freebsd-arch@freebsd.org Mon Nov 23 04:14:49 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 13227A35E7E for ; Mon, 23 Nov 2015 04:14:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 83BD617A8; Mon, 23 Nov 2015 04:14:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 7F838D48E13; Mon, 23 Nov 2015 14:48:14 +1100 (AEDT) Date: Mon, 23 Nov 2015 14:48:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Johnston cc: Jilles Tjoelker , freebsd-arch@freebsd.org Subject: Re: zero-cost SDT probes In-Reply-To: <20151122235903.GA5647@raichu> Message-ID: <20151123113932.C906@besplex.bde.org> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151122164446.GA22980@stack.nl> <20151122235903.GA5647@raichu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=cK4dyQqN c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=W0WeRUywduGuKb2zYbgA:9 a=Zi4IgBJU2n4zak7G:21 a=XN6Sc2SuarRgwsJa:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 04:14:49 -0000

On Sun, 22 Nov 2015, Mark Johnston wrote:

> On Sun, Nov 22, 2015 at 05:44:46PM +0100, Jilles Tjoelker wrote:
>> On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote:
>>> For the past while I've been experimenting with various ways to
>>> implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT
>>> probe site expands to this:
>>
>>> if (func_ptr != NULL)
>>>     func_ptr();
>>
>>> When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise
>>> it's NULL. With zero-cost probes, the SDT_PROBE macros expand to
>>
>>> func();
>>
>>> When the kernel is running, each probe site has been overwritten with
>>> NOPs. When a probe is enabled, one of the NOPs is overwritten with a
>>> breakpoint, and the handler uses the PC to figure out which probe fired.
>>> This approach has the benefit of incurring less overhead when the probe
>>> is not enabled; it's more complicated to implement though, which is why
>>> this hasn't already been done.
>>
>>> I have a working implementation of this for amd64 and i386[1]. Before
>>> adding support for the other arches, I'd like to get some idea as to
>>> whether the approach described below is sound and acceptable.
>>
>> I have not run any benchmarks but I expect that this removes only a
>> small part of the overhead of disabled probes. Saving and restoring
>> caller-save registers and setting up parameters certainly increases code
>> size and I-cache use. On the other hand, a branch that is always or
>> never taken will generally cost at most 2 cycles.

High resolution kernel profiling (which adds a function call and return to every function without engrotting the source code with explicit calls) has surprisingly little overhead when not in use. This depends partly on it not using -finstrument-functions. -finstrument-functions produces bloated code to pass 2 args (function pointer and frame pointer) in a portable way.
(Unfortunately, -mprofiler-epilogue was broken (turned into a no-op) in gcc-4.2.1 and is more broken (unsupported) in clang. It is regression-tested by configuring LINT with -pp, but the test is broken for gcc-4.2.1 by accepting -mprofiler-epilogue without actually supporting it, and for clang by ifdefing the addition of -mprofiler-epilogue. I normally don't notice the bug since I use old versions of FreeBSD with a working gcc (3.3.3), and recently fixed 4.2.1. The fix is fragile, and I just noticed that it doesn't work for functions returning a value that don't have an explicit return statement (this is permitted for main()). Old versions of FreeBSD used -finstrument-functions during a previous round of compiler breakage.)

-pg and -mprofiler-epilogue produce calls to .mcount and .mexitcount for every function (and most trap handlers and some jumps). These functions execute "cmpl $ENABLE,enable_flag; jne done; ... done: ret $0" when the feature is not enabled. The branch is easy to predict since there are only 2 instances of it. Scattered instances in callers might bust the branch target cache. So might the scattered calls to .mcount and .mexitcount. Calls are similar to unconditional branches so they are easy to predict, but I think old x86's (ones not too old to have a branch target cache) don't handle them well.

High resolution kernel profiling doesn't support SMP, except in my version where it is too slow to use with SMP (due to enormous lock contention). Low resolution kernel profiling has bad locking for the SMP case. The bad locking first gives the enormous lock contention even when profiling is not enabled, since it is done before checking the flag. It also gives deadlock in ddb and some other traps. In the non-SMP case, similar wrapping before checking the flag makes the low-res case a smaller pessimization.

A common function checking the flags wouldn't work so well if func_ptr or the args depend on the call site.
But if you can figure out everything from the call site address then it could work similarly to the breakpoint instruction.

Kernel profiling could use similar nop/breakpoint methods. It is much easier to patch since there are only 2 functions and 2 args. It already determines the args from the frame. This is the main difference between it and -finstrument-functions.

> I've done some microbenchmarks using the lockstat probes on a Xeon
> E5-2630 with SMT disabled. They just read the TSC and acquire/release a
> lock in a loop, so there's no contention. In general I see at most a small
> difference between the old and new SDT implementations and a kernel with
> KDTRACE_HOOKS off altogether. For example, in my test a mtx lock/unlock
> pair takes 52 cycles on average without probes; with probes, it's 54
> cycles with both SDT implementations. rw read locks are 77 cycles
> without probes, 79 with. rw write locks and sx exclusive locks don't
> appear to show any differences, and sx shared locks show the same
> timings without KDTRACE_HOOKS and with the new SDT implementation; the
> current implementation adds a cycle per acquire/release pair.

When high resolution profiling is enabled, it takes similar times. It also uses the TSC (directly and not slowed down by synchronization) and this use is a major difference between it and low resolution profiling. Overheads on Haswell: UP: 60 cycles for .mcount and 40 cycles for .mexitcount; SMP: 88 cycles for .mcount and 60 cycles for .mexitcount. IIRC, rdtsc takes 24 cycles on Haswell -- more than half of the UP .mexitcount time; synchronizing it takes another 12-24 cycles; it was much slower on older Intel x86 (65 cycles on freefall with its previous CPU) and much faster on older AMD x86 (9 cycles on Athlon XP). This slowness makes it not very useful to optimize other things in anything that uses the TSC. I can believe a mere 2 cycle difference between the highly optimized version and the simple version.
It seems hardly worth the effort to optimize. Arg pushing and tests and/or calls are fast, especially when they don't do anything. It might be possible to optimize them better to reduce dependencies, so that they take zero time if they can be executed in parallel. The 100-200 extra cycles per function given by enabled kernel profiling make a difference of about 100% (twice as slow). With giant locking, this scales with the number of CPUs in the kernel: about 800% (8-9 times slower) with 8 such CPUs.

> None of this takes into account the cache effects of these probes. One
> advantage of the proposed implementation is that we eliminate the data
> access required to test if the probe is enabled in the first place. I'm

Accesses to the same variable cost little.

> also a bit uncertain about the I-cache impact. My understanding is that
> a fetch of an instruction will load the entire cache line containing
> that instruction. So unless the argument-marshalling instructions for a
> probe site span at least one cache line, won't they all be loaded
> anyway?

This can be moved out of the way. I don't like __predict_*(), but one thing it can do right is this. The generated code for an inline flags test should look something like:

        cmpl $ENABLE,enable_flag
        je slow_path
    back:
        # main path here
        ret
    slow_path:
        cmpl $NULL,func_ptr
        je back
        # load func_ptr and args
        # ...
        jmp back

>>> if (func_ptr != NULL)
>>>     func_ptr();

With everything non-inline, arg loading is kept out of the way without really trying:

        call decoder
        # main path here
        ret

The call can be replaced by nops or a trap instruction + nops (or start like that). 'decoder' starts with the flags test. Loading args is difficult if they are local in the caller's frame. Oops, if they were in registers, then they would have to be saved in the frame for the call to decoder, and that is almost as slow as passing them.
So the inline flags method may even be better than the no-op method -- it allows the compiler to do the register saving only in the slow path. In fact, I think you can do everything with this method and no complex decoding:

    # The following is identical to the above except for this comment.
    # To optimise this, replace the next 2 instructions by nops or
    # a trap instruction and nops. This gives only a time optimization.
    # Not large, but easy to do. When not enabled, space is wasted
    # far away where it doesn't mess up the cache.
        cmpl $ENABLE,enable_flag
        je slow_path
    back:
        # main path here
        ret
    slow_path:
        cmpl $NULL,func_ptr
        je back
        # load func_ptr and args
        # ...
        jmp back

High resolution profiling deals with this partially as follows:
- calling .mcount and .mexitcount tends to require more saving than necessary. gcc is not smart about placing these calls in the best position. The main thing done wrong is frame pointer handling.
- but it is arranged that .mexitcount is called without saving the return register(s). The callee preserves them if necessary (only necessary if profiling is enabled).

> Consider the disassemblies for __mtx_lock_flags() here:
> https://people.freebsd.org/~markj/__mtx_lock_flags_disas.txt
> Based on what I said above and assuming a 64-byte cache line size, I'd
> expect all instructions between 0xffffffff806d1328 and 0xffffffff806d134e
> to be loaded regardless of whether or not the branch is taken. Is that not
> the case?

They are large and ugly already. It probably doesn't matter for them. Lots of instructions can be loaded (and executed speculatively) during the slow locked instruction. Perhaps during the many earlier instructions. The instructions at the branch target are more important. I think the previous branch is usually taken (for no contention) and the branch at ...1328 is unimportant.
Both branches are into another cache line and there is no padding to align the branch targets because such padding would not be useful for the target arch (or the optimizations are not complete). I think it is only important that enough instructions in the branch target are in the same cache line as the target. That usually happens accidentally when the target is the function epilogue.

> I'll also add that with this change the size of the kernel text shrinks
> a fair bit: from 8425096 bytes to 7983496 bytes with a custom MINIMAL-like
> kernel with lock inlining.

Kernel profiling also bloats the kernel a lot. I was happy to get back to the lesser bloat given by .mcount/.mexitcount instead of __cyg_profile_name_too_long_to_remember(). But the text space needed is small compared with the data space which is allocated at runtime (the allocation is stupid and allocates space even if profiling is never used).

> Finally, I should have noted in my first post that this work has other
> motivations beyond possible performance improvements. In particular,
> recording call sites allows us to finally fill in the function component
> of SDT probes automatically. For example, with this work it becomes
> possible to enable the udp:::receive probe in udp6_receive(), but not
> the one in udp_receive().

Patching inline "cmpl $ENABLE,enable_flag" also allows this very easily, unless you want to vary the args for a single call site.

Yet another variation, which is easier to implement since it only requires modifying data:

        call decoder
        testl %eax,%eax
        jne slow_path
        # rest as above

'decoder' now just checks a table of enable flags indexed by the call site and returns true/false. All the args passing remains inline (but far away). (First check a global enable flag to optimize for the non-enabled case).
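
A minimal user-space sketch of the table-driven 'decoder' variation described above, written in C rather than assembly. The structure and names (struct probe_site, probes_globally_enabled, decoder) and the example addresses are hypothetical; a real implementation would derive the call site from the return address and build the table at link or boot time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per probe call site (hypothetical layout). */
struct probe_site {
	uintptr_t	ps_addr;	/* address identifying the call site */
	bool		ps_enabled;	/* per-site enable flag */
};

/* Global flag, checked first so the all-disabled case stays cheap. */
static bool probes_globally_enabled;

/* Example table; must be sorted by ps_addr for the binary search below. */
static struct probe_site probe_sites[] = {
	{ 0x1000, false },
	{ 0x2000, false },
	{ 0x3000, false },
};

/* Return true if the probe at call site 'callsite' should fire. */
static bool
decoder(uintptr_t callsite)
{
	size_t lo, hi, mid;

	if (!probes_globally_enabled)
		return (false);

	lo = 0;
	hi = sizeof(probe_sites) / sizeof(probe_sites[0]);
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (probe_sites[mid].ps_addr == callsite)
			return (probe_sites[mid].ps_enabled);
		if (probe_sites[mid].ps_addr < callsite)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (false);		/* unknown call site */
}
```

Enabling a single site then only requires modifying data: set probes_globally_enabled and the ps_enabled field of that entry, with no text patching. The cost is that the call and lookup still execute at every site once any probe is enabled.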
>> Avoiding this overhead would require not generating an ABI function call
>> but a point where the probe parameters can be calculated from the
>> registers and stack frame (like how a debugger prints local variables,
>> but with a guarantee that "optimized out" will not happen). This
>> requires compiler changes, though, and DTrace has generally not used
>> DWARF-like debug information.

I use gcc -O1 -fno-inline-functions-called-once, and usually i386 and -march=i386, to prevent bogus optimizations. Maximal compiler optimizations rarely gain as much as 1% in kernel code, even in micro-benchmarks. Compilers like to produce large code that gives negative optimizations except in loops, and kernels don't have many loops.

The above flags make ddb and stack traces almost correct on i386 for public functions. Static functions are often called with args in registers even on i386. To debug these, or to make them available using "call", they must be declared with a regparm attribute (cdecl only does enough for public functions where it is the default anyway).

>> For a fairer comparison, the five NOPs should be changed to one or two
>> longer NOPs, since many CPUs decode at most 3 or 4 instructions per
>> cycle. Some examples of longer NOPs are in
>> contrib/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
>> X86AsmBackend::writeNopData(). The two-byte NOP 0x66, 0x90 works on any
>> x86 CPU.
>
> I'll try that, thanks. On amd64 at least, I think we'd have to use two
> NOPs: a single-byte NOP that can be overwritten when the probe is
> enabled, and then a four-byte NOP.

I think some arches need more than 1 byte for a trap/breakpoint instruction.

I just remembered what the "ret $0" in .mcount does. It is because plain ret doesn't work so well at a branch target, even when the target is aligned. IIRC, "nop; ret" is just as good a fix on the arches where plain ret is slower, but "ret $0" is better on older in-order arches where nop is not so special.
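
The one-byte-plus-four-byte NOP layout quoted above can be modelled in user space roughly as follows. This is only a sketch on an ordinary byte array: the function names are hypothetical, the four-byte NOP encoding (0f 1f 40 00) is the usual amd64 long NOP, and a real kernel would also have to deal with write-protected text and cross-CPU synchronization, which are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN	5	/* size of a near call: e8 + rel32 */
#define X86_NOP1	0x90	/* one-byte NOP */
#define X86_INT3	0xcc	/* one-byte breakpoint */

/* One-byte NOP followed by the four-byte NOP "nopl 0(%rax)". */
static const uint8_t site_nops[SITE_LEN] = {
	X86_NOP1, 0x0f, 0x1f, 0x40, 0x00
};

/* Overwrite the whole call site; assumed to run before the site is live. */
static void
site_init_nops(uint8_t *site)
{
	memcpy(site, site_nops, SITE_LEN);
}

/* Enabling a probe changes only the first byte of the site. */
static void
site_enable(uint8_t *site)
{
	assert(site[0] == X86_NOP1);
	site[0] = X86_INT3;
}

/* Disabling restores the one-byte NOP; the four-byte NOP is untouched. */
static void
site_disable(uint8_t *site)
{
	assert(site[0] == X86_INT3);
	site[0] = X86_NOP1;
}

static bool
site_is_enabled(const uint8_t *site)
{
	return (site[0] == X86_INT3);
}
```

Keeping the patchable byte as its own instruction means enable and disable are single-byte stores, so another CPU executing the site sees either the NOP or the breakpoint and never a partially written instruction. This relies on the breakpoint being one byte, which, as noted above, does not hold on every architecture.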
Extra nops or larger nop instructions should also be sprinkled for alignment. It is too hard to know the best number to use in asm code. Compilers generate magic amounts depending on the arch and how far away the instruction pointer is from an alignment boundary (so as to not align if too far away). I don't trust compilers to get this right either, even if the arch matches exactly. In the too-far-away cases, it is likely that a small change nearby moves closer and thus gives faster code. Bruce From owner-freebsd-arch@freebsd.org Mon Nov 23 11:35:18 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5D385A3580A for ; Mon, 23 Nov 2015 11:35:18 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E41DE14CB; Mon, 23 Nov 2015 11:35:17 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id tANBZBu0027206 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 23 Nov 2015 13:35:11 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua tANBZBu0027206 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id tANBZBup027204; Mon, 23 Nov 2015 13:35:11 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 23 Nov 2015 13:35:11 +0200 From: Konstantin Belousov To: Mark Johnston Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151123113511.GX58629@kib.kiev.ua> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> MIME-Version: 1.0 
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 11:35:18 -0000 On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote: > Hi, > > For the past while I've been experimenting with various ways to > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > probe site expands to this: > > if (func_ptr != NULL) > func_ptr(); > > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > > func(); > > When the kernel is running, each probe site has been overwritten with > NOPs. When a probe is enabled, one of the NOPs is overwritten with a > breakpoint, and the handler uses the PC to figure out which probe fired. > This approach has the benefit of incurring less overhead when the probe > is not enabled; it's more complicated to implement though, which is why > this hasn't already been done. > > I have a working implementation of this for amd64 and i386[1]. Before > adding support for the other arches, I'd like to get some idea as to > whether the approach described below is sound and acceptable. > > The main difficulty is in figuring out where the probe sites actually > are once the kernel is running. In my patch, a probe site is a call to > an externally-defined function which is defined in an > automatically-generated C file. 
At link time, we first perform a partial
> link of all the kernel's object files. Then, a script uses the relocations
> against the still-undefined probe functions to generate
> 1) stub functions for the probes, so that the kernel can actually be
> linked, and
> 2) a linker set containing the offsets of each probe site relative to
> the beginning of the text section.
> The result is linked with the partially-linked kernel to generate the
> final kernel file.
>
> During boot, we iterate over the linker set, using the offsets plus the
> address of btext to overwrite probe sites with NOPs. SDT probes in kernel
> modules are handled differently (and more simply): the kernel linker just
> has special handling for relocations against symbols named __dtrace_sdt_*;
> this is how illumos/Solaris implements all of this.
>
> My uncertainty revolves around the use of relocations in the
> partially-linked kernel to determine the address of probe sites in the
> running kernel. With the GNU ld in base, this happens to work because
> the final link doesn't modify the text section. Is this something I can
> rely upon? Will this assumption be false with the advent of lld and LTO?
> Are there other, cleaner ways to implement what I described above?

You could consider using a cheap instruction which is conditionally converted into a trap, instead. E.g., you could have a global page frame allocated in KVA, and for normal operation, keep the page mapped and backed by a scratch page. The probe would be a volatile read from the page. When probes are activated, the page is unmapped, which converts the read into a page fault. This is similar to the write barriers implemented in some garbage collectors.

There are two issues with this scheme:
- The cost of a probe is relatively large, even if the low level trap handler is further modified to recognize the probes by special address access.
- The arguments passed to the probes should be put into some predefined place, e.g.
somewhere in *curthread, since the trap handler cannot fetch them using the ABI conventions. As I mentioned above, this scheme is used by several language runtime implementations, but there GC pauses are rare, and the slightly larger cost of stopping the mutator is justified even by a negligible cost reduction for the normal flow. I am not sure this approach is worth the complications and overhead for probes. From owner-freebsd-arch@freebsd.org Mon Nov 23 13:18:44 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 36EAFA346A1 for ; Mon, 23 Nov 2015 13:18:44 +0000 (UTC) (envelope-from onwahe@gmail.com) Received: from mail-io0-x232.google.com (mail-io0-x232.google.com [IPv6:2607:f8b0:4001:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 060F511A2 for ; Mon, 23 Nov 2015 13:18:44 +0000 (UTC) (envelope-from onwahe@gmail.com) Received: by iouu10 with SMTP id u10so188987371iou.0 for ; Mon, 23 Nov 2015 05:18:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=J1QWEgqtPthF2OC37falkZ4jPSXDKSnPxWTDDWNt5I4=; b=gEzpZSptcJhGAZrxYxVX0TBt0Hjxq5sEp5Yx2Yw/XcWpQ5iqjnmQvqhBD/dhw05YlV Xjx7cM7rruE69KWImob5UpTBTx4mRCIsrt5g9cX61H0Da9SjbS3YP1Agi7S+z4aL+0PF hRlJ2tLP/AReRxKbFGjS0Ujv0S10nZbY/u9BTWui6gW4nZ9oCa6Gp29opQSxzGLA6ENg 3rj2IIRfGXRKrbI0THE37YpE6d4QYRNIcwcaViZ6EFg0hDxY4zpVGcEEvz+FKJIu44EC n4wKfpkZKY6zqO0V9ATjVe+sa1URlipraOfONF3dH5T+xubgxryEp4AH4GR7K/LfJU8u 1kWQ== MIME-Version: 1.0 X-Received: by 10.107.167.9 with SMTP id q9mr25303051ioe.84.1448284723478; Mon, 23 Nov 2015 05:18:43 -0800 (PST) Received: by 10.64.130.38 with HTTP; Mon, 23 Nov 2015
05:18:43 -0800 (PST) In-Reply-To: <20151120144544.GB58629@kib.kiev.ua> References: <20151120144544.GB58629@kib.kiev.ua> Date: Mon, 23 Nov 2015 14:18:43 +0100 Message-ID: Subject: Re: a question about BUS_DMA_MIN_ALLOC_COMP flag meaning From: Svatopluk Kraus To: Konstantin Belousov Cc: FreeBSD Arch Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 13:18:44 -0000 On Fri, Nov 20, 2015 at 3:45 PM, Konstantin Belousov wrote: > On Wed, Nov 18, 2015 at 05:00:49PM +0100, Svatopluk Kraus wrote: >> Hi, >> >> I have fallen to some problem with inconsistent use of >> BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code >> very very long ago and so, the problem covers all archs which came out >> from it. >> >> However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set. >> >> (1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag >> specified, some bounce pages could be allocated in advance and >> BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are >> allocated only if the tag's maxsize property is higher than size of >> all bounce pages already allocated in a bounce zone. >> >> (2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP >> is not set on associated tag, some bounce pages are ALWAYS allocated >> and BUS_DMA_MIN_ALLOC_COMP is set afterwards, >> >> (3) else some bounce pages could be allocated if there is not enough >> pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards. >> >> The problem is the following. Due to case (2), the number of pages in >> bounce zone can grow infinitely, as bounce pages once allocated are >> never freed. 
It can happen when a big number of bus_dma_tag_t together >> with bus_dmamap_t are created, or they are created dynamically either >> because of a loadable module or by design. >> >> The inconsistency is that when bus_dma_tag_t is being created, there >> is no limit for how much pages could be allocated. On the other hand, >> when bus_dmamap_t is being created, there is MAX_BPAGES limitation. >> >> I think that fix for case (2) presented as x86 fix is the following: >> >> diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c >> index 4826a2b..a15139f 100644 >> --- a/sys/x86/x86/busdma_bounce.c >> +++ b/sys/x86/x86/busdma_bounce.c >> @@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int >> flags, bus_dmamap_t *mapp) >> else >> maxpages = MIN(MAX_BPAGES, Maxmem - >> atop(dmat->common.lowaddr)); >> - if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 || >> + if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 && >> (bz->map_count > 0 && bz->total_bpages < maxpages)) { >> pages = MAX(atop(dmat->common.maxsize), 1); >> pages = MIN(maxpages - bz->total_bpages, pages); >> >> >> IMO, it also fixes logic by making it same as in bus_dma_tag_t case. > I think that this patch is correct. So, after the r291142 and r291193 intermezzo, the question is: what is right? In fact, there were two possibilities: (1) to keep the BUS_DMA_MIN_ALLOC_COMP flag and make it consistent, so bounce pages are allocated only once for a tag, or (2) to remove it, so bounce pages are allocated for every created map which needs them, up to some sane limit. It turned out that (1) is not good for some driver. I'm not sure why, but the only thing I can imagine now is that a tag was created with the BUS_DMA_ALLOCNOW flag and a consumer then violates the tag's maxsize property. Or, there is another inconsistency in the if statement: bz->map_count > 0. The bounce zone map count is incremented later, so when a bounce zone has no maps yet, the test fails. Not to mention that this map count is not atomic.
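To make the difference between the two forms of the condition concrete, here is a minimal userspace sketch of both predicates taken from the diff above. The struct, the flag value and the function names are stand-ins invented for illustration; only the boolean expressions come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BUS_DMA_MIN_ALLOC_COMP; the real value is not assumed. */
#define SKETCH_MIN_ALLOC_COMP	0x01

/* Hypothetical reduction of a bounce zone to the two fields tested. */
struct bounce_zone_sketch {
	int map_count;		/* maps already attached to the zone */
	int total_bpages;	/* bounce pages already allocated */
};

/* Condition before the patch: flag clear OR (maps exist AND room left). */
static bool
alloc_before_patch(int bounce_flags, const struct bounce_zone_sketch *bz,
    int maxpages)
{
	assert(bz != NULL);
	return ((bounce_flags & SKETCH_MIN_ALLOC_COMP) == 0 ||
	    (bz->map_count > 0 && bz->total_bpages < maxpages));
}

/* Condition after the patch: flag clear AND (maps exist AND room left). */
static bool
alloc_after_patch(int bounce_flags, const struct bounce_zone_sketch *bz,
    int maxpages)
{
	assert(bz != NULL);
	return ((bounce_flags & SKETCH_MIN_ALLOC_COMP) == 0 &&
	    (bz->map_count > 0 && bz->total_bpages < maxpages));
}
```

With the flag clear and a zone that is already at maxpages, the "||" form still allocates, which is the unbounded growth described in case (2). With the flag clear and map_count == 0, the "&&" form never allocates, which is the first-map problem mentioned above.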
Anyhow, what is right, (1) or (2) ? > >> >> The next question is, if case (1) should be limited by MAX_BPAGES as >> in case (3) or maybe better if there should be some internal >> limitation for bounce zone itself. > Could we apply e.g. MAX_BPAGES limit to bounce zones, or should we allow > the limit to change based on the tag ? But I am not sure if there is > any reasonable way to formulate the limit. Neither me. > > MAX_BPAGES looks like some arbitrary sanity limit, e.g. we could have > unlimited maxsize, but also have an alignment constraints, and then > tag requires bouncing. I am not sure that hard-coded values, esp. the > amd64 32MB limit, makes much sense, or that basing the limit on the > tag constraints makes more sense. Might be, we should allow some total > percentage of the physical memory on machine to be consumed by all > bounce zones altogether ? IMO, some global limit should be there in any case. A tunable sounds good, with some warning if a box owner wants too much. From owner-freebsd-arch@freebsd.org Tue Nov 24 15:59:11 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76B61A36996 for ; Tue, 24 Nov 2015 15:59:11 +0000 (UTC) (envelope-from onwahe@gmail.com) Received: from mail-io0-x232.google.com (mail-io0-x232.google.com [IPv6:2607:f8b0:4001:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4306C160C for ; Tue, 24 Nov 2015 15:59:11 +0000 (UTC) (envelope-from onwahe@gmail.com) Received: by iofh3 with SMTP id h3so24509988iof.3 for ; Tue, 24 Nov 2015 07:59:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; 
bh=yxWpGGYrGPGkHdPBnN3w7TkpLJuqQEybpLzKLyEY2j4=; b=q6cjp5IbDE49MPyIKYi3sOkWoB/2fUGeE8VqkSQmW7Cpjbokvh1nZmPQMImmPLO9R9 Ntb6Z4acQrMErGABdyGzWNuYRZ2teUoGKL0x6hcPrM63PkbRpldOpXdlTx0AD+gEsly+ lKkeVRAsoaTmP6hUnXQ5jlhLOoGQz/ra0GPhHtx+Yegagk5n/LTfwiY+hBMbSkuq+t+w MMtjfgFWCSUeqqQsRRSSp1IrM3ARsrJxyVKcy7W0jTmFpNcDa+GqGGGHik9U7nngzuyl VDQL9fgs/nRm8zO+d0dWWDh46JWT8tq/5uj7jV0sJTbny72G988DloRoW23WgvsgfDnb 7wmA== MIME-Version: 1.0 X-Received: by 10.107.11.166 with SMTP id 38mr31603436iol.186.1448380750637; Tue, 24 Nov 2015 07:59:10 -0800 (PST) Received: by 10.64.130.38 with HTTP; Tue, 24 Nov 2015 07:59:10 -0800 (PST) In-Reply-To: References: <20151120144544.GB58629@kib.kiev.ua> Date: Tue, 24 Nov 2015 16:59:10 +0100 Message-ID: Subject: Re: a question about BUS_DMA_MIN_ALLOC_COMP flag meaning From: Svatopluk Kraus To: Konstantin Belousov Cc: FreeBSD Arch Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Nov 2015 15:59:11 -0000 On Mon, Nov 23, 2015 at 2:18 PM, Svatopluk Kraus wrote: > On Fri, Nov 20, 2015 at 3:45 PM, Konstantin Belousov > wrote: >> On Wed, Nov 18, 2015 at 05:00:49PM +0100, Svatopluk Kraus wrote: >>> Hi, >>> >>> I have fallen to some problem with inconsistent use of >>> BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code >>> very very long ago and so, the problem covers all archs which came out >>> from it. >>> >>> However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set. >>> >>> (1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag >>> specified, some bounce pages could be allocated in advance and >>> BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are >>> allocated only if the tag's maxsize property is higher than size of >>> all bounce pages already allocated in a bounce zone. 
>>> >>> (2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP >>> is not set on associated tag, some bounce pages are ALWAYS allocated >>> and BUS_DMA_MIN_ALLOC_COMP is set afterwards, >>> >>> (3) else some bounce pages could be allocated if there is not enough >>> pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards. >>> >>> The problem is the following. Due to case (2), the number of pages in >>> bounce zone can grow infinitely, as bounce pages once allocated are >>> never freed. It can happen when a big number of bus_dma_tag_t together >>> with bus_dmamap_t are created, or they are created dynamically either >>> because of a loadable module or by design. >>> >>> The inconsistency is that when bus_dma_tag_t is being created, there >>> is no limit for how much pages could be allocated. On the other hand, >>> when bus_dmamap_t is being created, there is MAX_BPAGES limitation. >>> >>> I think that fix for case (2) presented as x86 fix is the following: >>> >>> diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c >>> index 4826a2b..a15139f 100644 >>> --- a/sys/x86/x86/busdma_bounce.c >>> +++ b/sys/x86/x86/busdma_bounce.c >>> @@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int >>> flags, bus_dmamap_t *mapp) >>> else >>> maxpages = MIN(MAX_BPAGES, Maxmem - >>> atop(dmat->common.lowaddr)); >>> - if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 || >>> + if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 && >>> (bz->map_count > 0 && bz->total_bpages < maxpages)) { >>> pages = MAX(atop(dmat->common.maxsize), 1); >>> pages = MIN(maxpages - bz->total_bpages, pages); >>> >>> >>> IMO, it also fixes logic by making it same as in bus_dma_tag_t case. >> I think that this patch is correct. > > So, with r291142 and r291193 intermezzo, the question is: what is right? 
> > In fact, there were two possibilities: > (1) to keep BUS_DMA_MIN_ALLOC_COMP flag and make it consistent, so > bounce pages are allocated only once for a tag, or > (2) to remove it, so bounce pages are allocated for every created map > which needs them, up to some sane limit. > > It turned out that (1) is not good for some driver. I'm not sure why, > but only thing I can imagine now is that a tag was created with > BUS_DMA_ALLOCNOW flag and then, a consumer breaks tag's maxsize > property. Or, there is another inconsitency in the if statement: > bz->map_count > 0. Bounce zone map count is incremented later, so when > bounce zone is without map, the test fails. Not mention that this map > count is not atomic. > > Anyhow, what is right, (1) or (2) ? > Well, Michal (mmel) came up with an idea which seems to me the most likely explanation now, as it explains why bz->total_bpages is tested against MAX_BPAGES when a map is being created, while the tag's maxsize is used to evaluate how many bounce pages should be allocated. The idea is the following (considering that BUS_DMA_COULD_BOUNCE is set on a tag): If a tag has only one map, bounce pages should be allocated only once: either when the tag is being created with the BUS_DMA_ALLOCNOW flag set, or when its map is being created. The tag's maxsize property is used to evaluate how many bounce pages are needed. If the tag has a second (or subsequent) map, the bz->map_count > 0 check is valid and more bounce pages could be allocated if bz->total_bpages has not yet reached some sane limit. Of course, the bz->map_count > 0 check limps a bit, as a bounce zone could be shared by more tags (and so their maps). Thus, the real problem is that (1) a tag's maxsize property is not limited by MAX_BPAGES, and (2) each new tag adds some bounce pages to a bounce zone without any limit. So, I suggest (1) limiting the tag's maxsize property and (2) limiting the page count used by all bounce zones. It seems to me that these two limits should be right.
The limits should have some defaults, with the possibility of changing them via kernel config or at boot time. Any objections? >> >>> >>> The next question is, if case (1) should be limited by MAX_BPAGES as >>> in case (3) or maybe better if there should be some internal >>> limitation for bounce zone itself. >> Could we apply e.g. MAX_BPAGES limit to bounce zones, or should we allow >> the limit to change based on the tag ? But I am not sure if there is >> any reasonable way to formulate the limit. > > Neither me. > >> >> MAX_BPAGES looks like some arbitrary sanity limit, e.g. we could have >> unlimited maxsize, but also have an alignment constraints, and then >> tag requires bouncing. I am not sure that hard-coded values, esp. the >> amd64 32MB limit, makes much sense, or that basing the limit on the >> tag constraints makes more sense. Might be, we should allow some total >> percentage of the physical memory on machine to be consumed by all >> bounce zones altogether ? > > IMO, some global limit should be there in any case. A tunable sounds > good, with some warning if a box owner wants too much.
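A global limit of the kind discussed above could look roughly like the following userspace sketch: one shared budget that every bounce zone draws from, with requests clamped once the budget is exhausted. The names bounce_pages_limit, bounce_pages_total and bounce_pages_reserve() and the default value are hypothetical, and a real kernel version would need locking and a loader tunable, which are left out here.

```c
#include <assert.h>

/* Hypothetical tunable: total bounce pages allowed across all zones. */
static int bounce_pages_limit = 8192;

/* Pages handed out so far; a kernel version would protect this with a lock. */
static int bounce_pages_total;

/*
 * Reserve up to "wanted" pages from the global budget.
 * Returns the number of pages actually granted, possibly 0.
 */
static int
bounce_pages_reserve(int wanted)
{
	int room, granted;

	assert(wanted >= 0);
	room = bounce_pages_limit - bounce_pages_total;
	if (room < 0)
		room = 0;	/* limit was lowered below current usage */
	granted = wanted < room ? wanted : room;
	bounce_pages_total += granted;
	return (granted);
}
```

A caller would ask for MAX(atop(maxsize), 1) pages as today and then allocate only what was granted, so a tag with a very large maxsize cannot grow the zones past the budget.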
From owner-freebsd-arch@freebsd.org Tue Nov 24 22:20:25 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 132CCA3714C for ; Tue, 24 Nov 2015 22:20:25 +0000 (UTC) (envelope-from oshogbo.vx@gmail.com) Received: from mail-lf0-x231.google.com (mail-lf0-x231.google.com [IPv6:2a00:1450:4010:c07::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8C7261A47 for ; Tue, 24 Nov 2015 22:20:24 +0000 (UTC) (envelope-from oshogbo.vx@gmail.com) Received: by lfaz4 with SMTP id z4so39194068lfa.0 for ; Tue, 24 Nov 2015 14:20:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=C0sTK7UihwiNJ0tUde022qlLlNTLmSDNFbi9DbIlOAk=; b=jmHbRbQlIT3IApLuejj/uh7clSXWE5Xm/F9tlMT+3olrShyFtcT4syH/ZnLf+OS1Ke 3otJyfOiDrLNBTWF8eMBlJ0LQHP+sY6C9B9A/QPTJqGgTa5749ABAkQDH7odQP20MktO kDA2jcIoFhg+gEY6DXXWYJXgO/onSKssH2ua7p6WOEycZuqGuv5aj+1eknmdvwKxSEwn o12QcDVrUGV9af6LCqwe5PuyJNeIvXytINbuSMZLIq7NoI0BuAAF/8GZpm67ZA8X/QED 8BEWgHjzoIOfBU3hEiE1hLSrPL3gFOvBrzYEJs6Xh8MIPtir5anKHYkSS/hkcCubBNL6 TO6g== X-Received: by 10.25.218.9 with SMTP id r9mr14484493lfg.138.1448403622623; Tue, 24 Nov 2015 14:20:22 -0800 (PST) Received: from jarvis.chello.pl (89-69-121-31.dynamic.chello.pl. [89.69.121.31]) by smtp.gmail.com with ESMTPSA id e63sm2921045lfe.5.2015.11.24.14.20.21 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 24 Nov 2015 14:20:21 -0800 (PST) Sender: Mariusz Zaborski Date: Tue, 24 Nov 2015 23:23:46 +0100 From: Mariusz Zaborski To: freebsd-arch@freebsd.org Cc: cl-capsicum-discuss@lists.cam.ac.uk Subject: Casper new architecture. 
Message-ID: <20151124222346.GA91383@jarvis.chello.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="pWyiEgJYm5f9v55/" Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Nov 2015 22:20:25 -0000 --pWyiEgJYm5f9v55/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hello, I finally have a new version of Casper and I want to share it with you. [1] I would like to ask for code review and, if possible, for some people to run tests on their machines. We decided to change a few things in the Casper architecture: The first one is that Casper isn't a daemon any more. Now, after calling the cap_init(3) function, Casper will fork from its original process, using pdfork(2). Thanks to the changes in r286698, the pdforking will not have any effect on the original process. Forking from a process has a lot of advantages: 1* We have the same cwd as the original process (so the programmer must be aware that if he later changes the original process's working directory, Casper's working directory will not change). But I feel that this is an acceptable limitation. 2* The same uid, gid and groups. 3* The same MAC labels. 4* The same descriptor table. This is important for a filesystem service because with the old Casper, for example, process substitution was not possible. When the filesystem service arrives, I want to add some special flags that will tell zygote processes not to clear file descriptors. Right now, for a service, all fds are closed. 5* The same routing table. We can change the routing table for a process using setfib(2). 6* The same umask. 7* The same cpuset(1). And I probably missed some things on this list. I decided to remove the libcapsicum library.
In my opinion Casper is connected with capsicum, but Casper can also be used with different sandbox techniques. Now I would like to refer to it just as libcasper. The second change is that Casper will not execute any binary files. Now every service (cap_{dns,grp,etc.}) will be in the form of a shared library. I don't see, right now, any advantage in keeping services as executable binaries. It's a little bit problematic to manage services when we don't have a global daemon. Services can be in different locations and hard coding one path (like /etc/casperd before) didn't feel right. On the other hand, adding additional arguments to cap_init() doesn't convince me either, because how can the programmer guess where the administrator will put the Casper services? So in my opinion using dynamic libraries is the best option right now. A program needs to know the replacement API (for example cap_gethostbyname, which replaces gethostbyname), so it needs to link against some library (before, just one global library, libcapsicum), so why not store services inside that library? Thanks to that we also have the implementation of a service and its replacement API in one place, so we don't need to jump between libexec and lib directories. I hope that you like the new architecture of Casper.
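As a small illustration of the inheritance argument made earlier in this message, the sketch below forks a helper and has it report its umask back over a pipe. It uses plain fork(2) as a portable stand-in for pdfork(2), and helper_umask() is a made-up name; it is not part of libcasper.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a helper process and ask it for its umask.
 * Returns the helper's umask, or -1 on error.
 */
static int
helper_umask(void)
{
	int fds[2];
	mode_t m;
	pid_t pid;
	ssize_t n;

	if (pipe(fds) != 0)
		return (-1);
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: umask(2) returns the previous, inherited mask. */
		close(fds[0]);
		m = umask(0);
		if (write(fds[1], &m, sizeof(m)) != (ssize_t)sizeof(m))
			_exit(1);
		_exit(0);
	}
	/* Parent: read the answer and reap the helper. */
	close(fds[1]);
	n = read(fds[0], &m, sizeof(m));
	close(fds[0]);
	(void)waitpid(pid, NULL, 0);
	return (n == (ssize_t)sizeof(m) ? (int)m : -1);
}
```

The same reasoning applies to cwd, credentials and the descriptor table: a forked helper starts with whatever the application had at fork time, which a global daemon started at boot could not know.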
Cheers, Mariusz [1] https://people.freebsd.org/~oshogbo/casper.patch --pWyiEgJYm5f9v55/ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQJ8BAEBCgBmBQJWVONjXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBQ0I3NjA5RUE2Q0JEMTM5QUE5NUU2Mjg1 NDFFNzc1RTk2N0Y4OUNGAAoJEFQed16Wf4nPR/IQALnV/fFTTZFOWQdIFj8MxeE7 9E8y0SwSAKyDi/3w+LaVmlfoxe1GlejoojzTpSGAMw5NKpzqzvDcYl8q5j/k/1nN BHW4zLHqNXRgNvjnC/EZGApGpfqJC+ZDQacOIEI7YAqEmbRi00fl4mRcEpr4kmf4 gNjuZBAHGVYIZ3BlwO05tX4srK0MavfU+mgBZzbs9pquiIQL44UcSTQPHFvt0YgZ dv/jpZxJSbvLNRpWM/3PxlRDC1088L1Lxo0Ele06UNlogxFcpW9Sq615EQjLPGrh heMyb6KMp5UKqgxY9t2B+merIoOpBdzggX0CgsjYdkMRYYAmEtunVfR0hDuQLqgf O4EdEnjUIfFu0jFl1aMrwmAgPUMGZ6b+4/u0dpy6d+1bp2qpiZbqmEEUFZRDsxxZ YT5DavoJ1PosnkdqN1FS6rNjyyu+8v4tvPkMrYLkMJuaPolz9SYBG/12cq4vBAoj enXFywMaUEZJZzbfa+s1vC2ATkLn5T3sKQUxm4U73N0goz3dr9P0e++OBrmBPHPC 52TpHWjiY2cT7mJ+UR7hm/mzIRYwV/DPchE8UFWZKfKHTmNki9gWpltyxRRaL3k0 KHWCxkyNSewPr4ZvUeAHd7QuZcUfDJHeHe81YHsVEWPzC9ZrSL/Lm3gRuWcJwnT3 anBKYD2mJG38L9aOlHJq =I2Uq -----END PGP SIGNATURE----- --pWyiEgJYm5f9v55/-- From owner-freebsd-arch@freebsd.org Wed Nov 25 00:05:46 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DD651A3636B for ; Wed, 25 Nov 2015 00:05:46 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-vk0-x233.google.com (mail-vk0-x233.google.com [IPv6:2607:f8b0:400c:c05::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9494019EC for ; Wed, 25 Nov 2015 00:05:46 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by vkfr145 with SMTP id r145so24253194vkf.0 for ; Tue, 24 Nov 2015 16:05:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=iHCe+3NO/5Fxg51NApG+Xqm4VXSyRc8W7+g3XZsFtNY=; b=oXcNEUrWUVUxXxw3UnERYUnHVWMAy6XKXa2GyRxza4EPtRYkFQSq2o/QPigrMp445g JNmJdOgNBwVCYPIJmdMO1JBoEN7aiKGUa+Ht2rdG8DRTyEE5YVj9uMe2fDU73stH/U4l 4mGw/lO0WhoZuK7qxcVMogBv5OeRmIL/uYiXml5OKIFtePwq+azYtJJ9x2PrDatj4pdH aGg2FhyHFirT35Ydg+a0yKpIRusGX9rHhnM7H76WvAKyNlAfNw9fnLqlAD24elrtSpBW ZQdwxSWNjkxnFIGoINEHb62S6xjQ0s7QNtE7zJLlACNVrsfdZGet3AAh4e97mvUo+NPJ Mz1A== X-Received: by 10.31.52.68 with SMTP id b65mr30215193vka.150.1448409945483; Tue, 24 Nov 2015 16:05:45 -0800 (PST) Received: from wkstn-mjohnston.west.isilon.com (c-67-182-131-225.hsd1.wa.comcast.net. [67.182.131.225]) by smtp.gmail.com with ESMTPSA id x185sm16626506vkd.12.2015.11.24.16.05.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 24 Nov 2015 16:05:45 -0800 (PST) Sender: Mark Johnston Date: Tue, 24 Nov 2015 16:07:22 -0800 From: Mark Johnston To: Bruce Evans Cc: Jilles Tjoelker , freebsd-arch@freebsd.org Subject: Re: zero-cost SDT probes Message-ID: <20151125000721.GA70878@wkstn-mjohnston.west.isilon.com> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151122164446.GA22980@stack.nl> <20151122235903.GA5647@raichu> <20151123113932.C906@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151123113932.C906@besplex.bde.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Nov 2015 00:05:47 -0000 On Mon, Nov 23, 2015 at 02:48:14PM +1100, Bruce Evans wrote: > On Sun, 22 Nov 2015, Mark Johnston wrote: > > > On Sun, Nov 22, 2015 at 05:44:46PM +0100, Jilles Tjoelker wrote: 
> >> On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote: > >>> For the past while I've been experimenting with various ways to > >>> implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > >>> probe site expands to this: > >> > >>> if (func_ptr != NULL) > >>> func_ptr(); > >> > >>> When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > >>> it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > >> > >>> func(); > >> > >>> When the kernel is running, each probe site has been overwritten with > >>> NOPs. When a probe is enabled, one of the NOPs is overwritten with a > >>> breakpoint, and the handler uses the PC to figure out which probe fired. > >>> This approach has the benefit of incurring less overhead when the probe > >>> is not enabled; it's more complicated to implement though, which is why > >>> this hasn't already been done. > >> > >>> I have a working implementation of this for amd64 and i386[1]. Before > >>> adding support for the other arches, I'd like to get some idea as to > >>> whether the approach described below is sound and acceptable. > >> > >> I have not run any benchmarks but I expect that this removes only a > >> small part of the overhead of disabled probes. Saving and restoring > >> caller-save registers and setting up parameters certainly increases code > >> size and I-cache use. On the other hand, a branch that is always or > >> never taken will generally cost at most 2 cycles. > > Hi resolution kernel profiling (which adds a function call and return to > every function without engrotting the source code with explicit calls) > has surprisingly little overhead when not in use. This depends partly > on it not using -finstrument-functions. -finstrument-functions produces > bloated code to pass 2 args (function pointer and frame pointer) in a > portable way. > (Unfortunately, -mprofiler-epilogue was broken (turned into a no-op) > in gcc-4.2.1 and is more broken (unsupported) in clang. 
It is > regression-tested by configuring LINT with -pp, but the test is > broken for gcc-4.2.1 by accepting -mprofiler-epilogue without actually > supporting it, and for clang by ifdefing the addition of > -mprofiler-epilogue. I normally don't notice the bug since I use > old versions of FreeBSD with a working gcc (3.3.3), and recently > fixed 4.2.1. The fix is fragile, and I just noticed that it doesn't > work for functions returning a value that don't have an explicit > return statement (this is permitted for main()). Old versions of > FreeBSD used -finstrument-functions during a previous round of > compiler breakage.) > > -pg and -mprofiler-epilogue produce calls to mcount and .mexitcount > for every function (and most trap handlers and some jumps). These > functions execute "cmpl $ENABLE,enable_flag; jne done; ... done: ret $0" > when the feature is not enable. The branch is easy to predict since > there are only 2 instances of it. Scattered instances in callers might > bust the branch target cache. So might the scattered calls to .mcount > and .mexitcount. Calls are similar to unconditional branches so they > are easy to predict, but I think old x86's (ones not too old to have a > branch target cache) don't handle them well. > > High resolution kernel profiling doesn't support SMP, except in my > version where it is too slow to use with SMP (due to enormous lock > contention). Low resolution kernel profiling has bad locking for > the SMP case. The bad locking first gives the enormous lock > contention even when profiling is not enabled, since it is done > before checking the flag. It also gives deadlock in ddb and some > other traps. In the non-SMP case, similar wrapping before checking > the flag makes the low-res case is a smaller pessimization. > > A common function checking the flags wouldn't so well if func_ptr or > depends on the call site. 
But if you can figure out > everything from the call site address then it could work similarly > to the breakpoint instruction. > > Kernel profiling could use similar nop/breakpoint methods. It is > much easier to patch since there are only 2 functions and 2 args. > It already determines the args from the frame. This is the main > difference between it and -finstrument functions. > > > I've done some microbenchmarks using the lockstat probes on a Xeon > > E5-2630 with SMT disabled. They just read the TSC and acquire/release a > > lock in a loop, so there's no contention. In general I see at most a small > > difference between the old and new SDT implementations and a kernel with > > KDTRACE_HOOKS off altogether. For example, in my test a mtx lock/unlock > > pair takes 52 cycles on average without probes; with probes, it's 54 > > cycles with both SDT implementations. rw read locks are 77 cycles > > without probes, 79 with. rw write locks and sx exclusive locks don't > > appear to show any differences, and sx shared locks show the same > > timings without KDTRACE_HOOKS and with the new SDT implementation; the > > current implementation adds a cycle per acquire/release pair. > > When high resolution profiling is enabled, it takes similar times. It > also uses the TSC (directly and not slowed down by synchronization) and > this use is a major difference between it and low resolution profiling. > Overheads on haswell: UP: 60 cycles for .mcount and 40 cycles for > .mexitcount; SMP: 88 cycles for .mcount and 60 cycles for .mexitcount. > IIRC, rdtsc takes 24 cycles on haswell -- more than half of the UP > .mexitcount time; synchronizing it takes another 12-24 cycles; it was > much slower on older Intel x86 (65 cycles on freefall with its previous > CPU) and much faster on older amd x86 (9 cycles on Athlon XP). This > slowness make sit not very useful to optimize other things in anything > that uses the TSC. 
> > I can believe a mere 2 cycle difference between the highly optimized > version and the simple version. It seems hardly worth the effort to > optimize. Arg pushing and tests and/or calls are fast, especially > when they don't do anything. It might be possible to optimize them > better to reduce dependencies, so that they take zero time if they > can be executed in parallel. The 100-200 extra cyles per function > given by enabled kernel profiling make a difference of about 100% > (twice as slow). With giant locking, this scales with the number > of CPUs in the kernel (800%) (8-9 times slower) with 8 such CPUs. > > > None of this takes into account the cache effects of these probes. One > > advantage of the proposed implementation is that we eliminate the data > > access required to test if the probe is enabled in the first place. I'm > > Accesses to the same variable cost little. Note that the current SDT implementation uses a different variable for each probe, so the resulting cache pollution gets worse as probes are added to various paths in the kernel. The current and proposed implementations allow one to have KLDs create and register probes automatically when they're loaded, which can't really be accomplished with a single variable and static set of flags. > [...] > In fact, I think you can do everything with this method and no complex > decoding: > > # The following is identical to the above except for this comment. > # To optimise this, replace the next 2 instructions by nops or > # a trap instruction and nops. This gives only a time optimization. > # Not large, but easy to do. When not enabled, space is wasted > # far away where it doesn't mess up the cache. > cmpl $ENABLE,enable_flag > je slow_path > back: > # main path here > ret > slow_path: > cmpl $NULL,func_ptr > je back > # load func_ptr and args > # ... 
> jmp back > > High resolution profiling deals with this partially as follows: > - calling .mcount and .mexitcount tends to require more saving than > necessary. gcc is not smart about placing these calls in the best > position. The main thing done wrong is frame pointer handling. > - but it is arranged that .mexitcount is called without saving the > return register(s). The callee preserves them if necessary (only > necessary if profiling is enabled). > > > Consider the disassemblies for __mtx_lock_flags() here: > > https://people.freebsd.org/~markj/__mtx_lock_flags_disas.txt > > Based on what I said above and assuming a 64-byte cache line size, I'd > > expect all instructions between 0xffffffff806d1328 and 0xffffffff806d134e > > to be loaded regardless of whether or not the branch is taken. Is that not > > the case? > > They are large and ugly already. It probably doesn't matter for them. > Lots of instructions can be loaded (and excecuted speculatively) during > the slow locked instruction. Perhaps during the many earlier instructions. > The instructions at the branch target are more important. I think the > previous branch is usually taken (for no contention) and the branche at > ...1328 is unimportant. Both branches are into another cache line and > there is no padding to align the branch targets because such padding > would not be useful for the target arch (or the optimizations are not > complete). I think it is only important that enough instructions in > the branch target are in the same cache line as the target. That usually > happens accidentally when the target is the function epilogue. > > > I'll also add that with this change the size of the kernel text shrinks > > a fair bit: from 8425096 bytes to 7983496 bytes with a custom MINIMAL-like > > kernel with lock inlining. > > Kernel profiling also bloats the kernel a lot. I was happy to get back > to the lesser bloat given by .mcount/.mexitcount instead of > __cyg_profile_name_too_long_to_remember(). 
> But the text space needed is
> small compared with the data space which is allocated at runtime (the
> allocation is stupid and allocates space even if profiling is never used).

My approach also has this problem. The call sites are stored in a linker
set and end up taking up 16 bytes per probe site. There is also some
runtime-allocated memory for SDT probes, but this is small relative to
everything else. Overall, the kernel's size doesn't end up changing
significantly.

>
> > Finally, I should have noted in my first post that this work has other
> > motivations beyond possible performance improvements. In particular,
> > recording call sites allows us to finally fill in the function component
> > of SDT probes automatically. For example, with this work it becomes
> > possible to enable the udp:::receive probe in udp6_receive(), but not
> > the one in udp_receive().
>
> Patching inline "cmpl $ENABLE,enable_flag" also allows this very easily,
> unless you want to vary the args for a single call site.

It's easy if you know the address of the test, but how can one figure
that out programmatically? My implementation is able to distinguish
between the probes in udp_receive() and udp6_receive() because it uses
relocations against the probe stubs to discover the probe sites.

This would seem to suggest an approach where a probe macro expands to

    if (__predict_false(sdt_enabled))
        ();

That is, use a single flag variable that is set when _any_ SDT probe is
enabled to determine whether to enter the slow path, which is hopefully
kept out of the I-cache in most cases when the probes are not enabled.
The downside is that enabling any probe will pessimize the code for all
of them, but not in a way that's worse than what we have today.
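
To make the single-flag idea concrete, here is a rough userland sketch.
Every name in it (PROBE, probe_slow_path, probe_enable, and using a
counter as the "any probe enabled" flag) is made up for illustration and
is not the sdt(9) interface; __builtin_expect stands in for the kernel's
__predict_false. The point is only that the fast path tests one shared
variable, and the per-probe state is touched solely on the slow path.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's __predict_false(); GCC/Clang builtin. */
#define PREDICT_FALSE(x) __builtin_expect(!!(x), 0)

/* Hypothetical per-probe state; only read on the slow path. */
struct probe {
    bool enabled;
    unsigned long fired;
};

/* Assumption: a counter doubles as the global "any probe enabled" flag. */
int sdt_enabled;

/* Slow path: decide whether this particular probe should fire. */
void probe_slow_path(struct probe *p)
{
    if (p->enabled)
        p->fired++;
}

/* Probe site: one test of a shared variable, then the out-of-line path. */
#define PROBE(p) do {                               \
    if (PREDICT_FALSE(sdt_enabled != 0))            \
        probe_slow_path(p);                         \
} while (0)

void probe_enable(struct probe *p)
{
    if (!p->enabled) {
        p->enabled = true;
        sdt_enabled++;
    }
}

void probe_disable(struct probe *p)
{
    if (p->enabled) {
        p->enabled = false;
        sdt_enabled--;
    }
}
```

With two probes a and b, enabling only a means PROBE(&b) still takes the
slow path (the pessimization mentioned above) but does not fire.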
> > Yet another variation, which is easier to implement since it only requires > modifying data: > > call decoder > testl %rax,%rax > je slow_path > # rest as above > > 'decoder' now just checks a table of enable flags indexed by the call > site and returns true/false. All the args passing remains inline (but > far away). (First check a global enable flag to optimize for the > non-enabled case). I think this has a similar cache pollution problem. And it seems difficult to handle the case where a KLD load/unload may add or remove entries from the table, unless I'm misunderstanding your proposal. > > >> Avoiding this overhead would require not generating an ABI function call > >> but a point where the probe parameters can be calculated from the > >> registers and stack frame (like how a debugger prints local variables, > >> but with a guarantee that "optimized out" will not happen). This > >> requires compiler changes, though, and DTrace has generally not used > >> DWARF-like debug information. > > I use gcc -O1 -fno-inline-functions-called-once, and usually i386 and > -march=i386, to prevent bogus optimizations. Maximal compiler > optimizations rarely gain as much as 1% in kernel code, even in > micro-benchmarks. Compilers like to produce large code that gives > negative optimizations except in loops, and kernels don't have many > loops. > > The above flags make ddb and stack traces almost correct on i386 for > public functions. Static functions are often called with args in > registers even on i386. To debug these, or to make them available > using "call", they must be declared with a regparm attribute (cdecl > only does enough for public functions where it is the default anyway). > > >> For a fairer comparison, the five NOPs should be changed to one or two > >> longer NOPs, since many CPUs decode at most 3 or 4 instructions per > >> cycle. 
> >> Some examples of longer NOPs are in
> >> contrib/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
> >> X86AsmBackend::writeNopData(). The two-byte NOP 0x66, 0x90 works on any
> >> x86 CPU.
> >
> > I'll try that, thanks. On amd64 at least, I think we'd have to use two
> > NOPs: a single-byte NOP that can be overwritten when the probe is
> > enabled, and then a four-byte NOP.
>
> I think some arches need more than 1 byte for a trap/breakpoint instruction.
>
> I just remembered what the "ret $0" in .mcount does. It is because
> plain ret doesn't work so well at a branch target, even when the target
> is aligned. IIRC, "nop; ret" is just as good a fix on the arches where
> plain ret is slower, but "ret $0" is better on older in-order arches
> where nop is not so special. Extra nops or larger nop instructions
> should also be sprinkled for alignment. It is too hard to know the
> best number to use in asm code. Compilers generate magic amounts
> depending on the arch and how far away the instruction pointer is from
> an alignment boundary (so as to not align if too far away). I don't
> trust compilers to get this right either, even if the arch matches
> exactly. In the too-far-away cases, it is likely that a small change
> nearby moves closer and thus gives faster code.
> > Bruce From owner-freebsd-arch@freebsd.org Wed Nov 25 00:09:56 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 56D6AA363D7 for ; Wed, 25 Nov 2015 00:09:56 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-vk0-x230.google.com (mail-vk0-x230.google.com [IPv6:2607:f8b0:400c:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 084001B0A for ; Wed, 25 Nov 2015 00:09:56 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by vkha189 with SMTP id a189so24028368vkh.2 for ; Tue, 24 Nov 2015 16:09:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=4U1GwqIdwNDc6515VEg7JL4AsQsqDZdgZQVmUB00pLY=; b=g8cu0rVEvloaMlIHz8mazZXXYGokoUZZW6idSf75XQW5oo5loLtdB7Hya+U/MD838I lePwvRzgu7FSYy1SV0clMbSbBtXKBRhM3cbcGKiVuf3hIlGS9Q4lOKCz0lwiMBdVXHvE MgFs8Y6gei/LpMZXAnZvaBFDumEcCiZy2lLmXN1i6aqxBVcuzMPZYyw0PVbmlbw1s+Ci JyUafozEeboMjeQ2MsLbjQ7FDczGSiRNwvf2e2PW8n55Ov7/8yY6JJ6pJl4ciS0LZv3l oQpxkgoXwiX6dMhNR4KcJgrwM9WoFckA8U5pANdhHWVgGekhMqZunsVKvcLam3dDjf4p mWzw== X-Received: by 10.31.147.81 with SMTP id v78mr28935089vkd.58.1448410195094; Tue, 24 Nov 2015 16:09:55 -0800 (PST) Received: from wkstn-mjohnston.west.isilon.com (c-67-182-131-225.hsd1.wa.comcast.net. 
[67.182.131.225]) by smtp.gmail.com with ESMTPSA id c190sm16598416vkc.16.2015.11.24.16.09.54 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 24 Nov 2015 16:09:54 -0800 (PST) Sender: Mark Johnston Date: Tue, 24 Nov 2015 16:11:36 -0800 From: Mark Johnston To: Konstantin Belousov Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151123113511.GX58629@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151123113511.GX58629@kib.kiev.ua> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Nov 2015 00:09:56 -0000 On Mon, Nov 23, 2015 at 01:35:11PM +0200, Konstantin Belousov wrote: > On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote: > > Hi, > > > > For the past while I've been experimenting with various ways to > > implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT > > probe site expands to this: > > > > if (func_ptr != NULL) > > func_ptr(); > > > > When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise > > it's NULL. With zero-cost probes, the SDT_PROBE macros expand to > > > > func(); > > > > When the kernel is running, each probe site has been overwritten with > > NOPs. When a probe is enabled, one of the NOPs is overwritten with a > > breakpoint, and the handler uses the PC to figure out which probe fired. > > This approach has the benefit of incurring less overhead when the probe > > is not enabled; it's more complicated to implement though, which is why > > this hasn't already been done. > > > > I have a working implementation of this for amd64 and i386[1]. 
Before > > adding support for the other arches, I'd like to get some idea as to > > whether the approach described below is sound and acceptable. > > > > The main difficulty is in figuring out where the probe sites actually > > are once the kernel is running. In my patch, a probe site is a call to > > an externally-defined function which is defined in an > > automatically-generated C file. At link time, we first perform a partial > > link of all the kernel's object files. Then, a script uses the relocations > > against the still-undefined probe functions to generate > > 1) stub functions for the probes, so that the kernel can actually be > > linked, and > > 2) a linker set containing the offsets of each probe site relative to > > the beginning of the text section. > > The result is linked with the partially-linked kernel to generate the > > final kernel file. > > > > During boot, we iterate over the linker set, using the offsets plus the > > address of btext to overwrite probe sites with NOPs. SDT probes in kernel > > modules are handled differently (and more simply): the kernel linker just > > has special handling for relocations against symbols named __dtrace_sdt_*; > > this is how illumos/Solaris implements all of this. > > > > My uncertainty revolves around the use of relocations in the > > partially-linked kernel to determine the address of probe sites in the > > running kernel. With the GNU ld in base, this happens to work because > > the final link doesn't modify the text section. Is this something I can > > rely upon? Will this assumption be false with the advent of lld and LTO? > > Are there other, cleaner ways to implement what I described above? > > You could consider using a cheap instruction which is conditionally > converted into the trap, instead. E.g., you could have global page frame > in KVA allocated, and for the normal operations, keep the page mapped > with backing by a scratch page. The probe would be a volatile read from > the page. 
>
> When probes are activated, the page is unmapped, which converts the read
> into the page fault. This is similar to the write barriers implemented
> in some garbage collectors.
>
> There are two issues with this scheme:
> - The cost of probe is relatively large, even if the low level trap
>   handler is further modified to recognize the probes by special
>   address access.
> - The arguments passed to the probes should be put into some predefined
>   place, e.g. somewhere in the *curthread, since trap handler cannot fetch
>   them using the ABI conventions.
>
> As I mentioned above, this scheme is used by several implementations of
> the language runtimes, but there gc pauses are rare, and slightly larger
> cost of even stopping the mutator is justified even by negligible
> cost reduction for normal flow. I am not sure if this approach is worth
> the complications and overhead for probes.

If I understood correctly, each probe site would require a separate page
in KVA to be able to enable and disable individual probes in the manner
that I described in a previous reply. Today, a kernel with lock inlining
has thousands of probe sites; wouldn't the requirement of allocating KVA
for each of them be prohibitive on 32-bit architectures?
From owner-freebsd-arch@freebsd.org Wed Nov 25 13:15:44 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6E991A36025 for ; Wed, 25 Nov 2015 13:15:44 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 66A451FEB; Wed, 25 Nov 2015 13:15:43 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id tAPDFYr5003863 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 25 Nov 2015 15:15:34 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua tAPDFYr5003863 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id tAPDFX6R003860; Wed, 25 Nov 2015 15:15:33 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 25 Nov 2015 15:15:33 +0200 From: Konstantin Belousov To: Mark Johnston Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151125131533.GB3448@kib.kiev.ua> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151123113511.GX58629@kib.kiev.ua> <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) 
on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Nov 2015 13:15:44 -0000 On Tue, Nov 24, 2015 at 04:11:36PM -0800, Mark Johnston wrote: > If I understood correctly, each probe site would require a separate page > in KVA to be able to enable and disable individual probes in the manner > that I described in a previous reply. Today, a kernel with lock inlining > has thousands of probe sites; wouldn't the requirement of allocating KVA > for each of them be prohibitive on 32-bit architectures? Several variations of the approach allow to control each probe site individually, while still avoiding jumps and reducing the cache consumption. And, of course, the biggest advantage is avoiding the need to change the text at runtime. E.g., you could have a byte allocated somewhere for each probe, with usual boolean values true/false for enabled/disabled state. Also, somewhere, you have two KVA pages allocated, say, starting at address p, the first page is mapped, the second page is not. The pages are shared between all probes. Then, the following code sequence would trigger the page fault only for enabled probe: movzbl this_probe_enable_byte, %eax movl (p + PAGE_SIZE - 4)(%eax), %eax This approach is quite portable and can be expressed in C. If expected count of probes is thousands, as you mentioned, then you would pay only for several KB of memory for enable control bytes. Another variant is possible with the use of INTO instruction, which has relatively low latency when not trapping, according to the Agner Fog tables. 
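
As a rough illustration of the enable-byte scheme, here is a userland
sketch in C. It is not kernel code and makes several assumptions that
are not in the message above: mmap(2)/mprotect(2) stand in for the two
KVA pages, a SIGSEGV/SIGBUS handler stands in for the trap handler, and
the names guard_init and probe_site are invented. It also reads a single
byte at p + PAGE_SIZE - 1 + enable rather than four bytes at
p + PAGE_SIZE - 4 + enable, to avoid an unaligned access in C; the
effect is the same, since only an enable byte of 1 pushes the read into
the unmapped page.

```c
#define _DEFAULT_SOURCE 1       /* for MAP_ANON and sigsetjmp on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_jmp;            /* resume point after a fault */
static volatile unsigned char *guard;   /* "p": start of the two pages */
static size_t page_size;

/* Stands in for the trap handler that would run the enabled probe. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_jmp, 1);
}

/* Map two pages, make the second inaccessible, install the handler. */
int guard_init(void)
{
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    if (ps <= 0)
        return -1;
    page_size = (size_t)ps;
    p = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (mprotect((char *)p + page_size, page_size, PROT_NONE) != 0)
        return -1;
    guard = p;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/*
 * Probe site: no branch on the enable byte, just an indexed load.
 * Returns 1 if the load faulted (probe enabled), 0 otherwise.
 */
int probe_site(const volatile unsigned char *enable_byte)
{
    unsigned char sink;

    if (sigsetjmp(probe_jmp, 1) != 0)
        return 1;               /* arrived here from on_fault() */
    sink = guard[page_size - 1 + *enable_byte];
    (void)sink;
    return 0;
}
```

After guard_init(), probe_site() on a byte holding 0 reads the last byte
of the mapped page and returns 0; on a byte holding 1 it touches the
unmapped page and returns 1 via the handler. Passing probe arguments to
the handler is not addressed by this sketch.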
From owner-freebsd-arch@freebsd.org Wed Nov 25 22:19:34 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 597BEA37D01 for ; Wed, 25 Nov 2015 22:19:34 +0000 (UTC) (envelope-from oshogbo.vx@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DD60D17D1 for ; Wed, 25 Nov 2015 22:19:33 +0000 (UTC) (envelope-from oshogbo.vx@gmail.com) Received: by wmuu63 with SMTP id u63so19134wmu.0 for ; Wed, 25 Nov 2015 14:19:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=wPbXfb3jgyp0eIzNo3ylvcz0bk30w1igAa8/WJPfR+w=; b=L6Ppd7fmFJ9zU+CaAKk3RQFviStKU7WigVf6WsUl3sEHtUPjX/fGzHETCRqYrCxCAA VYJsEpvlyDiVbr6RRQXT6tOMvXRzXTKwUuPaw/PNVKLE/b5Xhgru50EkFUrk/2qjsXQW ICmvKsqHOBinyXKVwGAhadsozl4POr74AKh2lmheEVFp4Nw6sZTC9qevQ0W9y2yFfWnA zZWfcMP+Yvg4EUpgJnLQuexC1kFhJKvSxgpG5rOXnNeRbpoYAqOCpapPpmQ27zshSgZQ PkqJOT7KCdELMM88qPyouAgOS0giub0zVuLIRoMUROebWm67jh2pzXzEz4NNB9NShegS IDxw== X-Received: by 10.28.127.200 with SMTP id a191mr6876wmd.27.1448489971897; Wed, 25 Nov 2015 14:19:31 -0800 (PST) Received: from jarvis.whl (58.wheelsystems.com. 
[83.12.187.58]) by smtp.gmail.com with ESMTPSA id v4sm25038249wjx.18.2015.11.25.14.19.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Nov 2015 14:19:30 -0800 (PST)
Sender: Mariusz Zaborski
Date: Wed, 25 Nov 2015 23:22:56 +0100
From: Mariusz Zaborski
To: Jonathan Anderson
Cc: cl-capsicum-discuss@lists.cam.ac.uk, freebsd-arch@freebsd.org
Subject: Re: [capsicum] Casper new architecture.
Message-ID: <20151125222256.GA18861@jarvis.whl>
References: <20151124222346.GA91383@jarvis.chello.pl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="Dxnq1zWXvFF0Q93v"
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.24 (2015-08-30)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 25 Nov 2015 22:19:34 -0000

--Dxnq1zWXvFF0Q93v
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I cc'ed arch@ one more time. In the first email I was using to and Jon's
reply went only to the capsicum group.

On Tue, Nov 24, 2015 at 09:10:29PM -0330, Jonathan Anderson wrote:
> On 24 Nov 2015, at 18:53, Mariusz Zaborski wrote:
>
> > Hello,
> >
> > I have finally a new version of Casper and I wanted to share with you. [1]
> > I would like to ask for code review and if possible for some people to
> > run tests on their machines.
>
> Hi Mariusz,
>
> This sounds rather exciting, and I will be pleased to do some reading and testing. However, possibly not in great quantities until the term ends on 8 December.
>
> As an initial meta-comment, might you be able to upload the patch to reviews.freebsd.org? Phabricator really does make it easier to read and comment on long, complex patches like this one (I think we’re looking at 22.5 klines?).
Done: https://reviews.freebsd.org/D4277 . Thanks for the advice.

>
> > We decided to change a few things in the Casper architecture:
> >
> > The first one is that Casper isn't a daemon any more.
> > Now, after calling the cap_init(3) function Casper will fork from
> > its original process, using pdfork(2). Thanks to the changes in r286698
> > the pdforking will not have any effect on the original process.
> > Forking from a process has a lot of advantages:
> > 1* We have the same cwd as the original process (so the programmer must be
> > aware that if he changed the original process working directory, Casper
> > directory will not be changed). But I feel that this is acceptable limitation.
> > 2* The same uid, gid and groups.
> > 3* The same MAC labels.
> > 4* The same descriptor table. This is important for a filesystem service
> > because with the old Casper for example, process substitution was not
> > possible. When the filesystem service arrives, I want to add some special
> > flags that will tell zygote processes to not clear file descriptors. Right
> > now for the service, all fd are closed.
> > 5* The same routing table. We can change routing table for process using
> > setfib(2).
> > 6* The same umask.
> > 7* The same cpuset(1).
> > And I probably missed some things on this list.
>
> Without reading or running the code yet, I suspect that this is a good tack to take when pursuing application-level compartmentalization. Of course, the new question becomes, can some Casper instance still service multiple applications in, e.g., a login session? Can we interpose on messages sent from an application to its libcasper to the desktop Casper to the system Casper (if there is such a thing any more)? I think this was the “Caspers all the way down” discussion in Cambridge a couple of years ago. :)

There isn't any instance of a global Casper. I remember that discussion.
You suggested mixing both approaches: one Casper in the library and a
second one as global. For now I don't see any advantage in having them
both. I'm not sure if I understand your example, but you can still share
the Casper process or a single service. You can clone the Casper channel
and send it to another program, but for some services you need to
remember that some things will be inherited from the original process
(list above).

>
>
> > I decided to remove the libcapsicum library. In my opinion Casper is
> > connected with capsicum, but Casper can be used also with different sandbox
> > techniques. Now I would like to refer to it just as libcasper.
>
> I think this sounds very sensible.
>
>
> > Second change is that Casper will not execute any binary files.
> > Now every service (cap_{dns,grp,etc.}) will be in form of shared library.
> > I don't see, right now, any advantages on keeping services as executable
> > binaries. It's a little bit problematic to manage services when we don't have a
> > global daemon. Services can be in different locations and hard coding one path
> > (like before /etc/casperd) didn't feel right. On the other hand, adding additional
> > arguments to cap_init() also doesn't convince me, because how can the programmer
> > guess where the administrator will put the Casper services. So in my opinion using
> > dynamic libraries right now is the best option. Programs need to know the replacement
> > API (for example cap_gethostbyname which replace gethostbyname) so it needs to
> > include some library (before just one global library, libcapsicum), so why not
> > store services inside that library?
>
> Yes, libraries do seem to be the natural place to land when Casper pdforks from the application rather than starting as a system daemon.
> One question would be whether it’s possible to make the Casper libraries wrap their native counterparts such that we can replace symbols for transparent use (e.g., keep calling gethostbyname rather than cap_gethostbyname).

I'm not sure if this is possible as you presented it. I think we would
have symbol collisions with libc. So when I was implementing a fileargs
library (//depot/user/oshogbo/capsicum_rights2/lib/libfileargs/...) I
did a funny trick. Basically we would always have lib_dns.so, but
depending on compilation we would use or not use Casper in it:

    int
    cap_gethostbyname(...)
    {
    #ifdef HAVE_LIBCAPSICUM
            return (casper_gethostbyname(...));
    #else
            return (gethostbyname(...));
    #endif
    }

So our application would always use cap_gethostbyname and the #ifdef
would be moved to the services. It would make the application much
simpler. I would love to discuss this approach at some point.

>
> > Thanks to that we also have an implementation of
> > service and replaced API in one place, so we don't need to jump between libexec
> > and lib directories.
> >
> > I hope that you like new architecture of Casper.
>
> Me too! :)
>
> I wonder if you might be able to provide a little more discussion at the level of how the IPC works, etc.?

So what exactly interests you? I will tell you in general how it works
now. The application starts. Each service first registers itself in
libcasper thanks to a library constructor (which is hidden in the
CREATE_SERVICE macro). Then we run cap_init() in the application. Our
process forks. The zygote is created and libcasper waits for commands.
Then you open some service (service.dns for example). To do this you use
the cap_service_open() function, which sends an nvlist with the name of
the service to the Casper process. The Casper process searches for the
right functions (command function and limit function) and passes them to
the zygote process. The zygote transforms into the new service. Casper
sends the original process a descriptor to talk with the new service.
All IPC here is done using nvlist. After that you can create the next
service or close the connection to Casper.

Next you can limit your service using cap_limit_set. It operates on an
nvlist; what you should put into this nvlist depends on the service.
After limiting, you use the cap_*() functions, which send commands to
the service.

Thanks for your interest, Jon.

Cheers,
Mariusz

--Dxnq1zWXvFF0Q93v
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQJ8BAEBCgBmBQJWVjSyXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBQ0I3NjA5RUE2Q0JEMTM5QUE5NUU2Mjg1
NDFFNzc1RTk2N0Y4OUNGAAoJEFQed16Wf4nPyI4QAMNNCPp6htDT8AbwW5w1iWNW
RJMHJvdPH8GmPhJWZ8MINYW+5gD7Ib1w84m2ph9nPLN/bG/DqPCBSTyjcLD+5oXn
zYi08ResDQYVPEIa9TTXi1NmpuybbIf9oDUzCnMepVvZ+YfOJtzhEMfUERY70oWW
QpxR9Y1V+ktD/PHLEaOBHJ5BJGSi5Zh5si915f4t+EMg6TjxrdEt/6/wNcce1x5Z
zNIBjAWZvkwDz9/OMtb9HPkPZS3HmI7iBz61W96hWn4E4mX9hB73svHipOsYvwCV
rxwrkWk0/+57EUJNU1g+PYU+SuJwrXJqbip5+nr9jMByzYuWTB1jObG4io61+qjr
TGkuj8NjauP0aq1bkaWGY1pU/CawKOcs7+oR9ZoI0LzgHJj4e8q1MAPBAzNOVf97
xO5V47upqbZRVDdctuXmx2ggiTdVLNGEXFJA/Ct6+m40bU2CS3NU0pdq+THIQfzj
dDNbc1eEEvuXqDQYZGQF+On584YkBDqOb+HxlxJjGi3D6bC7BuUDBLjQ1BBAbcbB
bxxUP96TDt6ytaHxfoUSVYiZAXG2erFxEgy2TKcg53J93YreNvtWU/LeSyGSwaiy
QUztH51sL2rdBvalQxMt33/YWt6PiSepp0ZqqpGP0uuDlvfiAQRXyO7xJzvrsF9b
QGbbO3tLwp4MJbvBT1sS
=ISYz
-----END PGP SIGNATURE-----

--Dxnq1zWXvFF0Q93v--

From owner-freebsd-arch@freebsd.org Wed Nov 25 23:23:47 2015
Return-Path:
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A6C23A36B24 for ; Wed, 25 Nov 2015 23:23:47 +0000 (UTC) (envelope-from markjdb@gmail.com)
Received: from mail-qg0-x22f.google.com (mail-qg0-x22f.google.com [IPv6:2607:f8b0:400d:c04::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet
Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 58B3519B8 for ; Wed, 25 Nov 2015 23:23:47 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by qgec40 with SMTP id c40so43592829qge.2 for ; Wed, 25 Nov 2015 15:23:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=b+an/08KLDBip42lT3yPevLGf7qo+lEVJS5whjCmHLQ=; b=IAuywU0KMxwzt1FNOZ3jQfy069rlOaF1bjgJs5uqMnaYls9PoHFD9bveERHcMklLAX Y2mNnSK3UjeaEMzjTYJOx1AaPAyNOH45/j45wmOD82lpZ6hj2O7RGONISul5whRGJJu4 7VmDhRvfOydaz3FXxDSv59tuLWj4nzf+rIf+TQTu2qk+EpKqIu8Ted9gPJFn+CX6u4jZ pSrgGPxWn2DiyhaIoyuFLQMpXQ8ZCswnTnQ9yfaqTBtb1hFOqCVKcP4Fvu3yNJ4srUTc QQsBesGdv+w2pyrlAwjG63PnO5VjcmzHbMFP0gsYrNs1Itsl2eMN5pR+udPCYcMEZTXL Se4w== X-Received: by 10.140.172.3 with SMTP id s3mr45077432qhs.6.1448493826331; Wed, 25 Nov 2015 15:23:46 -0800 (PST) Received: from wkstn-mjohnston.west.isilon.com (c-67-182-131-225.hsd1.wa.comcast.net. 
[67.182.131.225]) by smtp.gmail.com with ESMTPSA id t47sm6233205qgt.28.2015.11.25.15.23.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Nov 2015 15:23:46 -0800 (PST) Sender: Mark Johnston Date: Wed, 25 Nov 2015 15:25:24 -0800 From: Mark Johnston To: Konstantin Belousov Cc: freebsd-arch@FreeBSD.org Subject: Re: zero-cost SDT probes Message-ID: <20151125232524.GB67865@wkstn-mjohnston.west.isilon.com> References: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151123113511.GX58629@kib.kiev.ua> <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com> <20151125131533.GB3448@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151125131533.GB3448@kib.kiev.ua> User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Nov 2015 23:23:47 -0000 On Wed, Nov 25, 2015 at 03:15:33PM +0200, Konstantin Belousov wrote: > On Tue, Nov 24, 2015 at 04:11:36PM -0800, Mark Johnston wrote: > > If I understood correctly, each probe site would require a separate page > > in KVA to be able to enable and disable individual probes in the manner > > that I described in a previous reply. Today, a kernel with lock inlining > > has thousands of probe sites; wouldn't the requirement of allocating KVA > > for each of them be prohibitive on 32-bit architectures? > > Several variations of the approach allow to control each probe site > individually, while still avoiding jumps and reducing the cache consumption. > And, of course, the biggest advantage is avoiding the need to change the > text at runtime. > > E.g., you could have a byte allocated somewhere for each probe, with usual > boolean values true/false for enabled/disabled state. 
Also, somewhere, > you have two KVA pages allocated, say, starting at address p, the first > page is mapped, the second page is not. The pages are shared between all > probes. Then, the following code sequence would trigger the page fault > only for enabled probe: > movzbl this_probe_enable_byte, %eax > movl (p + PAGE_SIZE - 4)(%eax), %eax > This approach is quite portable and can be expressed in C. > > If expected count of probes is thousands, as you mentioned, then you > would pay only for several KB of memory for enable control bytes. > > Another variant is possible with the use of INTO instruction, which > has relatively low latency when not trapping, according to the Agner > Fog tables. I see. I think this could be made to work, but there's still the complication of passing arguments to the probe. Copying them into some block in curthread is one way to do this, but it seems more expensive than the standard calling convention on amd64 at least. From owner-freebsd-arch@freebsd.org Thu Nov 26 07:36:50 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 06307A36FE2 for ; Thu, 26 Nov 2015 07:36:50 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 194231480; Thu, 26 Nov 2015 07:36:48 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA17531; Thu, 26 Nov 2015 09:31:47 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1a1r1v-000GOI-8u; Thu, 26 Nov 2015 09:31:47 +0200 Subject: Re: zero-cost SDT probes To: Mark Johnston , Konstantin Belousov References: 
<20151122024542.GA44664@wkstn-mjohnston.west.isilon.com> <20151123113511.GX58629@kib.kiev.ua> <20151125001136.GB70878@wkstn-mjohnston.west.isilon.com> <20151125131533.GB3448@kib.kiev.ua> <20151125232524.GB67865@wkstn-mjohnston.west.isilon.com> Cc: freebsd-arch@FreeBSD.org From: Andriy Gapon Message-ID: <5656B52B.90203@FreeBSD.org> Date: Thu, 26 Nov 2015 09:30:51 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <20151125232524.GB67865@wkstn-mjohnston.west.isilon.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Nov 2015 07:36:50 -0000 On 26/11/2015 01:25, Mark Johnston wrote: > On Wed, Nov 25, 2015 at 03:15:33PM +0200, Konstantin Belousov wrote: >> Several variations of the approach allow to control each probe site >> individually, while still avoiding jumps and reducing the cache consumption. >> And, of course, the biggest advantage is avoiding the need to change the >> text at runtime. [snip] > I see. I think this could be made to work, but there's still the > complication of passing arguments to the probe. Copying them into some > block in curthread is one way to do this, but it seems more expensive > than the standard calling convention on amd64 at least. Besides, the FBT probes are not going anywhere and they require the run-time text modification. 
-- Andriy Gapon From owner-freebsd-arch@freebsd.org Thu Nov 26 12:23:11 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CDE7FA39CEF for ; Thu, 26 Nov 2015 12:23:11 +0000 (UTC) (envelope-from dr2867@pacbell.net) Received: from nm27-vm2.bullet.mail.ne1.yahoo.com (nm27-vm2.bullet.mail.ne1.yahoo.com [98.138.91.215]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 90BCC1ECE for ; Thu, 26 Nov 2015 12:23:11 +0000 (UTC) (envelope-from dr2867@pacbell.net) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pacbell.net; s=s2048; t=1448540169; bh=3uSlTIe+WRDvNbP3ZNpCXAWgRGOc8t97MGoAA7NNBuM=; h=Date:From:Reply-To:To:Cc:In-Reply-To:References:Subject:From:Subject; b=iAGEike8C+MOK9uEjiLmdlPrQ1qQTwauk1kiW8WkADzlpaHzHWknjeQKWZ6QuBCcTUtFcb3myLOtZw5phfozp8OCY/7oP82rjkztK47htk59FiBvD3EDpS4unN6AprMk7vhqWfUFi90V0xhPPibmH6Tfvk3DVOklxtlUaoiRNQtNWrTEegtNPyLm+Toe2Vx7u7cmlB+w9d/V4OamzyS9usoZM36I7kEZ+MBW/0xff5bs5k+b2QNwNcf/putbGV13t2DNl9+y7G72iCWZytwiHfy9CMzEmlnsTsiWXNH1uA0ziV13bvU6ohltJatPWf9gLfwfFfg3Ouo/+b/ajs3Lkg== Received: from [98.138.100.114] by nm27.bullet.mail.ne1.yahoo.com with NNFMP; 26 Nov 2015 12:16:09 -0000 Received: from [98.138.89.162] by tm105.bullet.mail.ne1.yahoo.com with NNFMP; 26 Nov 2015 12:16:09 -0000 Received: from [127.0.0.1] by omp1018.mail.ne1.yahoo.com with NNFMP; 26 Nov 2015 12:16:09 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 566662.58600.bm@omp1018.mail.ne1.yahoo.com X-YMail-OSG: fhWeBk4VM1ktqv9FfkvQHWfZC24Tt9J5fU9wDWMvSyQr6YpCx6DlfIHoYfW4BH5 RvqTo_Re79n1AMdehvhSRkiqt55.V1_TfnSB7fSSY2efGUkFSRRYGrehXD1eM0tfSxlF.aooBCeL nrywTl1kQmJPUAM5SCIKOCE8z2vv4ckEG4Q10ZhQAF9gi.NApaS1ReetgocE922PMpLiEdZlFj45 LO3GbgZFuO8lEcdS.bMAw6FavBf_AqXnAJiqkjzNizSSpzeh_BIixwITuh6j5Uqf_Qu3_uLwrXcD 
pf0ZD.YnIUOlM_NbVLS3sLQ0eT6Bpd3cMuYJbsx40tLuDySJW2DCPD3ZdFn.c2WY6MQ0lqXvo0um r4XxKTUmLXcuKv1BAvIDlZdr0ZXr__7gMvp4Cknmps3wHWIF8SwnPd8hz1z5Hfb_DVppClFAkj.8 djjNOYslW20dgnvEjCjABcYR5YytxB7LZHi_FA92VjGk4.HlfP53hyu1s7ufu7uHds.My.4E18WW oLja6zlH1TQIYvJi0K.1L5g-- Received: by 98.138.105.252; Thu, 26 Nov 2015 12:16:09 +0000 Date: Thu, 26 Nov 2015 12:16:08 +0000 (UTC) From: Daniel Rudy Reply-To: Daniel Rudy To: Jukka Ukkonen , Adrian Chadd Cc: "sparc64@freebsd.org" , Anna Wilcox , Warner Losh , freebsd-arch , Sean Bruno , Marius Strobl , Jordan Hubbard Message-ID: <907918196.5618077.1448540168305.JavaMail.yahoo@mail.yahoo.com> In-Reply-To: References: Subject: Re: Sparc64 doesn't care about you, and you shouldn't care about Sparc64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Nov 2015 12:23:11 -0000 On Wednesday, November 11, 2015 8:33 AM, Jukka Ukkonen wrote: > I'm all for keeping an architecture like sparc around, as long as > there's active development and active users. MIPS has both. ARM has > both. Powerpc has both. Sparc is missing some active developers, but > it has plenty of FreeBSD users that speak up (and more users that only > speak up privately.) So, if you want to see sparc64 support continue, > this requires a grass roots effort to get more development happening - > either users need to step up, or someone has to start contributing > money. I have some Sparc64 hardware myself. A Sun Fire T2000 server. Runs OpenBSD just fine. I tried to get FreeBSD 10.2 installed and the CD boot trapped with illegal instruction. I don't check this email that often, but I will be willing to run some test builds to see if they will boot/install. 
_______________________________________________ freebsd-sparc64@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-sparc64 To unsubscribe, send any mail to "freebsd-sparc64-unsubscribe@freebsd.org" From owner-freebsd-arch@freebsd.org Thu Nov 26 16:29:29 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0446CA3A453; Thu, 26 Nov 2015 16:29:29 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-io0-x22a.google.com (mail-io0-x22a.google.com [IPv6:2607:f8b0:4001:c06::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BE5A11ABC; Thu, 26 Nov 2015 16:29:28 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by ioc74 with SMTP id 74so91074185ioc.2; Thu, 26 Nov 2015 08:29:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=gAhubBZIhvkWwKLvF0p8AYxm+g4n+eU+8DeOISnbZ9E=; b=jTKimz12fRWQHt2lQB204uXenWQrhpqbrw+tsvUU7nxH7SxJ9csNdwV6JzuHdvXQXD p2tUHUo8rE6pf3AJfrzljC0XbOymE2SnZmJRPuv6eC/u6tk/6Luk9or5+8rrcgrePtLf u4y8bUwYKG44mNR91VTrBbGEuWH/amHuG2smC88i9a/DEKPt5UbQnXboxhQetczu4jiQ joK1hhVnB9wL7euS5fHIvFLuO/Nc2oJ7n3nL5r9LW209KKfWS4pRQhND71TP3Q3kFWYQ AUh5mnvhInqwtJTaBXwARyPIq+MOIhJnYxnr11DBd71yKG1AQunjgHKNQiA5z7rrU0Qr V+yw== MIME-Version: 1.0 X-Received: by 10.107.162.21 with SMTP id l21mr38620841ioe.123.1448555367981; Thu, 26 Nov 2015 08:29:27 -0800 (PST) Received: by 10.36.217.196 with HTTP; Thu, 26 Nov 2015 08:29:27 -0800 (PST) In-Reply-To: <907918196.5618077.1448540168305.JavaMail.yahoo@mail.yahoo.com> References: <907918196.5618077.1448540168305.JavaMail.yahoo@mail.yahoo.com> 
Date: Thu, 26 Nov 2015 08:29:27 -0800 Message-ID: Subject: Re: Sparc64 doesn't care about you, and you shouldn't care about Sparc64 From: Adrian Chadd To: Daniel Rudy Cc: Jukka Ukkonen , "sparc64@freebsd.org" , Anna Wilcox , Warner Losh , freebsd-arch , Sean Bruno , Marius Strobl , Jordan Hubbard Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Nov 2015 16:29:29 -0000 On 26 November 2015 at 04:16, Daniel Rudy wrote: > > > On Wednesday, November 11, 2015 8:33 AM, Jukka Ukkonen wrote: > > >> I'm all for keeping an architecture like sparc around, as long as >> there's active development and active users. MIPS has both. ARM has >> both. Powerpc has both. Sparc is missing some active developers, but >> it has plenty of FreeBSD users that speak up (and more users that only >> speak up privately.) So, if you want to see sparc64 support continue, >> this requires a grass roots effort to get more development happening - >> either users need to step up, or someone has to start contributing >> money. > > > I have some Sparc64 hardware myself. A Sun Fire T2000 server. Runs OpenBSD just fine. I tried to get FreeBSD 10.2 installed and the CD boot trapped with illegal instruction. I don't check this email that often, but I will be willing to run some test builds to see if they will boot/install. Yeah, freebsd-10.x will likely never run on sun4v. I have a T2000 arriving soon. I'll spin up openbsd on it to see how it runs and see what we can crime to update/resurrect the sun4v support.
-a > > _______________________________________________ > freebsd-sparc64@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-sparc64 > To unsubscribe, send any mail to "freebsd-sparc64-unsubscribe@freebsd.org" From owner-freebsd-arch@freebsd.org Thu Nov 26 22:57:43 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67499A3A59B; Thu, 26 Nov 2015 22:57:43 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-ig0-x22d.google.com (mail-ig0-x22d.google.com [IPv6:2607:f8b0:4001:c05::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3637019F2; Thu, 26 Nov 2015 22:57:43 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by igl9 with SMTP id 9so17716744igl.0; Thu, 26 Nov 2015 14:57:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=RcNvqOYt9U2w6OFmW3ECPzjb0CCqmOJDLmxgObsZlCA=; b=OMwXauI/CrowtgTAsL/k+wUDMerTKamAs6psO3/bOhSzkyIpLtm+VgOrdkb5wWvQvc tNqwLE4D7Lm1U+SF9DHf1imuXsRygzH01SehwqMV1zPPX/as//SrK0ChpBfi1gj3IfaZ oEK+gskv6SXSFTTwYwfV8P9Th43W/7pJb8Zr715V+BpdWTRlQUWS0V12LP1L+hzyCrfX N3NxIve4SAneAURIgWxJuF9MFr+jlyTkWR1K7sr41pAECX0qAX7qJuJrexxxNNbogJzo hIVEzWZA6DPup0Q2Ky30s2oyQ3NhvOAKm6tp1ho/h1SArEiLE/dK943Ym/omiBJEIc0X EREA== MIME-Version: 1.0 X-Received: by 10.50.136.226 with SMTP id qd2mr5210450igb.37.1448578662242; Thu, 26 Nov 2015 14:57:42 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.36.217.196 with HTTP; Thu, 26 Nov 2015 14:57:42 -0800 (PST) Date: Thu, 26 Nov 2015 14:57:42 -0800 X-Google-Sender-Auth: cRIV_5TCwxZSzimfoHs7cSKyRM8 Message-ID: Subject: moving some 802.11 code into lib80211 From:
Adrian Chadd To: "freebsd-wireless@freebsd.org" , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Nov 2015 22:57:43 -0000 hiya, I've started migrating net80211 specific bits out of ifconfig into a library. https://reviews.freebsd.org/D4290 I'd like to commit this initial step in a couple of days so I can easily add more to things. I'd love feedback. ;) Thanks, -adrian From owner-freebsd-arch@freebsd.org Fri Nov 27 07:31:53 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E476A39E80; Fri, 27 Nov 2015 07:31:53 +0000 (UTC) (envelope-from rpaulo@me.com) Received: from mr11p00im-asmtp001.me.com (mr11p00im-asmtp001.me.com [17.110.69.252]) (using TLSv1.2 with cipher DHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 787C91D21; Fri, 27 Nov 2015 07:31:53 +0000 (UTC) (envelope-from rpaulo@me.com) MIME-version: 1.0 Content-transfer-encoding: 8BIT Content-type: text/plain; charset=UTF-8 Received: from akita.hsd1.ca.comcast.net (c-73-162-13-215.hsd1.ca.comcast.net [73.162.13.215]) by mr11p00im-asmtp001.me.com (Oracle Communications Messaging Server 7.0.5.35.0 64bit (built Mar 31 2015)) with ESMTPSA id <0NYG00D0CQ8XPY10@mr11p00im-asmtp001.me.com>; Fri, 27 Nov 2015 07:31:47 +0000 (GMT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2015-11-27_01:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 kscore.is_bulkscore=0 kscore.compositescore=1 compositescore=0.9 suspectscore=0 phishscore=0 bulkscore=0 kscore.is_spamscore=0 rbsscore=0 spamscore=0 
urlsuspectscore=0.9 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1510090000 definitions=main-1511270141 Message-id: <1448609505.4088.18.camel@me.com> Subject: Re: moving some 802.11 code into lib80211 From: Rui Paulo To: Adrian Chadd , "freebsd-wireless@freebsd.org" , "freebsd-arch@freebsd.org" Date: Thu, 26 Nov 2015 23:31:45 -0800 In-reply-to: References: X-Mailer: Evolution 3.18.2-1 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Nov 2015 07:31:53 -0000 On Thu, 2015-11-26 at 14:57 -0800, Adrian Chadd wrote: > hiya, > > I've started migrating net80211 specific bits out of ifconfig into a > library. > > https://reviews.freebsd.org/D4290 > > I'd like to commit this initial step in a couple of days so I can > easily add more to things. > > I'd love feedback. ;) > That looks useful.  A quick look at the API and I didn't see anything wrong with it... -- Rui Paulo