Date: Sat, 22 Nov 2014 18:17:44 +1100 (EST)
From: Bruce Evans
To: Rui Paulo
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Scott Long
Subject: Re: svn commit: r274489 - in head/sys/amd64: amd64 include

On Thu, 20 Nov 2014, Rui Paulo wrote:

> On Nov 13, 2014,
> at 14:11, Scott Long wrote:
>>
>> Author: scottl
>> Date: Thu Nov 13 22:11:44 2014
>> New Revision: 274489
>> URL: https://svnweb.freebsd.org/changeset/base/274489
>>
>> Log:
>> Extend earlier addition of stack frames to most of support.S. This makes
>> stack traces in KDB, HWPMC, and DTrace much more reliable and useful.
>
> No performance differences? The kernel enables/disables the compiler option to omit the frame pointer based on the kernel config file. If DDB, DTrace, or HWPMC is enabled, the frame pointer is always saved in C functions.

That bug is only implemented for amd64 and powerpc:
- it is in Makefile.amd64. It is hard-coded under the above options, and thus breaks any settings of -fno-omit-frame-pointer -fno-omit-leaf-frame-pointer in the user's options, depending on undocumented ordering of the options. It also breaks profiling.
- it is in Makefile.powerpc unless DDB is configured.
- it is not in Makefile.i386. files.i386 and files.pc98 take the necessary care to not blow away -fno-omit-frame-pointer in the user's options for atomic.c; however, all functions in atomic.c are leaf functions, so this may be broken now. The null documentation in cc.1 doesn't say.
- it is in kmod.mk for some amd64 and powerpc. There it breaks modules unconditionally.

The breakage for profiling is quite serious, since the frame pointer might be dereferenced unconditionally. However, amd64 and i386 still use my optimization of avoiding the dereference unless profiling is enabled as well as configured. Asm code in them uses my related optimization of not using a frame pointer at all for functions written in asm (ENTRY() hides the details, and the details are arranged so as not to

The breakage is maximal for profiling of modules. You could have a kernel compiled for profiling or just DDB, DTrace, or HWPMC, but modules not compiled for these. Only broken modules can depend on kernel options, and kmod.mk doesn't check the options anyway.
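The dereferencing problem can be made concrete with a toy model in C. This is a hypothetical sketch, not how DDB, DTrace, HWPMC or the profiling code are actually written: the names toy_stack and walk_frames are made up, the "stack" is an array indexed by word, and a real unwinder reads kernel memory and relies on trap handling rather than an array bounds check. It shows why a frame-pointer chain is only as trustworthy as the least careful function on the stack: if code built without a frame pointer reuses the register, the next saved frame pointer is garbage, and only a walker that validates it before dereferencing stops safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model (not FreeBSD code): a downward-growing stack is an
 * array of words and a "frame pointer" is an index into it.  A function
 * built with a frame pointer leaves its caller's saved frame pointer at
 * toy_stack[fp] and its return address at toy_stack[fp + 1].  Caller frames
 * live at higher indices; a saved frame pointer of 0 ends the chain.
 */
#define	STACK_WORDS	64

uintptr_t toy_stack[STACK_WORDS];

/*
 * Follow the saved frame pointers starting at fp, storing up to max return
 * addresses in pcs.  Each fp is validated before it is dereferenced, so a
 * garbage value ends the trace instead of reading outside the stack.
 * Returns the number of return addresses recorded.
 */
size_t
walk_frames(uintptr_t fp, uintptr_t *pcs, size_t max)
{
	size_t n = 0;

	assert(pcs != NULL);
	while (fp != 0 && n < max) {
		/* Both words of the frame must lie inside the stack. */
		if (fp >= STACK_WORDS - 1)
			break;
		pcs[n++] = toy_stack[fp + 1];
		/* The caller's frame must be strictly higher on the stack. */
		if (toy_stack[fp] != 0 && toy_stack[fp] <= fp)
			break;
		fp = toy_stack[fp];
	}
	return (n);
}
```

With three well-formed frames at indices 10, 20 and 40, walk_frames(10, pcs, 8) returns 3; overwriting toy_stack[10] with a garbage value such as 0xdeadbeef makes it record one return address and stop. If the function built without a frame pointer leaves the register alone, the walk does not fail at all; it silently skips that function's frame, which is the "less reliable stack traces" case the commit addresses.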
The default is fail-unsafe for amd64 and powerpc. It gives broken modules that can never match the kernel profiling, DDB, DTrace or HWPMC options unless these are hacked into individual module Makefiles. This gives crashes soon if any module is used by a kernel with profiling configured and enabled. DDB can make invalid dereferences of the frame pointer, but these are trapped harmlessly (except someone broke the trap handler, so it now does a stack trace of ddb internals; this spams the console and risks a recursive trap). I don't know if DTrace and HWPMC also trap the dereferences. Profiling certainly doesn't.

Kernel stack traces without DDB, DTrace or HWPMC on amd64 or DDB on powerpc seem to be broken, even without modules.

> Some of these functions are in the hot path, so if you didn't see any performance problem, I wonder if we should disable -fomit-frame-pointer always.

The performance problem is about 0.0001% of the time spent in the kernel (which is hopefully a small fraction of the time spent in userland) on modern out-of-order (OOE) pipelined systems, since the frame pointer switch can run in parallel on these systems and not many functions are well enough scheduled to not have spare resources for this. This is especially true on i386, where args are passed on the stack: lots can run in parallel with just loading the args, and the only problems are the extra code size and extra memory accesses for switching the frame pointer.

I made up the 0.0001% number. The number is tiny anyway, since there aren't many asm functions, so it would take an unusual workload to spend even 1% of the time in these functions, except possibly for copyin/copyout of large data. Then any extra 1-10 cycles in each function might be 1% of this. Functions like fubyte() are an exception: even 1 extra cycle in them might have a measurable effect if they were called a lot. However, fubyte() isn't called a lot, and if it were, then a frame pointer would be the least of its pessimizations.
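The estimate above is a product of two small fractions. A minimal back-of-the-envelope sketch in C, assuming the cost is simply (fraction of kernel time spent in the asm functions) × (extra cycles per call / cycles per call); the function name fp_overhead and the example figures are hypothetical, chosen only to match the 1% and 1-10 cycle guesses above. Because it treats every extra cycle as fully serialized, it ignores the out-of-order overlap described above and so tends to overestimate.

```c
#include <assert.h>

/*
 * Hypothetical model (not from the FreeBSD sources): fraction of total
 * kernel time added by a frame-pointer prologue/epilogue, assuming every
 * extra cycle is fully serialized.  On out-of-order CPUs the extra
 * instructions often run in otherwise idle slots, so this is pessimistic.
 */
double
fp_overhead(double frac_time_in_fns, double extra_cycles,
    double cycles_per_call)
{
	assert(cycles_per_call > 0.0);
	/* Relative slowdown of one call, scaled by the time spent in calls. */
	return (frac_time_in_fns * (extra_cycles / cycles_per_call));
}
```

For example, a workload spending 1% of its kernel time in these functions, at 500 cycles per call with 5 extra cycles each, gives fp_overhead(0.01, 5.0, 500.0), about 0.0001, i.e. 0.01% of kernel time. Only short, very frequently called functions push the per-call ratio up, which is why fubyte() is the interesting case.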
My old optimizations to avoid frame pointers for profiling had a small effect for i486s, since i486s are in-order and only have 1 pipeline. Even then, the effect was insignificant when profiling was enabled, since the main profiling routine took a long time and needs a frame pointer anyway.

In a quick test of a microbenchmark in userland, -fomit-frame-pointer -fomit-leaf-frame-pointer was 1 cycle slower (30 -> 31) for one function but 1 cycle faster (26 -> 25) for another function. The benchmark is known to execute about 2 copies of the function in parallel, so the extra instructions cost nothing if there is a spare slot for them to run in every 20-30 cycles. Compilers understand little of this. I think using a frame pointer is sometimes faster on x86 because instructions to access stack variables are 1 byte longer when not using a frame pointer, and this sometimes costs. OTOH, it might be best to set up a frame pointer but not actually use it explicitly (it would only be used by DDB etc.), so that the frame pointer accesses have no dependencies except on each other. Modern x86 hardware already does a lot of virtualization with special cases for the frame pointer to reduce dependencies, but it shouldn't hurt to reduce them explicitly. This happens automatically in the recent amd64 changes -- the frame pointer isn't used explicitly before or after.

Bugs in this change include:
- it isn't done for all arches. It would be harder on i386, since the args are on the stack and all stack offsets would change.
- the details aren't hidden in the ENTRY() macro. Putting them there would make the necessary stack offsets a little harder to apply. The correct register to use for the stack offsets would also be a problem. Hard-coding use of the frame pointer would make the offsets easier to get right and not depend on options, except that it would make using a frame pointer non-optional. Always using it wouldn't be too bad for leaf functions in asm, but there would still be complications for non-C functions. Profiling avoids some of these complications basically by setting up a frame pointer for the profiling call but undoing that before ENTRY() returns. This also allows the change to not interfere with profiling.
- it doesn't track the -fomit-*-frame-pointer option in CFLAGS. Compilers are bad about putting their options in predefines. For profiling, the kernel options GPROF and GUPROF are used.

Bruce