From: "Rodney W. Grimes"
Message-Id: <201703191546.v2JFkQOh060299@pdx.rh.CN85.dnsmgr.net>
Subject: Re: svn commit: r315522 - in head: contrib/binutils/ld/emulparams sys/conf
In-Reply-To: <20170319123107.W994@besplex.bde.org>
To: Bruce Evans
Date: Sun, 19 Mar 2017 08:46:26 -0700 (PDT)
CC: Ed Maste, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Reply-To: rgrimes@freebsd.org
List-Id: SVN commit messages for the src tree for head/-current

> On Sun, 19 Mar 2017, Ed Maste wrote:
>
> > Log:
> >   use INT3 instead of NOP for x86 binary padding
> >
> >   We should never end up executing the inter-function padding, so we
> >   are better off faulting than silently carrying on to whatever function
> >   happens to be next.
> >
> >   Note that LLD will soon do this by default (although it currently pads
> >   with zeros).
> >
> >   Reviewed by: dim, kib
> >   MFC after: 1 month
> >   Sponsored by: The FreeBSD Foundation
> >   Differential Revision: https://reviews.freebsd.org/D10047
>
> Is this a pessimization? Instruction prefetch near the end of almost
> every function now fetches INT3 instead of NOP. Both have to be
> decoded to decide whether to speculatively execute them. INT3 is
> unlikely to be speculatively executed, but it takes extra work to
> decide not to do so.
>
> Functions normally end with a RET or unconditional JMP, and then branch
> prediction usually prevents speculative execution beyond the end, so the
> pessimization must be small.
>
> Intra-function padding that is executed now uses "fat NOP" instructions
> like null LEA's since this is faster to execute than a long string of
> NOPs. This is less readable than NOPs or even INT3's. Of course, INT3
> can't be used for executed padding. I think it is also used for
> intra-function padding that is not executed. This is just harder to read
> unless it is needed to avoid the possible pessimization in this commit.
> The intra-function code with nops might look like:
>
>     jmp over
>     nop
>     # 7 nops altogether
>     nop
> over:
>
> or
>
>     jmp over
>     nullpad7    # single 7 byte null padding instruction
> over:
>
> and it is likely to be CPU-dependent whether 7 possibly-speculatively
> executed nops take more or less resources than 1 possibly-speculatively
> executed fancy instruction. I would expect the fancy instructions to
> take more resources each.
>
> Fancy LEAs don't seem such a good choice for executed padding either.
> amd64 uses lots of REX prefixes instead of fancy instructions, since
> these are designed to have low overheads. They certainly aren't
> executed separately. On i386, the same technique with lots of older
> prefixes is not used much, probably because all prefixes have high
> overheads on old i386's. They can be as slow as NOPs although they
> aren't executed separately.

As an intermediate ground what about using N of something
really easy for the decoder/branch predictor to grovel over,
then a single int3 at the end of the block so if we do
fall into this we end up getting the desired effect?
nop's followed by an

> Bruce

-- 
Rod Grimes                                                 rgrimes@freebsd.org
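
For illustration only, here is a minimal sketch of the byte pattern the
intermediate proposal above describes: cheap one-byte NOPs with a single
INT3 closing the block. The helper name pad_bytes is hypothetical and is
not part of ld or LLD. The only facts assumed are the standard one-byte
x86 encodings, 0x90 for NOP and 0xCC for INT3.

```python
NOP = 0x90   # one-byte x86 NOP opcode
INT3 = 0xCC  # one-byte x86 INT3 (breakpoint trap) opcode


def pad_bytes(length: int) -> bytes:
    """Return `length` bytes of padding: NOPs ending in a single INT3.

    Hypothetical helper, for illustration; a real linker fills padding
    through its own emulation parameters, not through code like this.
    """
    if length < 0:
        raise ValueError("padding length must be non-negative")
    if length == 0:
        return b""
    # All but the last byte are NOPs; the last byte traps if reached.
    return bytes([NOP] * (length - 1) + [INT3])


if __name__ == "__main__":
    # A 7-byte gap, matching the 7-byte example in the quoted text.
    print(pad_bytes(7).hex())  # prints 909090909090cc
```

Anything that falls into such a block slides over the NOPs and then
faults on the final byte, which is the effect the commit was after.
Whether this decodes any cheaper than a run of INT3s is the open
question in the thread; the sketch does not answer it.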