Date: Sun, 19 Mar 2017 08:46:26 -0700 (PDT)
From: "Rodney W. Grimes" <freebsd@pdx.rh.CN85.dnsmgr.net>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Ed Maste <emaste@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r315522 - in head: contrib/binutils/ld/emulparams sys/conf
Message-ID: <201703191546.v2JFkQOh060299@pdx.rh.CN85.dnsmgr.net>
In-Reply-To: <20170319123107.W994@besplex.bde.org>

> On Sun, 19 Mar 2017, Ed Maste wrote:
>
> > Log:
> >   use INT3 instead of NOP for x86 binary padding
> >
> >   We should never end up executing the inter-function padding, so we
> >   are better off faulting than silently carrying on to whatever function
> >   happens to be next.
> >
> >   Note that LLD will soon do this by default (although it currently pads
> >   with zeros).
> >
> >   Reviewed by:	dim, kib
> >   MFC after:	1 month
> >   Sponsored by:	The FreeBSD Foundation
> >   Differential Revision:	https://reviews.freebsd.org/D10047
>
> Is this a pessimization?  Instruction prefetch near the end of almost
> every function now fetches INT3 instead of NOP.  Both have to be
> decoded to decide whether to speculatively execute them.  INT3 is
> unlikely to be speculatively executed, but it takes extra work to
> decide not to do so.
>
> Functions normally end with a RET or unconditional JMP, and then branch
> prediction usually prevents speculative execution beyond the end, so the
> pessimization must be small.
>
> Intra-function padding that is executed now uses "fat NOP" instructions
> like null LEA's since this is faster to execute than a long string of
> NOPs.  This is less readable than NOPs or even INT3's.  Of course, INT3
> can't be used for executed padding.  I think it is also used for
> intra-function padding that is not executed.  This is just harder to read
> unless it is needed to avoid the possible pessimization in this commit.
> The intra-function code with nops might look like:
>
> 	jmp	over
> 	nop
> 	# 7 nops altogether
> 	nop
> over:
>
> or
>
> 	jmp	over
> 	nullpad7	# single 7 byte null padding instruction
> over:
>
> and it is likely to be CPU-dependent whether 7 possibly-speculatively
> executed nops take more or less resources than 1 possibly-speculatively
> executed fancy instruction.  I would expect the fancy instructions to
> take more resources each.
>
> Fancy LEAs don't seem such a good choice for executed padding either.
> amd64 uses lots of REX prefixes instead of fancy instructions, since
> these are designed to have low overheads.  They certainly aren't
> executed separately.  On i386, the same technique with lots of older
> prefixes is not used much, probably because all prefixes have high
> overheads on old i386's.  They can be as slow as NOPs although they
> aren't executed separately.

As an intermediate ground, what about using N of something really easy
for the decoder/branch predictor to grovel over, then a single int3 at
the end of the block, so that if we do fall into this we end up getting
the desired effect?  nop's followed by an

> Bruce

-- 
Rod Grimes                                                 rgrimes@freebsd.org
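
The intermediate scheme suggested in the reply (cheap filler bytes, then one trap at the end of the block) can be sketched as a byte pattern. This is only an illustration, not what ld or LLD emit: the function name pad_with_trailing_trap is hypothetical, and it assumes the filler is the one-byte NOP (0x90) and the trap is the one-byte INT3 (0xCC).

```python
NOP = 0x90   # one-byte x86 NOP opcode
INT3 = 0xCC  # one-byte x86 INT3 (breakpoint trap) opcode


def pad_with_trailing_trap(length: int) -> bytes:
    """Return `length` padding bytes: NOPs followed by a single INT3.

    Hypothetical helper for illustration only; a real linker chooses
    its fill pattern through its own emulation parameters.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return b""
    # All but the last byte are NOPs; the final byte traps if reached.
    return bytes([NOP] * (length - 1) + [INT3])


if __name__ == "__main__":
    # A 7-byte gap, as in the quoted example: six NOPs, then INT3.
    print(pad_with_trailing_trap(7).hex(" "))  # 90 90 90 90 90 90 cc
```

With this layout, execution that falls into the padding slides through the NOPs and still faults on the last byte before reaching the next function. Whether that is cheaper for the decoder than a run of INT3 is the CPU-dependent question raised above.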
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201703191546.v2JFkQOh060299>