Date: Sun, 19 Mar 2017 13:04:50 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Ed Maste <emaste@freebsd.org>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r315522 - in head: contrib/binutils/ld/emulparams sys/conf
Message-ID: <20170319123107.W994@besplex.bde.org>
In-Reply-To: <201703190022.v2J0MDhq015941@repo.freebsd.org>
References: <201703190022.v2J0MDhq015941@repo.freebsd.org>
On Sun, 19 Mar 2017, Ed Maste wrote:

> Log:
> use INT3 instead of NOP for x86 binary padding
>
> We should never end up executing the inter-function padding, so we
> are better off faulting than silently carrying on to whatever function
> happens to be next.
>
> Note that LLD will soon do this by default (although it currently pads
> with zeros).
>
> Reviewed by: dim, kib
> MFC after: 1 month
> Sponsored by: The FreeBSD Foundation
> Differential Revision: https://reviews.freebsd.org/D10047

Is this a pessimization? Instruction prefetch near the end of almost
every function now fetches INT3 instead of NOP. Both have to be decoded
to decide whether to speculatively execute them. INT3 is unlikely to be
speculatively executed, but it takes extra work to decide not to do so.
Functions normally end with a RET or unconditional JMP, and then branch
prediction usually prevents speculative execution beyond the end, so the
pessimization must be small.

Intra-function padding that is executed now uses "fat NOP" instructions
like null LEA's since this is faster to execute than a long string of
NOPs. This is less readable than NOPs or even INT3's. Of course, INT3
can't be used for executed padding. I think it is also used for
intra-function padding that is not executed. This is just harder to read
unless it is needed to avoid the possible pessimization in this commit.

The intra-function code with nops might look like:

	jmp	over
	nop		# 7 nops altogether
	nop
over:

or

	jmp	over
	nullpad7	# single 7 byte null padding instruction
over:

and it is likely to be CPU-dependent whether 7
possibly-speculatively executed nops take more or less resources than 1
possibly-speculatively executed fancy instruction. I would expect the
fancy instructions to take more resources each.

Fancy LEAs don't seem such a good choice for executed padding either.
amd64 uses lots of REX prefixes instead of fancy instructions, since
these are designed to have low overheads. They certainly aren't executed
separately.
On i386, the same technique with lots of older prefixes is not used
much, probably because all prefixes have high overheads on old i386's.
They can be as slow as NOPs although they aren't executed separately.

Bruce
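
For readers who want to see the padding styles discussed above at the
byte level, here is a minimal sketch. It is illustrative only: the
helper names (pad_len, fill_padding), the 16-byte alignment and the
example offset are assumptions and do not come from the commit or the
linker. The 7-byte null LEA shown is one commonly seen i386 encoding,
used here as a stand-in for the "nullpad7" placeholder above.

```python
# Illustrative sketch only; names and alignment are assumptions.

NOP = b"\x90"    # one-byte NOP opcode
INT3 = b"\xcc"   # one-byte INT3 (breakpoint trap) opcode

# One 7-byte "fat NOP": a null LEA, leal 0x0(%esi,1),%esi on i386.
# Assumed as a representative encoding, not taken from the commit.
NULL_LEA7 = bytes([0x8D, 0xB4, 0x26, 0x00, 0x00, 0x00, 0x00])


def pad_len(offset: int, align: int) -> int:
    """Number of bytes needed to advance offset to a multiple of align."""
    return -offset % align


def fill_padding(offset: int, align: int, fill: bytes) -> bytes:
    """Padding made of a repeated one-byte fill, as a linker might emit."""
    return fill * pad_len(offset, align)


# Hypothetical function ending at offset 0x19, next one aligned to 16.
end = 0x19
print("gap bytes:   ", pad_len(end, 16))
print("NOP padding: ", fill_padding(end, 16, NOP).hex())
print("INT3 padding:", fill_padding(end, 16, INT3).hex())
print("fat NOP:     ", NULL_LEA7.hex())
```

With NOP or INT3 fill the gap holds one instruction per byte, while the
null LEA covers the same 7 bytes with a single instruction, which is
the trade-off weighed above.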