Date: Tue, 28 Mar 2017 02:03:24 -0600 From: Warner Losh <imp@bsdimp.com> To: Bruce Evans <brde@optusnet.com.au> Cc: Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org> Subject: Re: svn commit: r316064 - head/sys/boot/i386/boot2 Message-ID: <CANCZdfrJNrk3h-vGzBcWkguWM3VimasYajxFgyOA9okPe5Ghrw@mail.gmail.com> In-Reply-To: <20170328141213.T927@besplex.bde.org> References: <201703272253.v2RMra2L032487@repo.freebsd.org> <20170328141213.T927@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
[[ sorry for the top post but it's quick ]] Summary, in Bruce's own words > the whole optimization step was silly. I agree. on my -current system, clang compiled boot2 was 4 bytes smaller after ripping it out. gcc was a whopping 7 bytes larger. Since gcc has 156 still free, I think it's best to just retire this. Maybe you can give me a hint as to which structs to look at to get back those 7 bytes :) Thanks for the writeup. It was most enlightening. Warner On Mon, Mar 27, 2017 at 11:53 PM, Bruce Evans <brde@optusnet.com.au> wrote: > On Mon, 27 Mar 2017, Warner Losh wrote: > >> Log: >> Fix build with path names with 'align' or 'nop' in them. >> >> clang is now inserting .file directives with the entire path in >> them. This is fine, except that our sed peephole optimizer removes >> them if ${SRCTOP} or ${OBJTOP} contains 'align' or 'nop', leading to >> build failures. The sed peephole optimizer removes useful things for >> boot2 when used with clang, so restrict its use to gcc. Also, gcc no >> longer generates nops to pad things, so there's no point in removing >> it. Specialize the optimization to just removing the .align 4 lines to >> preclude inadvertant path matching. >> >> Sponsored by: Netflix >> Commit brought to you the path: /home/xxx/NCD-3592-logsynopts/FreeBSD >> >> Modified: >> head/sys/boot/i386/boot2/Makefile >> >> Modified: head/sys/boot/i386/boot2/Makefile >> >> ============================================================================== >> --- head/sys/boot/i386/boot2/Makefile Mon Mar 27 22:34:43 2017 >> (r316063) >> +++ head/sys/boot/i386/boot2/Makefile Mon Mar 27 22:53:36 2017 >> (r316064) >> @@ -91,10 +91,18 @@ boot2.o: boot2.s >> >> SRCS= boot2.c boot2.h >> >> +# Gcc (4.2.1 at least) benefits from removing the forced alignment >> +# clang doesn't. Make the removal as specific as possible to avoid >> +# false positives (like path names with odd names for debugging info). >> +# In the past, gcc benefited from nop removal, but not in 4.2.1. >> +# Think of this as a poor-man's peephole optimizer for gcc 4.2.1 >> boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c >> ${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c >> - sed -e '/align/d' -e '/nop/d' < boot2.s.tmp > boot2.s >> - rm -f boot2.s.tmp >> +.if ${COMPILER_TYPE} == "gcc" >> + sed -e '/\.align 4/d' < boot2.s.tmp > boot2.s >> +.else >> + cp boot2.s.tmp boot2.s >> +.endif >> >> boot2.h: boot1.out >> ${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \ > > > That's silly; if you just fix the regexps to actually be as specifice as > possible, then they would just work with no need for large comments or > ifdefs. '\.align' instead of 'align' is a good start. ' 4' instead of > nothing is not so good. There should also be a check for nothing > except whitespace before '\.align'. The whitespace before '4' might > be a tab, but perhaps gcc only generates a space, and only generates > unwanted alignments of precisely 4, with no fill bytes. The general > syntax would be too hard to parse, but it is easy to handle any digit, > and also '\.p2align', and allow nothing except whitespace after the > digit. > > Last time I looked at this about 5 years ago, but after clang, the > whole optimization step was silly. It made little difference for gcc > and no difference for clang as you say. The nop removal that you > removed was silliest. I think compilers stopped generating nops for > padding many years before that code was written. In text sections, > They should generate something like '.p2align x,y,z', where x is the > alignment, y is empty (it is the fill byte, which defaults to nop, but > we don't want that but want a larger null instruction for multiple > bytes) and z is a limit on the amount of padding (x is only a > preference and z limits it). This syntax would be hard to parse. > But CFLAGS has -Os, and that works for preventing padding in the > text section, so nop is never generated, not even with the spelling > 'align'. > > -Os doesn't work so well for other things. For kernels, it is completely > broken with gcc-4.2.1 unless -fno-inline-functions is also used (it > breaks the inlining parameters whose misconfiguration is probably the > actual bug. This doesn't seem to be a problem for boot code. IIRC, > the only simply broken thing is that alignment is still done for variables. > This generates ".align 4". > > -Os is differently broken for clang, so the simple editing doesn't help. > -Wa,-al to see what the assembler is doing is broken for clang. > -mpreferred-stack-boundary doesn't work for clang, so is not used, but > clang does better stack alignment by default. This might still waste > some instructions, but not for every function. -mno-align-long-strings > doesn't work for clang or for gcc > 4.2.1, so is not used. clang does > quite different string padding by default, and IIRC it is worse. But > boot2 hardly has any long strings. > > boot code is hard to test since it is not modular. This is much more broken > than when I last tested the size of boot2.o. The most obvious new bugs are: > - something like SRCTOP breaks finding the relative path to ufs headers in > a FreeBSD-11 sys tree > - ${CLANG_OPT_SMALL} is added for all compilers. This is defined > out-of-tree > in /usr/share/mk/bsd.sys.mk. (I tested on a host running nearly -current > with a nearly FreeBSD-10 sys tree). In boot2, CLANG_P{T_SMALL is > wrong even for clang since it contains -mstack-alignment=8 > unconditionally, but the stack alignment should be 4 or even 1 with > -m32 -Os. Fixing this makes no difference for boot2.o compiled by > clang. The corresponding -mpreferred-stack-boundary=2 for gcc is > configured correctly in boot2/Makefile. > > After fixing this, simply "CC=gcc-4.2.1 make boot2.o". Adding -Wa,-al > to CC also works (-Wa,-al is broken for clang, and clang produces a boot2.s > which cannot be assembled by gas). > > The file sizes for FreeBSD-10 are: > > text data bss dec hex filename > 5126 12 4940 10078 0x275e clang/boot2-unhacked.o > 5126 9 4940 10075 0x275b clang/boot2.o > 5036 32 4928 9996 0x270c gcc/boot-unhacked.o > 5032 29 4928 9989 0x2705 gcc/boot2.o > > In other words, the editing saves a whole 4 bytes of text and 3 bytes of > data for gcc. It also saves a whole 3 bytes of data for clang, and you > just broke this :-). (I would be surprised if the data isn't padded back > to a multiple of 4 anyway.) > > After the reductions, the result with gcc is 86 bytes smaller than with > clang, instead of only 82 bytes smaller. > > It is silly doing such small optimizations specially. > > The editing actually removes 22 alignment statements, all spelled > "^<tab>\.align<space>4$'. All of these are for data. -Wa,-al shows > that the padding is 2 instances of 2 bytes and 1 of 3 bytes. At > least half of the padding is for pessimal layout which is also a > style bug in the source code. Arrays of characters are mixed with > 32-bit variables. Style(9) says to sort on size (it means on alignment/ > element size), to minimize padding (mainly for structs) and get a conistent > order as a side effect. The padding usually doesn't matter except for > structs. > > After fixing the source code, there would be just 1 odd byte in a > character array that normally causes 3 bytes of padding after it. It > takes considerably more magic to avoid getting this padding anyway in > the link step, In the old a.out boot blocks, the linker forces 16-byte > alignment at the end of every object file. The old boot blocks put > most of the data in a single object file to avoid an aerage of 8 bytes > of padding between files. Elf is more flexible, and usually gives > 4-byte alignment. I think it is possible to match an odd byte at the > end of 1 object file with an odd 3 bytes at the beginning of the next > object file, but this usually doesn't happen. The new boot2 uses > hacks like including ufsread.c to avoid having lots of object files. > > I use the old a.out boot blocks updated to elf and EDD and have squeezed > them more extensively. The squeezing is also silly since I could very > easily switch to 64K ones (much larger than that is not easy, since > they have to fit below 640K and a few multiples of 64K are already > used for buffers). The limit on 8K is mainly a historical mistake. > A limit of 7.5K simplified booting from 15-sector floppies. 18-sector > floppies allowed easy expansion to 9K, but were unportable for a small > gain. The default partition layout left only 8K below the ffs > superblock, but was only a default. > > Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrJNrk3h-vGzBcWkguWM3VimasYajxFgyOA9okPe5Ghrw>