From owner-svn-src-head@freebsd.org Tue Mar 28 08:03:26 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 347ADD1B517 for ; Tue, 28 Mar 2017 08:03:26 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-it0-x236.google.com (mail-it0-x236.google.com [IPv6:2607:f8b0:4001:c0b::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F1552837 for ; Tue, 28 Mar 2017 08:03:25 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-it0-x236.google.com with SMTP id y18so103364086itc.0 for ; Tue, 28 Mar 2017 01:03:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=qLaoogz3ViyPEwZvy0vyWWD/NoxLIkIZb9xyGOMUUeo=; b=hpRsdBwJ4WLkx9piXzH/W6MmbiTOjPP/Sh3WWFNf88oecKjIjhraH82zlxTiTiM8kf Qeub/yf8nfUIAdbAMHvQ2GhPquWoLtYerfwFrCNSoU+aUrc45QN/9Rlt3Rt3tTPO/PTg oG+45Absa7q9d9sskEOBPGiJhh46uc04nas3EhHxg30L3A/lyALtyO5zfS7/qamVdGtY YkYV+7mLWc5sq7ij4/kLWN3vqC/aSfsW6ArFDKdgf7ifpeet7DsqOsaIn4lEsaVxKrM0 T4HUCHou9DyJW76rgibU6b44LvXPXnwhIbc0bjyMAPRmccxW4Dbeudg9qN89GUorIGSg 5lBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=qLaoogz3ViyPEwZvy0vyWWD/NoxLIkIZb9xyGOMUUeo=; b=QVHdJArVbKXg6zDt5evc0w2mTIc6CN9vuvUSzobqIF7l07mb1CuXalmmMtYQYRC4p3 Qp2rJdHj1Tbl6ay83cQBjsZ2zyvSYUUpMdWXO4FJWLkqaqdZ20qyP1fVRADIaEATyx7h ajJkbDSHYQMrfn8oR3gz/c5aeqnVwoYEc37y/MvMO7k+fdzdzVvxACMGqk52fscWiUBT Yqul3g5MoP+we2chkSWF0ZCR9WTp70lq9HDHHYqAQ2tkyJgqOygin+3TuZxCZlWv+sOp nPGva0jdlX6PZ++lyi/76UeKiHVG1XVqxP8EsZl1yF+Xsj67p1aD+JatO8hcfWJQX5eM cfVg== X-Gm-Message-State: AFeK/H3CLAh7CqZYuCKshy91hQ6UPa4jX5EFpImaaxyTqyvUTak7ZKVo7Hfo+uklUhSUG4/lbR+ulE919oe6ig== X-Received: by 10.107.198.193 with SMTP id w184mr25021627iof.19.1490688205054; Tue, 28 Mar 2017 01:03:25 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.146.24 with HTTP; Tue, 28 Mar 2017 01:03:24 -0700 (PDT) X-Originating-IP: [2607:fb10:7021:1::504b] In-Reply-To: <20170328141213.T927@besplex.bde.org> References: <201703272253.v2RMra2L032487@repo.freebsd.org> <20170328141213.T927@besplex.bde.org> From: Warner Losh Date: Tue, 28 Mar 2017 02:03:24 -0600 X-Google-Sender-Auth: WFPgbvo_-2X8Yd8un2mIhLMAMG4 Message-ID: Subject: Re: svn commit: r316064 - head/sys/boot/i386/boot2 To: Bruce Evans Cc: Warner Losh , src-committers , "svn-src-all@freebsd.org" , "svn-src-head@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Mar 2017 08:03:26 -0000 [[ sorry for the top post but it's quick ]] Summary, in Bruce's own words > the whole optimization step was silly. I agree. on my -current system, clang compiled boot2 was 4 bytes smaller after ripping it out. gcc was a whopping 7 bytes larger. Since gcc has 156 still free, I think it's best to just retire this. Maybe you can give me a hint as to which structs to look at to get back those 7 bytes :) Thanks for the writeup. It was most enlightening. Warner On Mon, Mar 27, 2017 at 11:53 PM, Bruce Evans wrote: > On Mon, 27 Mar 2017, Warner Losh wrote: > >> Log: >> Fix build with path names with 'align' or 'nop' in them. >> >> clang is now inserting .file directives with the entire path in >> them. This is fine, except that our sed peephole optimizer removes >> them if ${SRCTOP} or ${OBJTOP} contains 'align' or 'nop', leading to >> build failures. The sed peephole optimizer removes useful things for >> boot2 when used with clang, so restrict its use to gcc. Also, gcc no >> longer generates nops to pad things, so there's no point in removing >> it. Specialize the optimization to just removing the .align 4 lines to >> preclude inadvertant path matching. >> >> Sponsored by: Netflix >> Commit brought to you the path: /home/xxx/NCD-3592-logsynopts/FreeBSD >> >> Modified: >> head/sys/boot/i386/boot2/Makefile >> >> Modified: head/sys/boot/i386/boot2/Makefile >> >> ============================================================================== >> --- head/sys/boot/i386/boot2/Makefile Mon Mar 27 22:34:43 2017 >> (r316063) >> +++ head/sys/boot/i386/boot2/Makefile Mon Mar 27 22:53:36 2017 >> (r316064) >> @@ -91,10 +91,18 @@ boot2.o: boot2.s >> >> SRCS= boot2.c boot2.h >> >> +# Gcc (4.2.1 at least) benefits from removing the forced alignment >> +# clang doesn't. Make the removal as specific as possible to avoid >> +# false positives (like path names with odd names for debugging info). >> +# In the past, gcc benefited from nop removal, but not in 4.2.1. >> +# Think of this as a poor-man's peephole optimizer for gcc 4.2.1 >> boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c >> ${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c >> - sed -e '/align/d' -e '/nop/d' < boot2.s.tmp > boot2.s >> - rm -f boot2.s.tmp >> +.if ${COMPILER_TYPE} == "gcc" >> + sed -e '/\.align 4/d' < boot2.s.tmp > boot2.s >> +.else >> + cp boot2.s.tmp boot2.s >> +.endif >> >> boot2.h: boot1.out >> ${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \ > > > That's silly; if you just fix the regexps to actually be as specifice as > possible, then they would just work with no need for large comments or > ifdefs. '\.align' instead of 'align' is a good start. ' 4' instead of > nothing is not so good. There should also be a check for nothing > except whitespace before '\.align'. The whitespace before '4' might > be a tab, but perhaps gcc only generates a space, and only generates > unwanted alignments of precisely 4, with no fill bytes. The general > syntax would be too hard to parse, but it is easy to handle any digit, > and also '\.p2align', and allow nothing except whitespace after the > digit. > > Last time I looked at this about 5 years ago, but after clang, the > whole optimization step was silly. It made little difference for gcc > and no difference for clang as you say. The nop removal that you > removed was silliest. I think compilers stopped generating nops for > padding many years before that code was written. In text sections, > They should generate something like '.p2align x,y,z', where x is the > alignment, y is empty (it is the fill byte, which defaults to nop, but > we don't want that but want a larger null instruction for multiple > bytes) and z is a limit on the amount of padding (x is only a > preference and z limits it). This syntax would be hard to parse. > But CFLAGS has -Os, and that works for preventing padding in the > text section, so nop is never generated, not even with the spelling > 'align'. > > -Os doesn't work so well for other things. For kernels, it is completely > broken with gcc-4.2.1 unless -fno-inline-functions is also used (it > breaks the inlining parameters whose misconfiguration is probably the > actual bug. This doesn't seem to be a problem for boot code. IIRC, > the only simply broken thing is that alignment is still done for variables. > This generates ".align 4". > > -Os is differently broken for clang, so the simple editing doesn't help. > -Wa,-al to see what the assembler is doing is broken for clang. > -mpreferred-stack-boundary doesn't work for clang, so is not used, but > clang does better stack alignment by default. This might still waste > some instructions, but not for every function. -mno-align-long-strings > doesn't work for clang or for gcc > 4.2.1, so is not used. clang does > quite different string padding by default, and IIRC it is worse. But > boot2 hardly has any long strings. > > boot code is hard to test since it is not modular. This is much more broken > than when I last tested the size of boot2.o. The most obvious new bugs are: > - something like SRCTOP breaks finding the relative path to ufs headers in > a FreeBSD-11 sys tree > - ${CLANG_OPT_SMALL} is added for all compilers. This is defined > out-of-tree > in /usr/share/mk/bsd.sys.mk. (I tested on a host running nearly -current > with a nearly FreeBSD-10 sys tree). In boot2, CLANG_P{T_SMALL is > wrong even for clang since it contains -mstack-alignment=8 > unconditionally, but the stack alignment should be 4 or even 1 with > -m32 -Os. Fixing this makes no difference for boot2.o compiled by > clang. The corresponding -mpreferred-stack-boundary=2 for gcc is > configured correctly in boot2/Makefile. > > After fixing this, simply "CC=gcc-4.2.1 make boot2.o". Adding -Wa,-al > to CC also works (-Wa,-al is broken for clang, and clang produces a boot2.s > which cannot be assembled by gas). > > The file sizes for FreeBSD-10 are: > > text data bss dec hex filename > 5126 12 4940 10078 0x275e clang/boot2-unhacked.o > 5126 9 4940 10075 0x275b clang/boot2.o > 5036 32 4928 9996 0x270c gcc/boot-unhacked.o > 5032 29 4928 9989 0x2705 gcc/boot2.o > > In other words, the editing saves a whole 4 bytes of text and 3 bytes of > data for gcc. It also saves a whole 3 bytes of data for clang, and you > just broke this :-). (I would be surprised if the data isn't padded back > to a multiple of 4 anyway.) > > After the reductions, the result with gcc is 86 bytes smaller than with > clang, instead of only 82 bytes smaller. > > It is silly doing such small optimizations specially. > > The editing actually removes 22 alignment statements, all spelled > "^\.align4$'. All of these are for data. -Wa,-al shows > that the padding is 2 instances of 2 bytes and 1 of 3 bytes. At > least half of the padding is for pessimal layout which is also a > style bug in the source code. Arrays of characters are mixed with > 32-bit variables. Style(9) says to sort on size (it means on alignment/ > element size), to minimize padding (mainly for structs) and get a conistent > order as a side effect. The padding usually doesn't matter except for > structs. > > After fixing the source code, there would be just 1 odd byte in a > character array that normally causes 3 bytes of padding after it. It > takes considerably more magic to avoid getting this padding anyway in > the link step, In the old a.out boot blocks, the linker forces 16-byte > alignment at the end of every object file. The old boot blocks put > most of the data in a single object file to avoid an aerage of 8 bytes > of padding between files. Elf is more flexible, and usually gives > 4-byte alignment. I think it is possible to match an odd byte at the > end of 1 object file with an odd 3 bytes at the beginning of the next > object file, but this usually doesn't happen. The new boot2 uses > hacks like including ufsread.c to avoid having lots of object files. > > I use the old a.out boot blocks updated to elf and EDD and have squeezed > them more extensively. The squeezing is also silly since I could very > easily switch to 64K ones (much larger than that is not easy, since > they have to fit below 640K and a few multiples of 64K are already > used for buffers). The limit on 8K is mainly a historical mistake. > A limit of 7.5K simplified booting from 15-sector floppies. 18-sector > floppies allowed easy expansion to 9K, but were unportable for a small > gain. The default partition layout left only 8K below the ffs > superblock, but was only a default. > > Bruce