Date: Sat, 24 Dec 2011 09:37:53 +0000
From: Alexander Best <arundel@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
Message-ID: <20111224093753.GA12377@freebsd.org>
In-Reply-To: <20111224160050.T1141@besplex.bde.org>
References: <20111223235642.GA37495@freebsd.org> <20111224160050.T1141@besplex.bde.org>
--Kj7319i9nmIyA2yE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sat Dec 24 11, Bruce Evans wrote:
> On Fri, 23 Dec 2011, Alexander Best wrote:
>
> >is -mpreferred-stack-boundary=2 really necessary for i386 builds any
> >longer?
> >i built GENERIC (including modules) with and without that flag. the results
> >are:
>
> The same as it has always been. It avoids some bloat.
>
> >1654496 bytes with the flag set
> >vs.
> >1654952 bytes with the flag unset
>
> I don't believe this. GENERIC is enormously bloated, so it has size
> more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes
> is hard to believe. I get a savings of 9K (text) in a 5MB kernel.
> Changing the default target arch from i386 to pentium-undocumented has
> reduced the text space savings a little, since the default for passing
> args is now to preallocate stack space for them and store to this,
> instead of to push them; this preallocation results in more functions
> needing to allocate some stack space explicitly, and when some is
> allocated explicitly, the text space cost for this doesn't depend on
> the size of the allocation.
>
> Anyway, the savings are mostly from avoiding cache misses from
> sparse allocation on stacks.
>
> Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
> - KSTACK_PAGES on i386 is 2, while on amd64 it is 4. Using more
>   stack might push something over the edge
> - not much care is taken to align the initial stack or to keep the
>   stack aligned in calls from asm code. E.g., any alignment for
>   mi_startup() (and thus proc0?) is accidental. This may result
>   in perfect alignment or perfect misalignment. Hopefully, more
>   care is taken with thread startup. For gcc, the alignment is
>   done bogusly in main() in userland, but there is no main() in
>   the kernel.
>   The alignment doesn't matter much (provided the perfect
>   misalignment is still to a multiple of 4), but when it matters,
>   the random misalignment that results from not trying to do it at
>   all is better than perfect misalignment from getting it wrong.
>   With 4-byte alignment, the only cases that it helps are with
>   64-bit variables.
>
> >the gcc(1) man page states the following:
> >
> >"
> >This extra alignment does consume extra stack space, and generally
> >increases code size. Code that is sensitive to stack space usage,
> >such as embedded systems and operating system kernels, may want to
> >reduce the preferred alignment to -mpreferred-stack-boundary=2.
> >"
> >
> >the comment in sys/conf/kern.mk however sorta suggests that the default
> >alignment of 4 bytes might improve performance.
>
> The default stack alignment is 16 bytes, which unimproves performance.

maybe the part of the comment in sys/conf/kern.mk which mentions that a stack
alignment of 16 bytes might improve micro benchmark results should be removed.
this would prevent people (like me) from thinking that a stack alignment of
4 bytes is a compromise between size and efficiency. it isn't! currently a
stack alignment of 16 bytes has no advantage over one of 4 bytes on i386.

so specifying -mpreferred-stack-boundary=2 on i386 is absolutely mandatory.

please see the attached patch, which also introduces a line break in order to
describe the stack alignment issue in a paragraph of its own.

cheers.
alex

>
> clang handles stack alignment correctly (only does it when it is needed)
> so it doesn't need a -mpreferred-stack-boundary option and doesn't
> always break without alignment in main(). Well, at least it used to,
> IIRC. Testing it now shows that it does the necessary andl of the
> stack pointer for __aligned(32), but for __aligned(16) it now assumes
> that the stack is aligned by the caller. So it now needs
> -mpreferred-stack-boundary=2, but doesn't have it.
> OTOH, clang doesn't do the andl in main() like gcc does (unless you put a
> dummy __aligned(32) there), but requires crt to pass an aligned stack.
>
> Bruce

--Kj7319i9nmIyA2yE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="kern.mk.diff"

Index: /usr/src/sys/conf/kern.mk
===================================================================
--- /usr/src/sys/conf/kern.mk (revision 228845)
+++ /usr/src/sys/conf/kern.mk (working copy)
@@ -30,12 +30,12 @@
 # On i386, do not align the stack to 16-byte boundaries. Otherwise GCC 2.95
 # and above adds code to the entry and exit point of every function to align the
 # stack to 16-byte boundaries -- thus wasting approximately 12 bytes of stack
-# per function call. While the 16-byte alignment may benefit micro benchmarks,
-# it is probably an overall loss as it makes the code bigger (less efficient
-# use of code cache tag lines) and uses more stack (less efficient use of data
-# cache tag lines). Explicitly prohibit the use of FPU, SSE and other SIMD
-# operations inside the kernel itself. These operations are exclusively
-# reserved for user applications.
+# per function call. This makes the code bigger (less efficient use of code
+# cache tag lines) and uses more stack (less efficient use of data cache tag
+# lines).
+# Explicitly prohibit the use of FPU, SSE and other SIMD operations inside the
+# kernel itself. These operations are exclusively reserved for user
+# applications.
 #
 # gcc:
 # Setting -mno-mmx implies -mno-3dnow

--Kj7319i9nmIyA2yE--