From owner-freebsd-bugs Mon Aug 12 7:10:23 2002 Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F76337B401 for ; Mon, 12 Aug 2002 07:10:06 -0700 (PDT) Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id DC4F043E3B for ; Mon, 12 Aug 2002 07:10:05 -0700 (PDT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.12.4/8.12.4) with ESMTP id g7CEA5JU035720 for ; Mon, 12 Aug 2002 07:10:05 -0700 (PDT) (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.12.4/8.12.4/Submit) id g7CEA5eM035719; Mon, 12 Aug 2002 07:10:05 -0700 (PDT) Date: Mon, 12 Aug 2002 07:10:05 -0700 (PDT) Message-Id: <200208121410.g7CEA5eM035719@freefall.freebsd.org> To: freebsd-bugs@FreeBSD.org Cc: From: Bruce Evans Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/ Reply-To: Bruce Evans Sender: owner-freebsd-bugs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org The following reply was made to PR i386/41528; it has been noted by GNATS. From: Bruce Evans To: NIIMI Satoshi Cc: FreeBSD-gnats-submit@FreeBSD.ORG Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/ Date: Tue, 13 Aug 2002 00:12:56 +1000 (EST) On Sun, 11 Aug 2002, NIIMI Satoshi wrote: > >Description: > > Although system C compiler, GCC, maintains stack pointer to keep > aligned to 2**preferred-stack-boundary byte, C startup routine does > not care about this. > > This causes a big performance penalty for floating point operations > with variables in stack frame because IA32 CPUs are optimized to > operate with aligned data. I think stack alignment belongs only in functions that actually store floating point or other data that actually needs large alignment, and it should be done by the compiler since only the compiler knows the alignment requirements. gcc on i386's requires the stack to be aligned 4 bytes below a 2^ boundary on entry to each function, but this is compiler-dependent and pessimal IMO. > >Fix: > > (gcc 3.1 masks %esp by himself, so this patch might not be required > for -current.) Right; it isen't needed for -current. gcc-3.1 treats main() specially and aligns the stack using a single andl instruction: --- $ cat z.c main() { foo(); } # -mpreferred-stack-boundary=4 is to ensure the standard default. $ cc -O -S z.c -mpreferred-stack-boundary=4 $ cat z.s .file "z.c" .text .p2align 2,,3 .globl main .type main,@function main: pushl %ebp movl %esp, %ebp subl $8, %esp <-- waste time andl $-16, %esp <-- align call foo leave ret .Lfe1: .size main,.Lfe1-main .ident "GCC: (GNU) 3.1 [FreeBSD] 20020509 (prerelease)" --- gcc only seems to do the andl for main(). It still does the old stack alignment for main() and apparently depends on all other functions doing it. I still use my old kernel hack for alignment: %%% Index: kern_exec.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_exec.c,v retrieving revision 1.179 diff -u -2 -r1.179 kern_exec.c --- kern_exec.c 1 Aug 2002 14:31:58 -0000 1.179 +++ kern_exec.c 2 Aug 2002 13:53:14 -0000 @@ -846,4 +843,13 @@ /* + * Align stack to a multiple of 0x20. + * XXX vectp has the wrong type; we usually want a vm_offset_t; + * the suword() family takes a void *, but should take a vm_offset_t. + * XXX should align stack for signals too. + * XXX should do this more machine/compiler-independently. + */ + vectp = (char **)(((vm_offset_t)vectp & ~(vm_offset_t)0x1F) - 4); + + /* * vectp also becomes our initial stack base */ %%% This is a wrong place to do alignment (see above). Alignment to a nice power of 2 here would be reasonable, but subtracting 4 isn't. 4 is magic, and the correct number depends on the language and the compiler that the application being exec'ed was compiled with, so it shoulnd't be given by a kernel arch-dependent #define. Fortunately, the problem is now fixed for the most important case of language == c && compiler == cc-current. However, there is still a problem for signal handlers. The i386 signal trampoline is: %%% NON_GPROF_ENTRY(sigcode) call *SIGF_HANDLER(%esp) /* call signal handler */ %%% This pushes one 32-bit word (for the return address) on top of the carelessly aligned argument words. To work properly for cc-current, the stack still needs to be 16-byte aligned before the call. Fortunately, this bug is mostly mott because floating point in signal handlers is not useful and doesn't work. > --- diff begins here --- > Index: stable/lib/csu/i386-elf/crt1.c > =================================================================== > RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v > retrieving revision 1.4.2.1 > diff -u -r1.4.2.1 crt1.c > --- stable/lib/csu/i386-elf/crt1.c 30 Oct 2000 20:32:24 -0000 1.4.2.1 > +++ stable/lib/csu/i386-elf/crt1.c 10 Aug 2002 19:40:54 -0000 > @@ -93,7 +93,33 @@ > monstartup(&eprol, &etext); > #endif > _init(); > +#if 0 > exit( main(argc, argv, env) ); > +#else > + /* > + * GCC expects stack frame to be aligned like following figure. > + * > + * +--------------+ > + * |%ebp (if any) | > + * +--------------+ > + * |return address| > + * +--------------+ --- aligned by PREFERRED_STACK_BOUNDARY > + * | arguments | > + * | : | > + * | : | > + */ > + __asm__ (" > + subl $12, %%esp # create stack frame for arguments > + andl $~0xf, %%esp # align stack to 16-byte boundary > + movl %0, 0(%%esp) > + movl %1, 4(%%esp) > + movl %2, 8(%%esp) > + call main > + movl %%eax, 0(%%esp) > + call exit > + hlt # do not return > + " : : "r"(argc), "r"(argv), "r"(env)); > +#endif > } > ... I would only use this fix or one like it in RELENG_4. Maybe my kernel hack is better since it "fixes" most applications without a recompile. It is simpler because it doesn't use any assembly code or have to recover from the kernel pushing the args in a misaligned place. Minor improvements: - remove the comment about %ebp since the caller doesn't handle %ebp - remove the hlt since it wouldn't really help if it were reached (it is privileged so it just generates SIGBUS, and execution continues at the instruction after the hlt if there is a SIGBUS handler and it returns...). Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-bugs" in the body of the message