Date:      Mon, 16 Feb 2004 03:52:16 -0800
From:      Wes Peters <wes@softweyr.com>
To:        des@des.no (Dag-Erling Smørgrav), Alexandr Kovalenko <never@nevermind.kiev.ua>
Cc:        Juan Tumani <jtumani55@hotmail.com>
Subject:   Re: FreeBSD 5.2 v/s FreeBSD 4.9 MFLOPS performance (gcc3.3.3 v/s gcc2.9.5)
Message-ID:  <200402160352.16477.wes@softweyr.com>
In-Reply-To: <xzpvfm8yssm.fsf@dwp.des.no>
References:  <BAY12-F37zmBUw7MurD00010899@hotmail.com> <20040214082420.GB77411@nevermind.kiev.ua> <xzpvfm8yssm.fsf@dwp.des.no>

On Sunday 15 February 2004 12:46, Dag-Erling Smørgrav wrote:
> Alexandr Kovalenko <never@nevermind.kiev.ua> writes:
> > Could you please explain this to me? The result is fully reproducible.
> > Please note that the only difference is the output file name. Even the
> > resulting files match bit for bit. [...]
>
> Definitely some kind of alignment problem, but it only shows up at
> some optimization levels and not others.

I've tested the patch Dan mentioned before and the results were astonishing.
Running the flops.c 1.2 program in a loop, lengthening the environment string
by one byte each time, I get 8 successive runs of fast, then 8 successive
runs of slow, where fast and slow vary between 650 and 990 mflops.  With the
patch, the performance is always 990, within a few percent.
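
A minimal sketch of how such a sweep could be driven from C, for anyone who
wants to repeat it. The variable name PAD, the helper names, and the ./flops
path are placeholders for illustration, not what was actually used for the
numbers above.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Build "PAD=" followed by n 'x' characters; the caller frees it. */
char *
make_pad_env(size_t n)
{
	char *s = malloc(4 + n + 1);

	if (s == NULL)
		return (NULL);
	memcpy(s, "PAD=", 4);
	memset(s + 4, 'x', n);
	s[4 + n] = '\0';
	return (s);
}

/*
 * Run prog with an environment holding only the PAD variable, so that
 * each extra byte of padding changes the size of the strings copied to
 * the top of the new process's stack.
 * Returns the child's exit status, or -1 on error.
 */
int
run_with_pad(const char *prog, size_t n)
{
	char *pad = make_pad_env(n);
	char *argv[] = { (char *)prog, NULL };
	char *envp[] = { pad, NULL };
	pid_t pid;
	int status;

	if (pad == NULL)
		return (-1);
	pid = fork();
	if (pid == 0) {
		execve(prog, argv, envp);
		_exit(127);		/* exec failed */
	}
	free(pad);
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Calling run_with_pad("./flops", n) for n = 0, 1, 2, ... and noting the figure
each run prints should show whether the speed depends on the padding length.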

Should I commit this?

RCS file: /big/ncvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.235
diff -u -w -r1.235 kern_exec.c
--- kern_exec.c 28 Dec 2003 04:37:59 -0000      1.235
+++ kern_exec.c 11 Feb 2004 16:47:28 -0000
@@ -1014,6 +1014,15 @@
                 */
                vectp = (char **)(destp - (imgp->argc + imgp->envc + 2) *
                    sizeof(char *));
+
+       /*
+        * Align stack to a multiple of 0x20.
+        * XXX vectp has the wrong type; we usually want a vm_offset_t;
+        * the suword() family takes a void *, but should take a vm_offset_t.
+        * XXX should align stack for signals too.
+        * XXX should do this more machine/compiler-independently.
+        */
+       vectp = (char **)(((vm_offset_t)vectp & ~(vm_offset_t)0x1F) - 4);
 
        /*
         * vectp also becomes our initial stack base
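
For anyone who wants to check the arithmetic outside the kernel, here is a
small userland model of the expression in the patch. It is only a sketch:
uintptr_t stands in for vm_offset_t, and align_vectp() is a made-up name, not
a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland model of the patch's expression
 *     ((vm_offset_t)vectp & ~(vm_offset_t)0x1F) - 4
 * with uintptr_t assumed as a stand-in for the kernel's vm_offset_t.
 */
uintptr_t
align_vectp(uintptr_t vectp)
{
	const uintptr_t boundary = 0x20;	/* 32 bytes, as in the patch */

	/* The mask trick only works for power-of-two boundaries. */
	assert((boundary & (boundary - 1)) == 0);

	/* Round down to the boundary, then back off 4 bytes. */
	return ((vectp & ~(boundary - 1)) - 4);
}
```

Every input inside the same 32-byte window maps to the same output, always 4
bytes below a 32-byte boundary, so the result no longer moves with each small
change in the size of the argument and environment strings.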


-- 
         "Where am I, and what am I doing in this handbasket?"

Wes Peters                                                  Softweyr LLC
wes@softweyr.com                                    http://softweyr.com/


