Date: Wed, 30 Jun 2010 11:15:23 -0700 From: Marcel Moolenaar <xcllnt@mac.com> To: Bruce Evans <brde@optusnet.com.au> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org Subject: Re: svn commit: r209604 - head/lib/libc/gmon Message-ID: <79DD181D-3885-45F5-9E9D-753553D19891@mac.com> In-Reply-To: <20100630184517.B51465@delplex.bde.org> References: <201006300140.o5U1eQVG097566@svn.freebsd.org> <20100630184517.B51465@delplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Jun 30, 2010, at 2:40 AM, Bruce Evans wrote: > On Wed, 30 Jun 2010, Marcel Moolenaar wrote: > >> Log: >> On powerpc, calculate s_scale using the non-FP version previously >> specific to hp300. Since FreeBSD does not support hp300, hp300 has >> been removed from the condition altogether. > > Also, the style of the condition has been regressed from ifndef to > if !defined(). That's on purpose. Either we eliminate the whole conditional or we end up adding other platforms to it. >> The FP version broke profiling on powerpc due to invalid results. >> Casting to double instead of float resolved the issue, but with >> Book-E not having a FP unit, the non-FP version looked preferrable. >> Note that even on AIM hardware the FP version yielded an invalid >> value for s_scale, so the problem is most likely with the compiler >> or with the expression itself. > > Better use the integer code unconditionally if it works. I tested it on amd64 and it works. My initial thought was to remove the conditional entirely and always use the non-FP version, but I changed my mind and instead went with a targeted change only. The feedback would then guide me to the right implementation. I believe that the non-FP works on all platforms. At least ARM also has a problem with the FP version. > There are > minor advantages to never using FP in a program (it saves switching > FP state in signal handlers on some arches, where the switch can involve > up to 7 or 9 memory accesses to several hundred bytes of state each), > but using FP in monstartup() causes every profiled program to use FP > for most of its life. This is despite gprof using FP extensivly, so > that FP needs to work for profiling to work. *nod* >> Modified: head/lib/libc/gmon/gmon.c >> ============================================================================== >> --- head/lib/libc/gmon/gmon.c Wed Jun 30 01:10:08 2010 (r209603) >> +++ head/lib/libc/gmon/gmon.c Wed Jun 30 01:40:25 2010 (r209604) >> @@ -111,7 +111,7 @@ monstartup(lowpc, highpc) >> >> o = p->highpc - p->lowpc; >> if (p->kcountsize < o) { >> -#ifndef hp300 >> +#if !defined(__powerpc__) > > The style regression. > >> s_scale = ((float)p->kcountsize / o ) * SCALE_1_TO_1; > > I can't see any bugs in this expression. p->kcountsize is < o, and the > scale factor is only 65536, so there should be no problems with overflow. This leaves GCC a the problem. > Using float instead of double is a almost pointless since the expression > will be evaluated in double precision on most machines. powerpc claims > that FLT_EVAL_METHOD is 1, so powerpc is one of these machines, so > changing from float to double should have an especially small effect > on it. The effect is enormous actually. Casting to float yields the wrong result. Casting to double yields the right result. This is on both FP-capable PowerPC CPUs as well has non-FP PowerPC CPUs. > So the bug is apparently in powerpc's conversion from u_long > to float, or in promotion to double. I guess it is in conversion from > u_long to float. Possible. Then again, I see the same histogram problems on ARM that I saw on PowerPC, so this doesn't look like a PowerPC-specific bug. > Better still, rewrite the integer method using a C99 type, so that it > is as simple as the FP method: > > s_scale = ((uintmax_t)p->kcountsize << SCALE_SHIFT) / o; > > C99 uintmax_t now guarantees uintmax_t to have >= 64 bits, and practical > considerations guarantee p->kcountsize to fit in many fewer than 48 bits > even on 64-bit arches, so the shift cannot overflow. I like this. What about the following (white-space corrupted) simplification: Index: gmon.c =================================================================== --- gmon.c (revision 209604) +++ gmon.c (working copy) @@ -110,24 +110,9 @@ p->tos[0].link = 0; o = p->highpc - p->lowpc; - if (p->kcountsize < o) { -#if !defined(__powerpc__) - s_scale = ((float)p->kcountsize / o ) * SCALE_1_TO_1; -#else /* avoid floating point */ - int quot = o / p->kcountsize; + s_scale = (p->kcountsize < o) ? + ((uintmax_t)p->kcountsize << SCALE_1_TO_1) / o : SCALE_1_TO_1; - if (quot >= 0x10000) - s_scale = 1; - else if (quot >= 0x100) - s_scale = 0x10000 / quot; - else if (o >= 0x800000) - s_scale = 0x1000000 / (o / (p->kcountsize >> 8)); - else - s_scale = 0x1000000 / ((o << 8) / p->kcountsize); -#endif - } else - s_scale = SCALE_1_TO_1; - moncontrol(1); } -- Marcel Moolenaar xcllnt@mac.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?79DD181D-3885-45F5-9E9D-753553D19891>