FreeBSD Mail Archives

Date:      Wed, 30 Jun 2010 11:15:23 -0700
From:      Marcel Moolenaar <xcllnt@mac.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r209604 - head/lib/libc/gmon
Message-ID:  <79DD181D-3885-45F5-9E9D-753553D19891@mac.com>
In-Reply-To: <20100630184517.B51465@delplex.bde.org>
References:  <201006300140.o5U1eQVG097566@svn.freebsd.org> <20100630184517.B51465@delplex.bde.org>


On Jun 30, 2010, at 2:40 AM, Bruce Evans wrote:

> On Wed, 30 Jun 2010, Marcel Moolenaar wrote:
> 
>> Log:
>> On powerpc, calculate s_scale using the non-FP version previously
>> specific to hp300. Since FreeBSD does not support hp300, hp300 has
>> been removed from the condition altogether.
> 
> Also, the style of the condition has been regressed from ifndef to
> if !defined().

That's on purpose. Either we eliminate the whole conditional or we
end up adding other platforms to it.


>> The FP version broke profiling on powerpc due to invalid results.
>> Casting to double instead of float resolved the issue, but with
>> Book-E not having a FP unit, the non-FP version looked preferrable.
>> Note that even on AIM hardware the FP version yielded an invalid
>> value for s_scale, so the problem is most likely with the compiler
>> or with the expression itself.
> 
> Better use the integer code unconditionally if it works.

I tested it on amd64 and it works. My initial thought was to remove
the conditional entirely and always use the non-FP version, but I
changed my mind and instead went with a targeted change only. The
feedback would then guide me to the right implementation.

I believe that the non-FP works on all platforms.
At least ARM also has a problem with the FP version.


>  There are
> minor advantages to never using FP in a program (it saves switching
> FP state in signal handlers on some arches, where the switch can involve
> up to 7 or 9 memory accesses to several hundred bytes of state each),
> but using FP in monstartup() causes every profiled program to use FP
> for most of its life.  This is despite gprof using FP extensivly, so
> that FP needs to work for profiling to work.

*nod*

>> Modified: head/lib/libc/gmon/gmon.c
>> ==============================================================================
>> --- head/lib/libc/gmon/gmon.c	Wed Jun 30 01:10:08 2010	(r209603)
>> +++ head/lib/libc/gmon/gmon.c	Wed Jun 30 01:40:25 2010	(r209604)
>> @@ -111,7 +111,7 @@ monstartup(lowpc, highpc)
>> 
>> 	o = p->highpc - p->lowpc;
>> 	if (p->kcountsize < o) {
>> -#ifndef hp300
>> +#if !defined(__powerpc__)
> 
> The style regression.
> 
>> 		s_scale = ((float)p->kcountsize / o ) * SCALE_1_TO_1;
> 
> I can't see any bugs in this expression.  p->kcountsize is < o,  and the
> scale factor is only 65536, so there should be no problems with overflow.

This leaves GCC a the problem. 

> Using float instead of double is a almost pointless since the expression
> will be evaluated in double precision on most machines.  powerpc claims
> that FLT_EVAL_METHOD is 1, so powerpc is one of these machines, so
> changing from float to double should have an especially small effect
> on it.

The effect is enormous actually. Casting to float yields the wrong
result. Casting to double yields the right result. This is on both
FP-capable PowerPC CPUs as well has non-FP PowerPC CPUs.

>  So the bug is apparently in powerpc's conversion from u_long
> to float, or in promotion to double.  I guess it is in conversion from
> u_long to float.

Possible. Then again, I see the same histogram problems on ARM that
I saw on PowerPC, so this doesn't look like a PowerPC-specific bug.

> Better still, rewrite the integer method using a C99 type, so that it
> is as simple as the FP method:
> 
> 		s_scale = ((uintmax_t)p->kcountsize << SCALE_SHIFT) / o;
> 
> C99 uintmax_t now guarantees uintmax_t to have >= 64 bits, and practical
> considerations guarantee p->kcountsize to fit in many fewer than 48 bits
> even on 64-bit arches, so the shift cannot overflow.

I like this. What about the following (white-space corrupted)
simplification:

Index: gmon.c
===================================================================
--- gmon.c	(revision 209604)
+++ gmon.c	(working copy)
@@ -110,24 +110,9 @@
 	p->tos[0].link = 0;
 
 	o = p->highpc - p->lowpc;
-	if (p->kcountsize < o) {
-#if !defined(__powerpc__)
-		s_scale = ((float)p->kcountsize / o ) * SCALE_1_TO_1;
-#else /* avoid floating point */
-		int quot = o / p->kcountsize;
+	s_scale = (p->kcountsize < o) ?
+	    ((uintmax_t)p->kcountsize << SCALE_1_TO_1) / o : SCALE_1_TO_1;
 
-		if (quot >= 0x10000)
-			s_scale = 1;
-		else if (quot >= 0x100)
-			s_scale = 0x10000 / quot;
-		else if (o >= 0x800000)
-			s_scale = 0x1000000 / (o / (p->kcountsize >> 8));
-		else
-			s_scale = 0x1000000 / ((o << 8) / p->kcountsize);
-#endif
-	} else
-		s_scale = SCALE_1_TO_1;
-
 	moncontrol(1);
 }
 


-- 
Marcel Moolenaar
xcllnt@mac.com

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?79DD181D-3885-45F5-9E9D-753553D19891>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation