Date:      Sun, 23 Jan 2005 08:49:07 +1100
From:      Peter Jeremy <PeterJeremy@optushome.com.au>
To:        Chris Landauer <cal@rushg.aero.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: time and timing errors in c code on 5.x/i386 (longish)
Message-ID:  <20050122214907.GA241@cirb503493.alcatel.com.au>
In-Reply-To: <200501212249.j0LMnfpJ091129@calamari.aero.org>
References:  <200501212249.j0LMnfpJ091129@calamari.aero.org>

On Fri, 2005-Jan-21 14:49:41 -0800, Chris Landauer wrote:
>i'm running some combinatorial search programs that take weeks or months to
>complete, and no timer i've used is able to report correctly the user and
>system time (they all make the same mistake - eventually the user time stops
>incrementing) - i want precise times to do some predictive modeling

[evidence deleted]

The problem looks like an overflow error in calcru().  Have you seen any
kernel messages beginning 'calcru:'?

The offending code is:
        uu = (tu * ut) / tt;
where all variables are uint64:
  uu is the user time in microseconds (that will be converted to a timeval
     and reported via getrusage())
  tu is the total runtime (user+sys+int) allocated to your program (in usec)
  ut is the number of usermode statclock hits (128Hz)
  tt is the total (user+sys+int) number of statclock hits.

>user 378925.483628 syst 286.845375 elapse 381328.785295 pct 99.44%
>user 379089.748458 syst 286.962284 elapse 381493.700660 pct 99.45%
>user 379255.472355 syst 287.088004 elapse 381660.106387 pct 99.45%
>user 379417.184286 syst 287.190223 elapse 381822.457863 pct 99.45%
>user 379417.184286 syst 451.110470 elapse 381986.906692 pct 99.45%
>user 379417.184286 syst 615.737725 elapse 382152.058304 pct 99.45%

At this point tu is roughly 379417184286 and ut is roughly 48565399.
The product is about 1.8e19, which is roughly 2^64 (about 1.845e19),
the largest value a uint64 can hold, so the multiplication is about to
wrap.
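
To make the arithmetic concrete, here is a rough userland sketch (not the
kernel code itself). The function names mul_overflows_u64() and
naive_scale() are made up for illustration; only the expression
(tu * ut) / tt comes from calcru().

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a * b does not fit in 64 bits. */
int
mul_overflows_u64(uint64_t a, uint64_t b)
{
	return (a != 0 && b > UINT64_MAX / a);
}

/*
 * The same expression as in calcru().  Unsigned arithmetic wraps
 * modulo 2^64, so once tu * ut overflows the result is silently wrong.
 */
uint64_t
naive_scale(uint64_t tu, uint64_t ut, uint64_t tt)
{
	return ((tu * ut) / tt);
}
```

With the rough figures above, mul_overflows_u64(379417184286ULL, 48565399ULL)
is still false, but only just: with the same ut, a tu of 380000000000
already overflows, and naive_scale() then returns a value far too small.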

That particular code goes all the way back to 4.4BSD-Lite, so it's a bug
that has always existed.  We can't use FP in the kernel and don't
support 128-bit integers (or arithmetic) anywhere, so a correct fix is
quite ugly (and inefficient) in portable C.

I can suggest two options:
1) If exact timings aren't critical, just use the elapsed time.
2) It would be fairly easy to write some i386 assembler (or __asm())
   that correctly calculates (uint64 * uint32)/uint32, which would
   work for tt < 2^32.  Assuming that nothing is being profiled, this
   would be good for just over a year of process time (2^32 statclock
   hits at 128Hz is about 388 days).
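
For comparison with option 2, here is a rough portable C sketch of the
same (uint64 * uint32)/uint32 calculation. It is not the i386 assembler
suggested above: it swaps in schoolbook long division in base 2^32, so it
is the "ugly" variant. The name muldiv_u64_u32() and its overflow
convention are assumptions for illustration, not anything in the tree.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: floor((a * b) / c) without
 * overflowing 64 bits in the intermediate product.  c must be nonzero.
 * Returns UINT64_MAX if the quotient itself does not fit in 64 bits.
 */
uint64_t
muldiv_u64_u32(uint64_t a, uint32_t b, uint32_t c)
{
	uint64_t t, r, q1, q0;
	uint32_t p0, p1, p2;

	/* 96-bit product a * b as three 32-bit digits p2:p1:p0. */
	t = (a & 0xffffffffULL) * b;
	p0 = (uint32_t)t;
	t = (a >> 32) * b + (t >> 32);
	p1 = (uint32_t)t;
	p2 = (uint32_t)(t >> 32);

	/* Long division by c, most significant digit first. */
	if (p2 >= c)
		return (UINT64_MAX);	/* quotient needs more than 64 bits */
	r = ((uint64_t)p2 << 32) | p1;
	q1 = r / c;
	r %= c;			/* r < c < 2^32, so the shift below fits */
	r = (r << 32) | p0;
	q0 = r / c;
	return ((q1 << 32) | q0);
}
```

In calcru() terms this would be used as uu = muldiv_u64_u32(tu, ut, tt),
which is only valid while ut and tt both fit in 32 bits. Since ut <= tt,
the quotient never exceeds tu, so the overflow branch should not trigger
there.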

-- 
Peter Jeremy


