From: Don Lewis
Date: Sat, 22 Jan 2005 17:11:26 -0800 (PST)
To: PeterJeremy@optushome.com.au
cc: cal@rushg.aero.org
cc: freebsd-hackers@FreeBSD.org
Subject: Re: time and timing errors in c code on 5.x/i386 (longish)
Message-Id: <200501230111.j0N1BQTu056547@gw.catspoiler.org>
In-Reply-To: <20050122214907.GA241@cirb503493.alcatel.com.au>

On 23 Jan, Peter Jeremy wrote:
> On Fri, 2005-Jan-21 14:49:41 -0800, Chris Landauer wrote:
>>i'm running some combinatorial search programs that take weeks or months to
>>complete, and no timer i've used is able to report correctly the user and
>>system time (they all make the same mistake - eventually the user time stops
>>incrementing) - i want precise times to do some predictive modeling
>
> [evidence deleted]
>
> The problem looks like an overflow error in calcru().  Have you seen any
> kernel messages beginning 'calcru:'?  I haven't on my 4.x machine.
>
> The offending code is:
>     uu = (tu * ut) / tt;
> where all variables are uint64
> uu is the user time in microseconds (that will be converted to a timeval
>    and reported via getrusage())
> tu is the total usermode runtime allocated to your program (in usec)
> ut is the number of usermode statclock hits (128Hz)
> tt is the total (user+sys+int) statclock hits.
>
>>user 378925.483628 syst 286.845375 elapse 381328.785295 pct 99.44%
>>user 379089.748458 syst 286.962284 elapse 381493.700660 pct 99.45%
>>user 379255.472355 syst 287.088004 elapse 381660.106387 pct 99.45%
>>user 379417.184286 syst 287.190223 elapse 381822.457863 pct 99.45%
>>user 379417.184286 syst 451.110470 elapse 381986.906692 pct 99.45%
>>user 379417.184286 syst 615.737725 elapse 382152.058304 pct 99.45%
>
> At this point tu is roughly 379417184286 and ut is roughly 48565399.
> The product is about 1.8e19, which is roughly 2^64.
>
> That particular code goes all the way back to BSD4.4lite so it's a bug
> that has always existed.  We can't use FP in the kernel and don't
> support 128-bit integers (or arithmetic) anywhere so a correct fix is
> quite ugly (and inefficient) in portable C.

I think this explains why setiathome thinks it has stopped accumulating
CPU time after a while.  I've mostly noticed this on my 4.x machine
because it runs 24x7 and tends to have long uptimes.