From owner-freebsd-current@FreeBSD.ORG Tue Dec 7 14:39:56 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 534D0106566C; Tue, 7 Dec 2010 14:39:56 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D2B8F8FC17; Tue, 7 Dec 2010 14:39:55 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 5EBB746B38; Tue, 7 Dec 2010 09:39:55 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 575328A009; Tue, 7 Dec 2010 09:39:54 -0500 (EST) From: John Baldwin To: David Xu Date: Tue, 7 Dec 2010 09:13:17 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <20101205231829.GA68156@troutmask.apl.washington.edu> <201012060944.03196.jhb@freebsd.org> <4CFD7BB0.20500@freebsd.org> In-Reply-To: <4CFD7BB0.20500@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012070913.17118.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 07 Dec 2010 09:39:54 -0500 (EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: freebsd-current@freebsd.org, Steve Kargl Subject: Re: Process accounting/timing has broken recently X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Dec 2010 14:39:56 -0000 On Monday, December 06, 2010 7:11:28 pm David Xu wrote: > John Baldwin wrote: > > On Sunday, December 05, 2010 6:18:29 pm Steve Kargl wrote: > > > >> Sometime in the last 7-10 days, some one made a > >> change that has broken process accounting/timing. > >> > >> laptop:kargl[42] foreach i ( 0 1 2 3 4 5 6 7 8 9 ) > >> foreach? time ./testf > >> foreach? end > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 69.55 real 38.39 user 30.94 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 68.82 real 40.95 user 27.60 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 69.14 real 38.90 user 30.02 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 68.79 real 40.59 user 27.99 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 68.93 real 39.76 user 28.96 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 68.71 real 41.21 user 27.29 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 69.05 real 39.68 user 29.15 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 68.99 real 39.98 user 28.80 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 69.02 real 39.64 user 29.16 sys > >> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >> 69.38 real 37.49 user 31.67 sys > >> > >> testf is a numerically intensive program that tests the > >> accuracy of expf() in a tight loop. User time varies > >> by ~3 seconds on my lightly loaded 2 GHz core2 duo processor. > >> I'm fairly certain that the code does not suddenly grow/loose > >> 6 GFLOP of operations. > >> > > > > The user/sys thing is a hack (and has been). We sample the PC at stathz (~128 > > hz) to figure out a user vs sys split and use that to divide up the total > > runtime (which actually is fairly accurate). All you need is for the clock > > ticks to fire just a bit differently between runs to get a swing in user vs > > system time. > > > > What I would like is to keep separate raw bintime's for user vs system time in > > the raw data instead, but that would involve checking the CPU ticker more > > often (e.g. twice for each syscall, interrupt, and trap in addition to the > > current once per context switch). So far folks seem to be more worried about > > the extra overhead rather than the loss of accuracy. > > > > > Adding any instruction into global syscall path should be cautioned, it > is worse then before, thinking about a threaded application, a userland > thread may have locked a mutex and calls a system call, the overhead > added to system call path can directly affect a threaded application's > performance now, because the time window the mutex is held > is longer than before, I have seen some people likes to fiddle with > system call path, it should be cautioned. OTOH, the current getrusage(2) stats cannot be trusted. The only meaningful thing you can do is to sum them since the total is known to be accurate at least. If it wouldn't make things so messy I'd consider a new kernel option 'ACCURATE_RUSAGE' or some such. -- John Baldwin