From owner-freebsd-current@FreeBSD.ORG Wed Dec 8 15:12:08 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A9E9106566B; Wed, 8 Dec 2010 15:12:08 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5964E8FC16; Wed, 8 Dec 2010 15:12:08 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 046AC46B0D; Wed, 8 Dec 2010 10:12:08 -0500 (EST) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id B685D8A009; Wed, 8 Dec 2010 10:12:06 -0500 (EST) From: John Baldwin To: David Xu Date: Wed, 8 Dec 2010 09:29:22 -0500 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; ) References: <20101205231829.GA68156@troutmask.apl.washington.edu> <201012070913.17118.jhb@freebsd.org> <4CFEF34D.1030701@freebsd.org> In-Reply-To: <4CFEF34D.1030701@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201012080929.22825.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 08 Dec 2010 10:12:06 -0500 (EST) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: freebsd-current@freebsd.org, Steve Kargl Subject: Re: Process accounting/timing has broken recently X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Dec 2010 15:12:08 -0000 On Tuesday, December 07, 2010 9:54:05 pm David Xu wrote: > John Baldwin wrote: > > On Monday, December 06, 2010 7:11:28 pm David Xu wrote: > >> John Baldwin wrote: > >>> On Sunday, December 05, 2010 6:18:29 pm Steve Kargl wrote: > >>> > >>>> Sometime in the last 7-10 days, some one made a > >>>> change that has broken process accounting/timing. > >>>> > >>>> laptop:kargl[42] foreach i ( 0 1 2 3 4 5 6 7 8 9 ) > >>>> foreach? time ./testf > >>>> foreach? end > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 69.55 real 38.39 user 30.94 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 68.82 real 40.95 user 27.60 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 69.14 real 38.90 user 30.02 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 68.79 real 40.59 user 27.99 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 68.93 real 39.76 user 28.96 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 68.71 real 41.21 user 27.29 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 69.05 real 39.68 user 29.15 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 68.99 real 39.98 user 28.80 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 69.02 real 39.64 user 29.16 sys > >>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04 > >>>> 69.38 real 37.49 user 31.67 sys > >>>> > >>>> testf is a numerically intensive program that tests the > >>>> accuracy of expf() in a tight loop. User time varies > >>>> by ~3 seconds on my lightly loaded 2 GHz core2 duo processor. > >>>> I'm fairly certain that the code does not suddenly grow/loose > >>>> 6 GFLOP of operations. > >>>> > >>> The user/sys thing is a hack (and has been). We sample the PC at stathz (~128 > >>> hz) to figure out a user vs sys split and use that to divide up the total > >>> runtime (which actually is fairly accurate). All you need is for the clock > >>> ticks to fire just a bit differently between runs to get a swing in user vs > >>> system time. > >>> > >>> What I would like is to keep separate raw bintime's for user vs system time in > >>> the raw data instead, but that would involve checking the CPU ticker more > >>> often (e.g. twice for each syscall, interrupt, and trap in addition to the > >>> current once per context switch). So far folks seem to be more worried about > >>> the extra overhead rather than the loss of accuracy. > >>> > >>> > >> Adding any instruction into global syscall path should be cautioned, it > >> is worse then before, thinking about a threaded application, a userland > >> thread may have locked a mutex and calls a system call, the overhead > >> added to system call path can directly affect a threaded application's > >> performance now, because the time window the mutex is held > >> is longer than before, I have seen some people likes to fiddle with > >> system call path, it should be cautioned. > > > > OTOH, the current getrusage(2) stats cannot be trusted. The only meaningful > > thing you can do is to sum them since the total is known to be accurate at > > least. > > > > If it wouldn't make things so messy I'd consider a new kernel option > > 'ACCURATE_RUSAGE' or some such. > > > Our getrusage is already very slow, everytime, it needs to > iterate the threads list with a process SLOCK held. I saw some mysql > versions heavily use getrusage, and a horribly slow. I think a > ACCURATE_RUSAGE will make it worse ? Using 'time foo' only retrieves the usage once at the end via wait(). However, FWIW, I think ACCURATE_RUSAGE would simplify calcru1() quite a bit since you would just sum up the thread's usage (as we do know), but then do a direct conversion from bintime to timeval for user and system without ever having to worry about time going backwards, etc. All the hacks to enforce monotonicity that are currently in place would go away since we would have the real measurements and not be inferring them from statclock ticks. -- John Baldwin