From: John Baldwin
To: Konstantin Belousov
Cc: freebsd-arch@freebsd.org
Date: Thu, 7 Jun 2012 08:50:55 -0400
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
Message-Id: <201206070850.55751.jhb@freebsd.org>
In-Reply-To: <20120606205938.GS85127@deviant.kiev.zoral.com.ua>

On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
> On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> > > A positive result from the recent flame-bait on arch@ is the working
> > > implementation of the fast gettimeofday(2) and
> > > clock_gettime(2). The speedup I see is around 6-7x on the 2600K.
> > > I think the speedup could be even bigger on the previous
> > > generation of CPUs, where lock operations and syscall entry are
> > > costlier. A sample test runs of tools/tools/syscall_timing are
> > > presented at the end of message.
> >
> > In general this looks good but I see a few nits / races:
> >
> > 1) You don't follow the model of clearing tk_current to 0 while you
> > are updating the structure that the in-kernel timecounter code
> > uses. This also means you have to avoid using a tk_current of 0
> > and that userland has to keep spinning as long as tk_current is 0.
> > Without this I believe userland can read a partially updated
> > structure.
> I changed the code to be much more similar to the kern_tc.c. I (re)added
> the generation field, which is set to 0 upon kernel touching timehands.

Thank you. BTW, I think we should use atomic_load_acq_int() on both
accesses to th_gen (and the in-kernel binuptime should do the same). I
realize this requires using rmb before the while condition in userland
since we can't use atomic_load_acq_int() here. I think it should also use
atomic_store_rel_int() for both stores to th_gen during the tc_windup()
callback.

> I think this can only happen if tc_windups occurs quite close in
> succession, or usermode thread is suspended for long enough. BTW,
> even generation could loop back to the previous value if thread is
> stopped.

Having the 32-bit generation count roll over should take a long while.

> > > sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
> > > Clock resolution: 0.000000076
> > > test          loop  time         iterations  periteration
> > > gettimeofday  0     1.000994225  21623297    0.000000046
> > > gettimeofday  1     1.000994980  21596492    0.000000046
> > > gettimeofday  2     1.001070595  21598326    0.000000046
> > > gettimeofday  3     1.000922308  21581398    0.000000046
> > > gettimeofday  4     1.000984264  21605539    0.000000046
> > > gettimeofday  5     1.000989697  21601659    0.000000046
> > > gettimeofday  6     1.000996261  21598385    0.000000046
> > > gettimeofday  7     1.001002223  21583933    0.000000046
> > > gettimeofday  8     1.000985847  21599442    0.000000046
> > > gettimeofday  9     1.000994977  21600935    0.000000046
> > > sandy% sudo sysctl kern.timecounter.fast_gettime=0
> >
> > I think this means you can call gettimeofday() in about 46 ns now
> > vs 310 the "old" way?
>
> Yes. This is for 32bit, while for 64 bit binaries the numbers are
> 155->25 ns on the same hw.

Ah, good. A non-generic hardcoded amd64 version is around 20ns, so this is
comparable.

-- 
John Baldwin