From owner-freebsd-current@FreeBSD.ORG  Thu Oct 27 23:50:42 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: current@freebsd.org
Delivered-To: freebsd-current@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8CF3D16A41F;
	Thu, 27 Oct 2005 23:50:42 +0000 (GMT) (envelope-from cswiger@mac.com)
Received: from pi.codefab.com (pi.codefab.com [199.103.21.227])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1ABD843D45;
	Thu, 27 Oct 2005 23:50:42 +0000 (GMT) (envelope-from cswiger@mac.com)
Received: from localhost (localhost [127.0.0.1])
	by pi.codefab.com (Postfix) with ESMTP id 74F835D54;
	Thu, 27 Oct 2005 19:50:41 -0400 (EDT)
Received: from pi.codefab.com ([127.0.0.1])
	by localhost (pi.codefab.com [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 75289-04; Thu, 27 Oct 2005 19:50:40 -0400 (EDT)
Received: from [192.168.1.3] (pool-68-161-122-227.ny325.east.verizon.net
	[68.161.122.227])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by pi.codefab.com (Postfix) with ESMTP id A8B8E5C53;
	Thu, 27 Oct 2005 19:50:39 -0400 (EDT)
Message-ID: <436167D5.2060104@mac.com>
Date: Thu, 27 Oct 2005 19:50:45 -0400
From: Chuck Swiger <cswiger@mac.com>
Organization: The Courts of Chaos
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.7.12) Gecko/20050915
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
References: <26845.1130452524@critter.freebsd.dk>
In-Reply-To: <26845.1130452524@critter.freebsd.dk>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: amavisd-new at codefab.com
Cc: David Xu <davidxu@freebsd.org>, "Yuriy N. Shkandybin" <jura@networks.ru>,
	current@freebsd.org
Subject: Re: Timers and timing, was: MySQL Performance 6.0rc1
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 27 Oct 2005 23:50:42 -0000

Poul-Henning Kamp wrote:
> In message <43613541.7030009@mac.com>, Chuck Swiger writes:
>>It doesn't make sense to keep invoking a hardware clock from the kernel for a 
>>timer which is updated at a one-second resolution.  Can't we just keep a static 
>>time_t called __now in libc for time() to return or stuff into *tloc, which 
>>gets updated once in a while (have the scheduler check whether fractional 
>>seconds have rolled over every few ticks)?
> 
> That is a quite slippery slope to head down...
> 
> Calls to time(2) are actually very infrequent (it sort of follows
> logically from the resolution) and therefore they are unlikely to
> be a performance concern in any decently thought out code.

I would agree that calling time(2) millions of times per second is not a common 
or especially useful situation.  :-)

> So adding overhead to the scheduler to improve it is very likely going
> to be false economy:  Yes, performance of the time(2) call will improve
> but everything else will slow down as a result, even in programs
> which never inspect a single timestamp.

The notion of economy is a good one: we want the system to do the least amount 
of work required to perform the tasks assigned to it.  We also want the system 
kernel to manage limited/finite/expensive resources efficiently.

> No, this is just the wrong way to attack the problem.

I believe Darwin keeps the timecounters of the system exposed on a common page 
mapped via the System framework (their libc+libm), which gets mapped in once by 
init, and then shared with all of its children copy-on-write.  They are using 
the PowerPC timebase registers according to a thread on the darwin-kernel list.

Darwin seems to have very good timing code, and using usleep() to wake up at a 
specific time seems to work quite well.  I wonder if the issue with tvtohz in 
sys/kern/kern_clock.c has been fixed:

http://www.pkix.net/~chuck/timer/
http://www.pkix.net/~chuck/timer/wakeup001.gif

> What is needed here is for somebody to define how non-perfect we
> are willing to allow our timekeeping to be, and _THEN_ we can start
> to look at how fast we can make it work.

OK.  How about this for one "test of timer quality":

If you call gettimeofday() in a tight loop on an idle machine and count how many
times tv_usec changes during one second, how well does the system do?

> Here are some questions to start out:
> 
> For reference, the current code's behaviour is noted in [...]
> 
>     *	Does time have to be monotonic between CPUs ?
> 
> 		Consider:
> 
> 		gettimeofday(&t1)	// on CPU1
> 		work(x)			// a couple context switches
> 		gettimeofday(&t2)	// on CPU2
> 
> 		Should it be guaranteed that t2 >= t1 ?
> 
> 		[Yes]

Yes.

>     *   Does time have to be monotonic between different functions ?
> 
> 		Consider (for instance):
> 
> 		clock_gettime(&t1)
> 		work(x)	
> 		gettimeofday(&t2)
> 
> 		Should it be guaranteed that t2 >= t1 ?
> 
> 		For all mixes of time(), gettimeofday() and
> 		clock_gettime() ?
> 
> 		Or only for function pairs in order of increasing
> 		resolution ?
> 
> 		hint: think about how we round a timespec of
> 		1.000000500 to a timeval.
> 
> 		[t2 >= t1 for all mixes, provided comparison is
> 		 done in format with lowest resolution and conversion
> 		 is done by truncation]

I am willing to live with timestamps being monotonically increasing only within
the same API, but it is obviously better to have all of the interfaces report
consistent views of the same time, modulo the precision limits of the various
datatypes.

For one case, I have some code which needs to update statistics like "packets
sent per second" (or "per minute" or "per hour") on a periodic basis.  I use a
reasonable timeout (~50ms) for a call to select() (or pcap_dispatch(), etc.),
so I check time() perhaps 20 times a second, and I update my per-second stats
when I notice that time(&now) returns a different value.

Is there a better way of running code once a second, as close as possible to
the moment the clock ticks over to the next second?

> And when you have answered this, remember that your solution needs
> to be SMP friendly and work on all architectures.

I've at least got a few patches for sys/kern/kern_clock.c, mentioned above,
which help the accuracy of usleep/nanosleep; does that count for something?  :-)

-- 
-Chuck