Date:      Thu, 27 Oct 2005 19:50:45 -0400
From:      Chuck Swiger <cswiger@mac.com>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        David Xu <davidxu@freebsd.org>, "Yuriy N. Shkandybin" <jura@networks.ru>, current@freebsd.org
Subject:   Re: Timers and timing, was: MySQL Performance 6.0rc1
Message-ID:  <436167D5.2060104@mac.com>
In-Reply-To: <26845.1130452524@critter.freebsd.dk>
References:  <26845.1130452524@critter.freebsd.dk>

Poul-Henning Kamp wrote:
> In message <43613541.7030009@mac.com>, Chuck Swiger writes:
>>It doesn't make sense to keep invoking a hardware clock from the kernel for a 
>>timer which is updated at a one-second resolution.  Can't we just keep a static 
>>time_t called __now in libc for time() to return or stuff into *tloc, which 
>>gets updated once in a while (have the scheduler check whether fractional 
>>seconds has rolled over every few ticks)?
> 
> That is a quite slippery slope to head down...
> 
> Calls to time(2) are actually very infrequent (it sort of follows
> logically from the resolution) and therefore they are unlikely to
> be a performance concern in any decently thought out code.

I would agree that calling time(2) millions of times per second is not a common 
or especially useful situation.  :-)

> So adding overhead to the scheduler to improve it is very likely going
> to be false economy:  Yes, performance of the time(2) call will improve
> but everything else will slow down as a result, even in programs
> which never inspect a single timestamp.

The notion of economy is a good one: we want the system to do the least amount 
of work required to perform the tasks assigned to it.  We also want the system 
kernel to manage limited/finite/expensive resources efficiently.

> No, this is just the wrong way to attack the problem.

I believe Darwin keeps the timecounters of the system exposed on a common page 
mapped via the System framework (their libc+libm), which gets mapped in once by 
init, and then shared with all of its children copy-on-write.  They are using 
the PowerPC timebase registers according to a thread on the darwin-kernel list.

Darwin seems to have very good timing code, and using usleep() to wake up at a 
specific time seems to work quite well.  I wonder if the issue with tvtohz in 
sys/kern/kern_clock.c has been fixed:

http://www.pkix.net/~chuck/timer/
http://www.pkix.net/~chuck/timer/wakeup001.gif

> What is needed here is for somebody to define how non-perfect we
> are willing to allow our timekeeping to be, and _THEN_ we can start
> to look at how fast we can make it work.

OK.  How about this for one "test of timer quality":

If you call gettimeofday() in a tight loop and count how many times it sees 
tv_usec change in a second on an idle machine, how well does the system do?

> Here are some questions to start out:
> 
> For reference the current code's behaviour is noted in [...]
> 
>     *	Does time have to be monotonic between CPUs ?
> 
> 		Consider:
> 
> 		gettimeofday(&t1)	// on CPU1
> 		work(x)			// a couple context switches
> 		gettimeofday(&t2)	// on CPU2
> 
> 		Should it be guaranteed that t2 >= t1 ?
> 
> 		[Yes]

Yes.

>     *   Does time have to be monotonic between different functions ?
> 
> 		Consider (for instance):
> 
> 		clock_gettime(&t1)
> 		work(x)	
> 		gettimeofday(&t2)
> 
> 		Should it be guaranteed that t2 >= t1 ?
> 
> 		For all mixes of time(), gettimeofday() and
> 		clock_gettime() ?
> 
> 		Or only for function pairs in order of increasing
> 		resolution ?
> 
> 		hint: think about how we round a timespec of
> 		1.000000500 to a timeval.
> 
> 		[t2 >= t1 for all mixes, provided comparison is
> 		 done in format with lowest resolution and conversion
> 		 is done by truncation]

I am willing to live with timestamps being monotonically increasing only within 
the same API, but it is obviously better to have all of the interfaces 
reporting consistent views of the same time, modulo the precision limits of the 
various datatypes.
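
To make the rounding hint above concrete, here is a small sketch comparing
truncation with round-to-nearest when a timespec is converted to a timeval.
The helper names are hypothetical, and the scenario assumes gettimeofday()
itself truncates to whole microseconds.

```c
#include <sys/time.h>
#include <time.h>

/* Convert by truncation: drop the sub-microsecond digits. */
struct timeval
ts_to_tv_trunc(struct timespec ts)
{
    struct timeval tv;

    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = ts.tv_nsec / 1000;
    return tv;
}

/* Convert by rounding to the nearest microsecond, for contrast. */
struct timeval
ts_to_tv_round(struct timespec ts)
{
    struct timeval tv;

    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = (ts.tv_nsec + 500) / 1000;
    if (tv.tv_usec >= 1000000) {
        /* Rounding carried into the next second. */
        tv.tv_sec++;
        tv.tv_usec -= 1000000;
    }
    return tv;
}

/* Return -1, 0 or 1 as a is earlier than, equal to, or later than b. */
int
tv_cmp(struct timeval a, struct timeval b)
{
    if (a.tv_sec != b.tv_sec)
        return a.tv_sec < b.tv_sec ? -1 : 1;
    if (a.tv_usec != b.tv_usec)
        return a.tv_usec < b.tv_usec ? -1 : 1;
    return 0;
}
```

Suppose clock_gettime() returns t1 = 1.000000500 and a gettimeofday() call
200 ns later returns t2 = 1.000000.  Rounding t1 gives 1.000001, which makes
t2 appear earlier than t1; truncating t1 gives 1.000000, so t2 >= t1 still
holds when both are compared as timevals.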

For one case, I have some code which needs to update statistics like "packets 
sent per second" (or "per minute" or "per hour") on a periodic basis.  I use a 
reasonable timeout-- ~50ms-- for a call to select() (or pcap_dispatch(), etc) 
so I check time() perhaps 20 times a second, and then update my per-second 
stats when I notice that time(&now) returns a different value.
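
A stripped-down sketch of that polling pattern is shown below.  It is not the
actual program: the function name count_second_rollovers is hypothetical, no
descriptors are watched, and a counter stands in for the per-second
statistics update.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/*
 * Poll max_polls times with a ~50 ms select() timeout and return how
 * many times the value of time() changed between polls.  A real
 * program would pass its descriptors to select() and flush its
 * per-second counters where this sketch increments rollovers.
 */
int
count_second_rollovers(int max_polls)
{
    time_t last = time(NULL);
    time_t now;
    int rollovers = 0;

    for (int i = 0; i < max_polls; i++) {
        /* Reset each round: select() may modify the timeout. */
        struct timeval timeout = { .tv_sec = 0, .tv_usec = 50000 };

        /* With no descriptors, select() is just a 50 ms sleep. */
        (void)select(0, NULL, NULL, NULL, &timeout);

        now = time(NULL);
        if (now != last) {
            /* The second has rolled over: update stats here. */
            rollovers++;
            last = now;
        }
    }
    return rollovers;
}
```

With this approach the update can lag the actual second boundary by up to
one timeout interval, plus whatever latency select() adds.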

Is there a better way of running code once a second, as close to the time the 
clock ticks?

> And when you have answered this, remember that your solution needs
> to be SMP friendly and work on all architectures.

I've at least got a few patches for sys/kern/kern_clock.c mentioned above which 
help the accuracy of usleep/nanosleep; does that count for something?  :-)

-- 
-Chuck


