From owner-freebsd-bugs  Sat Dec  7  5: 0:22 2002
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1E5E937B401
	for <freebsd-bugs@hub.freebsd.org>; Sat,  7 Dec 2002 05:00:19 -0800 (PST)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B3E6E43EA9
	for <freebsd-bugs@hub.freebsd.org>; Sat,  7 Dec 2002 05:00:18 -0800 (PST)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.12.6/8.12.6) with ESMTP id gB7D0Ix3089254
	for <freebsd-bugs@freefall.freebsd.org>; Sat, 7 Dec 2002 05:00:18 -0800 (PST)
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.12.6/8.12.6/Submit) id gB7D0I3w089250;
	Sat, 7 Dec 2002 05:00:18 -0800 (PST)
Date: Sat, 7 Dec 2002 05:00:18 -0800 (PST)
Message-Id: <200212071300.gB7D0I3w089250@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
Cc: 
From: Bruce Evans <bde@zeta.org.au>
Subject: Re: kern/46036: inaccurate timeouts in select(),nanosleep() + fix
Reply-To: Bruce Evans <bde@zeta.org.au>
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-bugs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-bugs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-bugs>
X-Loop: FreeBSD.org

The following reply was made to PR kern/46036; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: Enache Adrian <enache@rdslink.ro>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/46036: inaccurate timeouts in select(),nanosleep() + fix
Date: Sat, 7 Dec 2002 23:57:40 +1100 (EST)

 On Fri, 6 Dec 2002, Enache Adrian wrote:
 
 > >Description:
 > 	select(),nanosleep(),poll() sleep one tick longer, _except_
 > 	when previous similar system call was interrupted by a signal
 > 	or by i/o conditions on the polled file descriptors.
 >
 > 	I find this rather buggy and inconsequent.
 > 	It badly breaks any code which tries to hook both timers
 > 	and input polling on a select() loop and defeats benchmark
 > 	programs which test for the accuracy of timeouts.
 >
 > 	This is due to tvtohz (sys/kern/kern_clock.c) which always
 > 	adds one to its result to allow for the current tick to
 > 	expire. Instead of adding one it should be better testing
 > 	for a non-zero result.
 
 tvtohz() actually adds 1 to ensure that the sleep time is at least the
 specified time.  This gives a minimum sleep time of 0.0 ticks longer
 than the specified sleep time (as required by standards) and a maximum
 sleep time of about 1.0 tick longer than the specified sleep time,
 with an average sleep time of about 0.5 ticks longer.  Not adding 1
 would give a {min,max,avg} sleep time of about {-1.0,0.0,-0.5} ticks
 longer than specified; i.e., it would be shorter than specified (broken)
 in most cases.
 
 Individual syscalls could handle this by sleeping for the shorter time
 and checking whether the timeout has actually expired when they wake up,
 but most don't, and this would be a pessimization in most cases.  Of
 the above syscalls, only nanosleep() checks that the timeout has actually
 expired.  It does this mainly for the technical reason that very large
 timeouts (ones greater than INT_MAX ticks) are required to work and may
 even be useful (especially if HZ is large).  Checking is a pessimization
 in at least the following cases, because we will just have to wait again
 after wasting time waking up to check:
 - most short timeouts that are a multiple of `tick'.
 - half or the cases for random timeouts of 0.5 ticks (since the current
   tick will expire before the timeout half the time), and proportionally
   for all random timeouts of 1 tick or shorter.
 
 > >How-To-Repeat:
 > 	This little program prints the timeout accuracy of the
 > 	select() system call as "[+-] sec usec".It catches SIGQUIT
 > 	and polls for console input.
 > 	It use to give:
 >
 >    unpatched (regular) kernel:
 > 	+   0 s       9938 us
 >
 > 		--input
 > 	+   0 s       9959 us
 > 	+   0 s       9972 us
 > 	+   0 s       9930 us
 > 	^\      --interrupt
 > 	+   0 s       9954 us
 > 	+   0 s       9954 us
 
 This is the correct behaviour.  We are mostly in sync with the clock
 interrupt, and the default timeout is a multiple of clock.tick, so we
 must sleep for an extra tick (less epsilon) for the timeout to expire.
 select() would have to go back to sleep for an extra tick if tvtohz()
 didn't add 1.
 
 >    patched kernel:
 > 	-   0 s         44 us
 > 	^\      --interrupt
 > 	-   0 s          3 us
 > 	-   0 s        100 us
 >
 > 		--input
 > 	-   0 s         44 us
 > 	-   0 s         47 us
 
 This is incorrect.  The times are the values
 ((actual timeout) - (specified timeout)).  These should be >= 0.
 The errors are small here because the timeout is a multiple of clock.tick
 and we are mostly in sync with clock interrupts.  In general the error
 from not adding 1 is up to 1 tick (see above).
 
 > -------------------------------8x-----------------------------------
 > >Fix:
 > 	patch to kern_clock.c and other places which assume that
 > 	tvtohz adds one follows.
 
 This mainly undoes some FreeBSD fixes.
 
 > 	notes:
 > 	tvtohz has been renamed to hzto in NetBSD but is the same.
 
 Actually, FreeBSD renamed hzto to tvtohz.
 
 > redo:
 > 	tv = savtv; gettimeofday(&otv,NULL);
 > 	/* a quick fix is to substract one tick from tv here
 > 	   but this will break things when compiled on other
 > 	   OS ( linux for instance ) */
 > 	for(;;) {
 > 		FD_SET(0,&rdfs);
 > 		k = select(1,&rdfs,NULL,NULL,&tv);
 > 		gettimeofday(&tv,NULL);
 
 I thought that Linux gets timeouts right -- at least sleep() used to
 work except in very early versions of Linux-0.x.  Perhaps it uses
 clock interrupts to get to get close to the timeout and then
 udelay() to get closer if the residual timeout is small.  In general,
 the residual could be up to `tick' usec, but in cases that are in sync
 with the clock like your test program, the residual would be small
 (about 44 usec) and udelay() could reasonably handle it.
 
 Bruce
 

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message