From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 12 14:26:40 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 653901065672
	for <freebsd-hackers@freebsd.org>; Thu, 12 Jul 2012 14:26:40 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 20B218FC17
	for <freebsd-hackers@freebsd.org>; Thu, 12 Jul 2012 14:26:40 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6DD63B960;
	Thu, 12 Jul 2012 10:26:39 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Ian Lepore <freebsd@damnhippie.dyndns.org>
Date: Thu, 12 Jul 2012 10:26:37 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p17; KDE/4.5.5; amd64; ; )
References: <1342036332.8313.8.camel@albrecht-desktop>
	<201207120834.40745.jhb@freebsd.org>
	<1342101436.1123.52.camel@revolution.hippie.lan>
In-Reply-To: <1342101436.1123.52.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201207121026.37849.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Thu, 12 Jul 2012 10:26:39 -0400 (EDT)
Cc: freebsd-hackers@freebsd.org, Paul Albrecht <albrecht@glccom.com>
Subject: Re: kqueue periodic timer confusion
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 Jul 2012 14:26:40 -0000

On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
> > On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
> > > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
> > > > Hi,
> > > > 
> > > > Sorry about this repost but I'm confused about the responses I received
> > > > in my last post so I'm looking for some clarification.
> > > > 
> > > > Specifically, I though I could use the kqueue timer as essentially a
> > > > "drop in" replacement for linuxfd_create/read, but was surprised that
> > > > the accuracy of the kqueue timer is much less than what I need for my
> > > > application.
> > > > 
> > > > So my confusion at this point is whether this is consider to be a bug or
> > > > "feature"?
> > > > 
> > > > Here's some test code if you want to verify the problem:
> > > > 
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <string.h>
> > > > #include <unistd.h>
> > > > #include <errno.h>
> > > > #include <sys/types.h>
> > > > #include <sys/event.h>
> > > > #include <sys/time.h>
> > > > 
> > > > int
> > > > main(void)
> > > > {
> > > >         int i,msec;
> > > >         int kq,nev;
> > > >         struct kevent inqueue;
> > > >         struct kevent outqueue;
> > > >         struct timeval start,end;
> > > > 
> > > >         if ((kq = kqueue()) == -1) {
> > > >                 fprintf(stderr, "kqueue error!? errno = %s", 
> > strerror(errno));
> > > >                 exit(EXIT_FAILURE);
> > > >         }
> > > >         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
> > > > 
> > > >         gettimeofday(&start, 0);
> > > >         for (i = 0; i < 50; i++) {
> > > >                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == 
> > -1) {
> > > >                         fprintf(stderr, "kevent error!? errno = %s", 
> > strerror(errno));
> > > >                         exit(EXIT_FAILURE);
> > > >                 } else if (outqueue.flags & EV_ERROR) {
> > > >                         fprintf(stderr, "EV_ERROR: %s\n", 
> > strerror(outqueue.data));
> > > >                         exit(EXIT_FAILURE);
> > > >                 }
> > > >         }
> > > >         gettimeofday(&end, 0);
> > > > 
> > > >         msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + 
> > end.tv_usec - start.tv_usec) / 1000) - 1000);
> > > > 
> > > >         printf("msec = %d\n", msec);
> > > > 
> > > >         close(kq);
> > > >         return EXIT_SUCCESS;
> > > > }
> > > > 
> > > > 
> > > 
> > > What you are seeing is "just the way FreeBSD currently works."  
> > > 
> > > Sleeping (in most all of its various forms, and I've just looked at the
> > > kevent code to verify this is true there) is handled by converting the
> > > amount of time to sleep (usually specified in a timeval or timespec
> > > struct) to a count of timer ticks, using an internal routine called
> > > tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
> > > account for the current tick.  Whether that's a good idea or not (it
> > > probably was once, and probably not anymore) it's how things currently
> > > work, and could explain the fairly consistant +1ms you're seeing.
> > 
> > This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
> > installs a periodic callout that executes KNOTE() and then resets itself (via 
> > callout_reset()) each time it runs.  This should generally be closer to
> > regulary spaced intervals than something that does:
> > 
> 
> In what way is it irrelevant?  That is, what did I miss?  It appears to
> me that the next callout is scheduled by calling timertoticks() passing
> a count of milliseconds, that count is converted to a struct timeval and
> passed to tvtohz() which is where the +1 adjustment happens.  If you ask
> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
> There is some time, likely a small number of microseconds, that you've
> consumed of the current tick, and that's what the +1 in tvtohz() is
> supposed to account for according to the comments.
> 
> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
> and then adds one tick on top of that.  That seems not quite right to
> me, except that it is a way to g'tee that you don't return early, and
> that is the one promise made by sleep routines on any OS; those magical
> "at least" words always appear in the docs.
> 
> Actually what I'm missing (that I know of) is how the scheduler works.
> Maybe the +1 adjustment to account for the fraction of the current tick
> you've already consumed is the right thing to do, even when that
> fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
> behavior that I know nothing about.

Ohhhhh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
case.  That is, the +1 makes sense when you are computing a one-time delta
for things like nanosleep().  It is incorrect when computing a periodic
delta such as for computing the interval for an itimer (setitimer) or
EVFILT_TIMER().

Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:

sys/kern/kern_time.c:

/*
 * Real interval timer expired:
 * send process whose timer expired an alarm signal.
 * If time is not set up to reload, then just return.
 * Else compute next time timer should go off which is > current time.
 * This is where delay in processing this timeout causes multiple
 * SIGALRM calls to be compressed into one.
 * tvtohz() always adds 1 to allow for the time until the next clock
 * interrupt being strictly less than 1 clock tick, but we don't want
 * that here since we want to appear to be in sync with the clock
 * interrupt even when we're delayed.
 */
void
realitexpire(void *arg)
{
	struct proc *p;
	struct timeval ctv, ntv;

	p = (struct proc *)arg;
	PROC_LOCK(p);
	kern_psignal(p, SIGALRM);
	if (!timevalisset(&p->p_realtimer.it_interval)) {
		timevalclear(&p->p_realtimer.it_value);
		if (p->p_flag & P_WEXIT)
			wakeup(&p->p_itcallout);
		PROC_UNLOCK(p);
		return;
	}
	for (;;) {
		timevaladd(&p->p_realtimer.it_value,
		    &p->p_realtimer.it_interval);
		getmicrouptime(&ctv);
		if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
			ntv = p->p_realtimer.it_value;
			timevalsub(&ntv, &ctv);
			callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1,
			    realitexpire, p);
			PROC_UNLOCK(p);
			return;
		}
	}
	/*NOTREACHED*/
}

Paul, try this patch for sys/kern/kern_event.c.  It uses the same approach as
seitimer() above:

Index: kern_event.c
===================================================================
--- kern_event.c	(revision 238365)
+++ kern_event.c	(working copy)
@@ -538,7 +538,7 @@ filt_timerexpire(void *knx)
 
 	if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
 		calloutp = (struct callout *)kn->kn_hook;
-		callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata),
+		callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1,
 		    filt_timerexpire, kn);
 	}
 }

-- 
John Baldwin