From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 12 18:52:30 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54B9C1065673; Thu, 12 Jul 2012 18:52:30 +0000 (UTC) (envelope-from paul@glccom.com) Received: from exchange.glccom.com (exchange.glccom.com [209.152.99.146]) by mx1.freebsd.org (Postfix) with ESMTP id 063968FC08; Thu, 12 Jul 2012 18:52:30 +0000 (UTC) Received: from [192.168.10.27] (192.168.10.27) by exchange.glccom.com (192.168.10.5) with Microsoft SMTP Server id 8.3.83.0; Thu, 12 Jul 2012 13:41:27 -0500 From: Paul Albrecht To: Ian Lepore In-Reply-To: <1342107640.1123.70.camel@revolution.hippie.lan> References: <1342036332.8313.8.camel@albrecht-desktop> <201207120834.40745.jhb@freebsd.org> <1342101436.1123.52.camel@revolution.hippie.lan> <201207121026.37849.jhb@freebsd.org> <1342107640.1123.70.camel@revolution.hippie.lan> Content-Type: text/plain Date: Thu, 12 Jul 2012 13:50:26 -0500 Message-ID: <1342119026.7631.8.camel@albrecht-desktop> MIME-Version: 1.0 X-Mailer: Evolution 2.22.3.1 Content-Transfer-Encoding: 7bit Cc: Davide Italiano , "freebsd-hackers@freebsd.org" , Paul, Albrecht Subject: Re: kqueue periodic timer confusion X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jul 2012 18:52:30 -0000 On Thu, 2012-07-12 at 10:40 -0500, Ian Lepore wrote: > On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote: > > On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin wrote: > > > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote: > > >> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote: > > >> > On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote: > > >> > > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote: > > >> > > > Hi, > > >> > > > > > >> > > > Sorry about this repost but I'm confused about the responses I received > > >> > > > in my last post so I'm looking for some clarification. > > >> > > > > > >> > > > Specifically, I though I could use the kqueue timer as essentially a > > >> > > > "drop in" replacement for linuxfd_create/read, but was surprised that > > >> > > > the accuracy of the kqueue timer is much less than what I need for my > > >> > > > application. > > >> > > > > > >> > > > So my confusion at this point is whether this is consider to be a bug or > > >> > > > "feature"? > > >> > > > > > >> > > > Here's some test code if you want to verify the problem: > > >> > > > > > >> > > > #include > > >> > > > #include > > >> > > > #include > > >> > > > #include > > >> > > > #include > > >> > > > #include > > >> > > > #include > > >> > > > #include > > >> > > > > > >> > > > int > > >> > > > main(void) > > >> > > > { > > >> > > > int i,msec; > > >> > > > int kq,nev; > > >> > > > struct kevent inqueue; > > >> > > > struct kevent outqueue; > > >> > > > struct timeval start,end; > > >> > > > > > >> > > > if ((kq = kqueue()) == -1) { > > >> > > > fprintf(stderr, "kqueue error!? errno = %s", > > >> > strerror(errno)); > > >> > > > exit(EXIT_FAILURE); > > >> > > > } > > >> > > > EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0); > > >> > > > > > >> > > > gettimeofday(&start, 0); > > >> > > > for (i = 0; i < 50; i++) { > > >> > > > if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == > > >> > -1) { > > >> > > > fprintf(stderr, "kevent error!? errno = %s", > > >> > strerror(errno)); > > >> > > > exit(EXIT_FAILURE); > > >> > > > } else if (outqueue.flags & EV_ERROR) { > > >> > > > fprintf(stderr, "EV_ERROR: %s\n", > > >> > strerror(outqueue.data)); > > >> > > > exit(EXIT_FAILURE); > > >> > > > } > > >> > > > } > > >> > > > gettimeofday(&end, 0); > > >> > > > > > >> > > > msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + > > >> > end.tv_usec - start.tv_usec) / 1000) - 1000); > > >> > > > > > >> > > > printf("msec = %d\n", msec); > > >> > > > > > >> > > > close(kq); > > >> > > > return EXIT_SUCCESS; > > >> > > > } > > >> > > > > > >> > > > > > >> > > > > >> > > What you are seeing is "just the way FreeBSD currently works." > > >> > > > > >> > > Sleeping (in most all of its various forms, and I've just looked at the > > >> > > kevent code to verify this is true there) is handled by converting the > > >> > > amount of time to sleep (usually specified in a timeval or timespec > > >> > > struct) to a count of timer ticks, using an internal routine called > > >> > > tvtohz() in kern/kern_time.c. That routine rounds up by one tick to > > >> > > account for the current tick. Whether that's a good idea or not (it > > >> > > probably was once, and probably not anymore) it's how things currently > > >> > > work, and could explain the fairly consistant +1ms you're seeing. > > >> > > > >> > This is all true, but mostly irrelevant for his case. EVFILT_TIMER > > >> > installs a periodic callout that executes KNOTE() and then resets itself (via > > >> > callout_reset()) each time it runs. This should generally be closer to > > >> > regulary spaced intervals than something that does: > > >> > > > >> > > >> In what way is it irrelevant? That is, what did I miss? It appears to > > >> me that the next callout is scheduled by calling timertoticks() passing > > >> a count of milliseconds, that count is converted to a struct timeval and > > >> passed to tvtohz() which is where the +1 adjustment happens. If you ask > > >> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. > > >> There is some time, likely a small number of microseconds, that you've > > >> consumed of the current tick, and that's what the +1 in tvtohz() is > > >> supposed to account for according to the comments. > > >> > > >> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick > > >> and then adds one tick on top of that. That seems not quite right to > > >> me, except that it is a way to g'tee that you don't return early, and > > >> that is the one promise made by sleep routines on any OS; those magical > > >> "at least" words always appear in the docs. > > >> > > >> Actually what I'm missing (that I know of) is how the scheduler works. > > >> Maybe the +1 adjustment to account for the fraction of the current tick > > >> you've already consumed is the right thing to do, even when that > > >> fraction is 1uS or less of a 1mS tick. That would depend on scheduler > > >> behavior that I know nothing about. > > > > > > Ohhhhh. My bad, sorry. You are correct. It is a bug to use +1 in this > > > case. That is, the +1 makes sense when you are computing a one-time delta > > > for things like nanosleep(). It is incorrect when computing a periodic > > > delta such as for computing the interval for an itimer (setitimer) or > > > EVFILT_TIMER(). > > > > > > Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1: > > > > > > sys/kern/kern_time.c: > > > > > > /* > > > * Real interval timer expired: > > > * send process whose timer expired an alarm signal. > > > * If time is not set up to reload, then just return. > > > * Else compute next time timer should go off which is > current time. > > > * This is where delay in processing this timeout causes multiple > > > * SIGALRM calls to be compressed into one. > > > * tvtohz() always adds 1 to allow for the time until the next clock > > > * interrupt being strictly less than 1 clock tick, but we don't want > > > * that here since we want to appear to be in sync with the clock > > > * interrupt even when we're delayed. > > > */ > > > void > > > realitexpire(void *arg) > > > { > > > struct proc *p; > > > struct timeval ctv, ntv; > > > > > > p = (struct proc *)arg; > > > PROC_LOCK(p); > > > kern_psignal(p, SIGALRM); > > > if (!timevalisset(&p->p_realtimer.it_interval)) { > > > timevalclear(&p->p_realtimer.it_value); > > > if (p->p_flag & P_WEXIT) > > > wakeup(&p->p_itcallout); > > > PROC_UNLOCK(p); > > > return; > > > } > > > for (;;) { > > > timevaladd(&p->p_realtimer.it_value, > > > &p->p_realtimer.it_interval); > > > getmicrouptime(&ctv); > > > if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) { > > > ntv = p->p_realtimer.it_value; > > > timevalsub(&ntv, &ctv); > > > callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1, > > > realitexpire, p); > > > PROC_UNLOCK(p); > > > return; > > > } > > > } > > > /*NOTREACHED*/ > > > } > > > > > > Paul, try this patch for sys/kern/kern_event.c. It uses the same approach as > > > seitimer() above: > > > > > > Index: kern_event.c > > > =================================================================== > > > --- kern_event.c (revision 238365) > > > +++ kern_event.c (working copy) > > > @@ -538,7 +538,7 @@ filt_timerexpire(void *knx) > > > > > > if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) { > > > calloutp = (struct callout *)kn->kn_hook; > > > - callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata), > > > + callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1, > > > filt_timerexpire, kn); > > > } > > > } > > > > > > -- > > > John Baldwin > > > _______________________________________________ > > > freebsd-hackers@freebsd.org mailing list > > > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > > > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > > > > John, > > I don't think it's good to decrease by a unit the 'ticks' you pass to > > callout_reset_* KPI. > > If this have to be fixed it should be fixed at the callout level and > > not at the consumer level. In other words, subsystems that makes use > > of callout_reset_* should not deal with the inherent limitations of > > callout precision, as it is right now. > > > > Davide > > I'm not sure it can be fixed in the callout_reset_* routines because > those routines have no idea whether the passed-in tick count came from > tvtohz() where the +1 gets applied. Either tvtohz() has to stop adding > a tick, then other code that needs the adjustment has to be changed to > add it, or code that doesn't need the adjustment has to subtract it back > out. Neither prospect is very appealing, really. > One other thing to note that is that mac os has the same problem. Does that make any difference in determining how to do the fix? > -- Ian > -- Paul Albrecht