From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 13 12:22:20 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7101F106566C; Fri, 13 Jul 2012 12:22:20 +0000 (UTC) (envelope-from davide.italiano@gmail.com) Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3CB318FC12; Fri, 13 Jul 2012 12:22:20 +0000 (UTC) Received: by pbbro2 with SMTP id ro2so5959272pbb.13 for ; Fri, 13 Jul 2012 05:22:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=C7TcbJCHRDfoUyF3X26Ab8WxG2/TzmAUoXaDH1zIjUc=; b=ZzhcRXUp0Cv2WyNABJpHfg6U+TsmzG3srujxHxdkG3U7s8S7H4woBdKmL1H5Rwo5/n gB/ITqZxwt6h5Gcl+cAaY83HhWt7AGSF3sjYlXa3/uzhYGwWpfBQEjpem3kFuA1ZdSXz wD58Rx2yPTNC0ERWiWBuEbZpBSK/SJ7+WEnjWawraE+1bM05liK/wQxo1cGXYICxHm7w 6ShXXwvY9J9sKU+Db1EwjoOlPDW/PKqqnUP0tYmN3ZApro8RWpfCsYQk1ya60ORkuskq WM90jutZr9qMDh6MPIQSBdiiz1ljk0DMusGYAs8ZG8nU5EsrLFs5GZ0wHJRPUZvHa59L BK4A== MIME-Version: 1.0 Received: by 10.68.191.8 with SMTP id gu8mr3216977pbc.158.1342182134024; Fri, 13 Jul 2012 05:22:14 -0700 (PDT) Received: by 10.66.82.201 with HTTP; Fri, 13 Jul 2012 05:22:13 -0700 (PDT) In-Reply-To: <201207121125.01228.jhb@freebsd.org> References: <1342036332.8313.8.camel@albrecht-desktop> <201207121026.37849.jhb@freebsd.org> <201207121125.01228.jhb@freebsd.org> Date: Fri, 13 Jul 2012 14:22:13 +0200 Message-ID: From: Davide Italiano To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Cc: Ian Lepore , Paul Albrecht , freebsd-hackers@freebsd.org Subject: Re: kqueue periodic timer confusion X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Jul 2012 12:22:20 -0000 On Thu, Jul 12, 2012 at 5:25 PM, John Baldwin wrote: > On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote: >> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin wrote: >> > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote: >> >> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote: >> >> > On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote: >> >> > > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote: >> >> > > > Hi, >> >> > > > >> >> > > > Sorry about this repost but I'm confused about the responses I received >> >> > > > in my last post so I'm looking for some clarification. >> >> > > > >> >> > > > Specifically, I though I could use the kqueue timer as essentially a >> >> > > > "drop in" replacement for linuxfd_create/read, but was surprised that >> >> > > > the accuracy of the kqueue timer is much less than what I need for my >> >> > > > application. >> >> > > > >> >> > > > So my confusion at this point is whether this is consider to be a bug or >> >> > > > "feature"? >> >> > > > >> >> > > > Here's some test code if you want to verify the problem: >> >> > > > >> >> > > > #include >> >> > > > #include >> >> > > > #include >> >> > > > #include >> >> > > > #include >> >> > > > #include >> >> > > > #include >> >> > > > #include >> >> > > > >> >> > > > int >> >> > > > main(void) >> >> > > > { >> >> > > > int i,msec; >> >> > > > int kq,nev; >> >> > > > struct kevent inqueue; >> >> > > > struct kevent outqueue; >> >> > > > struct timeval start,end; >> >> > > > >> >> > > > if ((kq = kqueue()) == -1) { >> >> > > > fprintf(stderr, "kqueue error!? errno = %s", >> >> > strerror(errno)); >> >> > > > exit(EXIT_FAILURE); >> >> > > > } >> >> > > > EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0); >> >> > > > >> >> > > > gettimeofday(&start, 0); >> >> > > > for (i = 0; i < 50; i++) { >> >> > > > if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == >> >> > -1) { >> >> > > > fprintf(stderr, "kevent error!? errno = %s", >> >> > strerror(errno)); >> >> > > > exit(EXIT_FAILURE); >> >> > > > } else if (outqueue.flags & EV_ERROR) { >> >> > > > fprintf(stderr, "EV_ERROR: %s\n", >> >> > strerror(outqueue.data)); >> >> > > > exit(EXIT_FAILURE); >> >> > > > } >> >> > > > } >> >> > > > gettimeofday(&end, 0); >> >> > > > >> >> > > > msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + >> >> > end.tv_usec - start.tv_usec) / 1000) - 1000); >> >> > > > >> >> > > > printf("msec = %d\n", msec); >> >> > > > >> >> > > > close(kq); >> >> > > > return EXIT_SUCCESS; >> >> > > > } >> >> > > > >> >> > > > >> >> > > >> >> > > What you are seeing is "just the way FreeBSD currently works." >> >> > > >> >> > > Sleeping (in most all of its various forms, and I've just looked at the >> >> > > kevent code to verify this is true there) is handled by converting the >> >> > > amount of time to sleep (usually specified in a timeval or timespec >> >> > > struct) to a count of timer ticks, using an internal routine called >> >> > > tvtohz() in kern/kern_time.c. That routine rounds up by one tick to >> >> > > account for the current tick. Whether that's a good idea or not (it >> >> > > probably was once, and probably not anymore) it's how things currently >> >> > > work, and could explain the fairly consistant +1ms you're seeing. >> >> > >> >> > This is all true, but mostly irrelevant for his case. EVFILT_TIMER >> >> > installs a periodic callout that executes KNOTE() and then resets itself (via >> >> > callout_reset()) each time it runs. This should generally be closer to >> >> > regulary spaced intervals than something that does: >> >> > >> >> >> >> In what way is it irrelevant? That is, what did I miss? It appears to >> >> me that the next callout is scheduled by calling timertoticks() passing >> >> a count of milliseconds, that count is converted to a struct timeval and >> >> passed to tvtohz() which is where the +1 adjustment happens. If you ask >> >> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. >> >> There is some time, likely a small number of microseconds, that you've >> >> consumed of the current tick, and that's what the +1 in tvtohz() is >> >> supposed to account for according to the comments. >> >> >> >> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick >> >> and then adds one tick on top of that. That seems not quite right to >> >> me, except that it is a way to g'tee that you don't return early, and >> >> that is the one promise made by sleep routines on any OS; those magical >> >> "at least" words always appear in the docs. >> >> >> >> Actually what I'm missing (that I know of) is how the scheduler works. >> >> Maybe the +1 adjustment to account for the fraction of the current tick >> >> you've already consumed is the right thing to do, even when that >> >> fraction is 1uS or less of a 1mS tick. That would depend on scheduler >> >> behavior that I know nothing about. >> > >> > Ohhhhh. My bad, sorry. You are correct. It is a bug to use +1 in this >> > case. That is, the +1 makes sense when you are computing a one-time delta >> > for things like nanosleep(). It is incorrect when computing a periodic >> > delta such as for computing the interval for an itimer (setitimer) or >> > EVFILT_TIMER(). >> > >> > Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1: >> > >> > sys/kern/kern_time.c: >> > >> > /* >> > * Real interval timer expired: >> > * send process whose timer expired an alarm signal. >> > * If time is not set up to reload, then just return. >> > * Else compute next time timer should go off which is > current time. >> > * This is where delay in processing this timeout causes multiple >> > * SIGALRM calls to be compressed into one. >> > * tvtohz() always adds 1 to allow for the time until the next clock >> > * interrupt being strictly less than 1 clock tick, but we don't want >> > * that here since we want to appear to be in sync with the clock >> > * interrupt even when we're delayed. >> > */ >> > void >> > realitexpire(void *arg) >> > { >> > struct proc *p; >> > struct timeval ctv, ntv; >> > >> > p = (struct proc *)arg; >> > PROC_LOCK(p); >> > kern_psignal(p, SIGALRM); >> > if (!timevalisset(&p->p_realtimer.it_interval)) { >> > timevalclear(&p->p_realtimer.it_value); >> > if (p->p_flag & P_WEXIT) >> > wakeup(&p->p_itcallout); >> > PROC_UNLOCK(p); >> > return; >> > } >> > for (;;) { >> > timevaladd(&p->p_realtimer.it_value, >> > &p->p_realtimer.it_interval); >> > getmicrouptime(&ctv); >> > if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) { >> > ntv = p->p_realtimer.it_value; >> > timevalsub(&ntv, &ctv); >> > callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1, >> > realitexpire, p); >> > PROC_UNLOCK(p); >> > return; >> > } >> > } >> > /*NOTREACHED*/ >> > } >> > >> > Paul, try this patch for sys/kern/kern_event.c. It uses the same approach as >> > seitimer() above: >> > >> > Index: kern_event.c >> > =================================================================== >> > --- kern_event.c (revision 238365) >> > +++ kern_event.c (working copy) >> > @@ -538,7 +538,7 @@ filt_timerexpire(void *knx) >> > >> > if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) { >> > calloutp = (struct callout *)kn->kn_hook; >> > - callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata), >> > + callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1, >> > filt_timerexpire, kn); >> > } >> > } >> > >> > -- >> > John Baldwin >> > _______________________________________________ >> > freebsd-hackers@freebsd.org mailing list >> > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" >> >> John, >> I don't think it's good to decrease by a unit the 'ticks' you pass to >> callout_reset_* KPI. >> If this have to be fixed it should be fixed at the callout level and >> not at the consumer level. In other words, subsystems that makes use >> of callout_reset_* should not deal with the inherent limitations of >> callout precision, as it is right now. > > Given that you are reworking callout already, it would seem a bit odd > to rework callout a separate time just to fix this bug. A simple fix > like this (which follows the same pattern we already use for setitimer()) > is something we can easily merge back to 8 and 9. Reworking callout in > a different way than you are already doing, OTOH, is not something we can > merge trivially. > > -- > John Baldwin I understand what you mean. Indeed I hadn't thought about the difficulties of merging that work back. Only a thing: if I'd were you before committing I'd add a comment explaining the reasons of the negative correction in the timeout value passed. Davide