Date: Thu, 19 Jul 2012 23:33:25 +0300 From: Andriy Gapon <avg@FreeBSD.org> To: freebsd-hackers@FreeBSD.org Subject: Re: kqueue periodic timer confusion Message-ID: <50086F15.2070106@FreeBSD.org> In-Reply-To: <CACYV=-FDnzv%2BwQ5-Ce0a90uGcya2NppBMPaNaLsc2KrV2GavVQ@mail.gmail.com> References: <1342036332.8313.8.camel@albrecht-desktop> <201207121026.37849.jhb@freebsd.org> <201207121125.01228.jhb@freebsd.org> <CACYV=-HryymtawfW9g4LKQy0D2f1v-iyPah5JSfAFCzebhfnew@mail.gmail.com> <1342710367.6938.6.camel@albrecht-desktop> <CACYV=-FDnzv%2BwQ5-Ce0a90uGcya2NppBMPaNaLsc2KrV2GavVQ@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
on 19/07/2012 18:11 Davide Italiano said the following: > On Thu, Jul 19, 2012 at 5:06 PM, Paul Albrecht <albrecht@glccom.com> wrote: >> On Fri, 2012-07-13 at 07:22 -0500, Davide Italiano wrote: >>> On Thu, Jul 12, 2012 at 5:25 PM, John Baldwin <jhb@freebsd.org> wrote: >>>> On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote: >>>>> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin <jhb@freebsd.org> wrote: >>>>>> On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote: >>>>>>> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote: >>>>>>>> On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote: >>>>>>>>> On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Sorry about this repost but I'm confused about the responses I received >>>>>>>>>> in my last post so I'm looking for some clarification. >>>>>>>>>> >>>>>>>>>> Specifically, I though I could use the kqueue timer as essentially a >>>>>>>>>> "drop in" replacement for linuxfd_create/read, but was surprised that >>>>>>>>>> the accuracy of the kqueue timer is much less than what I need for my >>>>>>>>>> application. >>>>>>>>>> >>>>>>>>>> So my confusion at this point is whether this is consider to be a bug or >>>>>>>>>> "feature"? >>>>>>>>>> >>>>>>>>>> Here's some test code if you want to verify the problem: >>>>>>>>>> >>>>>>>>>> #include <stdio.h> >>>>>>>>>> #include <stdlib.h> >>>>>>>>>> #include <string.h> >>>>>>>>>> #include <unistd.h> >>>>>>>>>> #include <errno.h> >>>>>>>>>> #include <sys/types.h> >>>>>>>>>> #include <sys/event.h> >>>>>>>>>> #include <sys/time.h> >>>>>>>>>> >>>>>>>>>> int >>>>>>>>>> main(void) >>>>>>>>>> { >>>>>>>>>> int i,msec; >>>>>>>>>> int kq,nev; >>>>>>>>>> struct kevent inqueue; >>>>>>>>>> struct kevent outqueue; >>>>>>>>>> struct timeval start,end; >>>>>>>>>> >>>>>>>>>> if ((kq = kqueue()) == -1) { >>>>>>>>>> fprintf(stderr, "kqueue error!? errno = %s", >>>>>>>> strerror(errno)); >>>>>>>>>> exit(EXIT_FAILURE); >>>>>>>>>> } >>>>>>>>>> EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0); >>>>>>>>>> >>>>>>>>>> gettimeofday(&start, 0); >>>>>>>>>> for (i = 0; i < 50; i++) { >>>>>>>>>> if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == >>>>>>>> -1) { >>>>>>>>>> fprintf(stderr, "kevent error!? errno = %s", >>>>>>>> strerror(errno)); >>>>>>>>>> exit(EXIT_FAILURE); >>>>>>>>>> } else if (outqueue.flags & EV_ERROR) { >>>>>>>>>> fprintf(stderr, "EV_ERROR: %s\n", >>>>>>>> strerror(outqueue.data)); >>>>>>>>>> exit(EXIT_FAILURE); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> gettimeofday(&end, 0); >>>>>>>>>> >>>>>>>>>> msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + >>>>>>>> end.tv_usec - start.tv_usec) / 1000) - 1000); >>>>>>>>>> >>>>>>>>>> printf("msec = %d\n", msec); >>>>>>>>>> >>>>>>>>>> close(kq); >>>>>>>>>> return EXIT_SUCCESS; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> What you are seeing is "just the way FreeBSD currently works." >>>>>>>>> >>>>>>>>> Sleeping (in most all of its various forms, and I've just looked at the >>>>>>>>> kevent code to verify this is true there) is handled by converting the >>>>>>>>> amount of time to sleep (usually specified in a timeval or timespec >>>>>>>>> struct) to a count of timer ticks, using an internal routine called >>>>>>>>> tvtohz() in kern/kern_time.c. That routine rounds up by one tick to >>>>>>>>> account for the current tick. Whether that's a good idea or not (it >>>>>>>>> probably was once, and probably not anymore) it's how things currently >>>>>>>>> work, and could explain the fairly consistant +1ms you're seeing. >>>>>>>> >>>>>>>> This is all true, but mostly irrelevant for his case. EVFILT_TIMER >>>>>>>> installs a periodic callout that executes KNOTE() and then resets itself (via >>>>>>>> callout_reset()) each time it runs. This should generally be closer to >>>>>>>> regulary spaced intervals than something that does: >>>>>>>> >>>>>>> >>>>>>> In what way is it irrelevant? That is, what did I miss? It appears to >>>>>>> me that the next callout is scheduled by calling timertoticks() passing >>>>>>> a count of milliseconds, that count is converted to a struct timeval and >>>>>>> passed to tvtohz() which is where the +1 adjustment happens. If you ask >>>>>>> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. >>>>>>> There is some time, likely a small number of microseconds, that you've >>>>>>> consumed of the current tick, and that's what the +1 in tvtohz() is >>>>>>> supposed to account for according to the comments. >>>>>>> >>>>>>> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick >>>>>>> and then adds one tick on top of that. That seems not quite right to >>>>>>> me, except that it is a way to g'tee that you don't return early, and >>>>>>> that is the one promise made by sleep routines on any OS; those magical >>>>>>> "at least" words always appear in the docs. >>>>>>> >>>>>>> Actually what I'm missing (that I know of) is how the scheduler works. >>>>>>> Maybe the +1 adjustment to account for the fraction of the current tick >>>>>>> you've already consumed is the right thing to do, even when that >>>>>>> fraction is 1uS or less of a 1mS tick. That would depend on scheduler >>>>>>> behavior that I know nothing about. >>>>>> >>>>>> Ohhhhh. My bad, sorry. You are correct. It is a bug to use +1 in this >>>>>> case. That is, the +1 makes sense when you are computing a one-time delta >>>>>> for things like nanosleep(). It is incorrect when computing a periodic >>>>>> delta such as for computing the interval for an itimer (setitimer) or >>>>>> EVFILT_TIMER(). >>>>>> >>>>>> Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1: >>>>>> >>>>>> sys/kern/kern_time.c: >>>>>> >>>>>> /* >>>>>> * Real interval timer expired: >>>>>> * send process whose timer expired an alarm signal. >>>>>> * If time is not set up to reload, then just return. >>>>>> * Else compute next time timer should go off which is > current time. >>>>>> * This is where delay in processing this timeout causes multiple >>>>>> * SIGALRM calls to be compressed into one. >>>>>> * tvtohz() always adds 1 to allow for the time until the next clock >>>>>> * interrupt being strictly less than 1 clock tick, but we don't want >>>>>> * that here since we want to appear to be in sync with the clock >>>>>> * interrupt even when we're delayed. >>>>>> */ >>>>>> void >>>>>> realitexpire(void *arg) >>>>>> { >>>>>> struct proc *p; >>>>>> struct timeval ctv, ntv; >>>>>> >>>>>> p = (struct proc *)arg; >>>>>> PROC_LOCK(p); >>>>>> kern_psignal(p, SIGALRM); >>>>>> if (!timevalisset(&p->p_realtimer.it_interval)) { >>>>>> timevalclear(&p->p_realtimer.it_value); >>>>>> if (p->p_flag & P_WEXIT) >>>>>> wakeup(&p->p_itcallout); >>>>>> PROC_UNLOCK(p); >>>>>> return; >>>>>> } >>>>>> for (;;) { >>>>>> timevaladd(&p->p_realtimer.it_value, >>>>>> &p->p_realtimer.it_interval); >>>>>> getmicrouptime(&ctv); >>>>>> if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) { >>>>>> ntv = p->p_realtimer.it_value; >>>>>> timevalsub(&ntv, &ctv); >>>>>> callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1, >>>>>> realitexpire, p); >>>>>> PROC_UNLOCK(p); >>>>>> return; >>>>>> } >>>>>> } >>>>>> /*NOTREACHED*/ >>>>>> } >>>>>> >>>>>> Paul, try this patch for sys/kern/kern_event.c. It uses the same approach as >>>>>> seitimer() above: >>>>>> >>>>>> Index: kern_event.c >>>>>> =================================================================== >>>>>> --- kern_event.c (revision 238365) >>>>>> +++ kern_event.c (working copy) >>>>>> @@ -538,7 +538,7 @@ filt_timerexpire(void *knx) >>>>>> >>>>>> if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) { >>>>>> calloutp = (struct callout *)kn->kn_hook; >>>>>> - callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata), >>>>>> + callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1, >>>>>> filt_timerexpire, kn); >>>>>> } >>>>>> } >>>>>> >>>>>> -- >>>>>> John Baldwin >>>>>> _______________________________________________ >>>>>> freebsd-hackers@freebsd.org mailing list >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" >>>>> >>>>> John, >>>>> I don't think it's good to decrease by a unit the 'ticks' you pass to >>>>> callout_reset_* KPI. >>>>> If this have to be fixed it should be fixed at the callout level and >>>>> not at the consumer level. In other words, subsystems that makes use >>>>> of callout_reset_* should not deal with the inherent limitations of >>>>> callout precision, as it is right now. >>>> >>>> Given that you are reworking callout already, it would seem a bit odd >>>> to rework callout a separate time just to fix this bug. A simple fix >>>> like this (which follows the same pattern we already use for setitimer()) >>>> is something we can easily merge back to 8 and 9. Reworking callout in >>>> a different way than you are already doing, OTOH, is not something we can >>>> merge trivially. >>>> >>>> -- >>>> John Baldwin >>> >>> I understand what you mean. Indeed I hadn't thought about the >>> difficulties of merging that work back. Only a thing: if I'd were you >>> before committing I'd add a comment explaining the reasons of the >>> negative correction in the timeout value passed. >>> >> >> ... and you're going to test the kqueue perioic timer after you make >> your changes to make sure there aren't any regressions? Right? >> > Yes, this is the idea. If you scrolled down here, then probably you have already learned what over-quoting is :-) >> The reason I'm asking is because darwin's bsd subsystem uses the callout >> framework for its kqueue implementation and the periodic timer is of >> course broken. >> >>> Davide >> -- >> Paul Albrecht >> Including signatures into quotes is a sign of a bad mail user agent. -- Andriy Gapon
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50086F15.2070106>