Date: Fri, 03 Oct 2014 10:16:56 -0600 From: Ian Lepore <ian@FreeBSD.org> To: John Baldwin <jhb@freebsd.org> Cc: freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>, Paul Albrecht <palbrecht@glccom.com> Subject: Re: freebsd 10 kqueue timer regression Message-ID: <1412353016.12052.82.camel@revolution.hippie.lan> In-Reply-To: <2499075.KMdpQjyIZI@ralph.baldwin.cx> References: <8ABC0977-FB8F-45E7-ACCC-BFA92EE22E1C@glccom.com> <1412288106.12052.39.camel@revolution.hippie.lan> <1412297389.12052.46.camel@revolution.hippie.lan> <2499075.KMdpQjyIZI@ralph.baldwin.cx>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 2014-10-03 at 08:50 -0400, John Baldwin wrote: > On Thursday, October 02, 2014 06:49:49 PM Ian Lepore wrote: > > On Thu, 2014-10-02 at 16:15 -0600, Ian Lepore wrote: > > > On Thu, 2014-10-02 at 16:00 -0400, John Baldwin wrote: > > > > On Thursday, October 02, 2014 3:53:28 pm Ian Lepore wrote: > > > > > On Thu, 2014-10-02 at 12:47 -0700, Adrian Chadd wrote: > > > > > > I'm confused; it's doing 50 loops of a 20msec timer, right? So > > > > > > that's > > > > > > > > 1000ms. > > > > > > > > > Yes, so the entire loop should take 1000ms maybe + 1ms. Instead it > > > > > takes 1070. When I run it on an armv6 system running -current it > > > > > takes > > > > > 1050. When I run it on my 8.4 desktop (pre-eventtimers) it takes > > > > > 1013. > > > > > > > > > > -- Ian > > > > > > > > What if you set kern.eventtimer.periodic=1? > > > > > > Some interesting results... > > > > > > HZ 100 500 1000 > > > > > > --------------------------------- > > > periodic=0 1050 1050 1080 > > > periodic=1 1110 1012 1049 > > > > > > > > > The 1080 number was +/- 3ms, all the other numbers were +/- 1ms (except > > > for one outlier of 24363 at 100Hz non-periodic which I'm going to > > > pretend didn't happen). > > > > > > The 1050 numbers are probably each 20ms sleep actually taking 21ms, but > > > the old tvtohz code with -1 adjustments from the old email thread isn't > > > in play anymore. I don't know how to account for the other numbers at > > > all. There's all kinds of stuff I don't understand in the new code > > > involving tick thresholds and such. > > > > > > -- Ian > > > > The attached patch seems to fix the problem in what I think is the most > > correct way: scheduling the callout with absolute times based on the > > time the current event was scheduled for plus the requested interval. > > The net effect should be metronomic events that do not drift (or phase > > shift if you prefer) over time, regardless of any latency involved in > > processing the events. > > > > This makes all the numbers in the tests I ran above come out 1000. > > > > It doesn't make me understand the strange results from the prior tests > > any better. > > > > -- Ian > > Are you running ntpd or ptpd? If so, perhaps try the original tests without > the patch. > Yep, ntpd is running, and I was repeatedly running the test right after rebooting to change kern.hz. I realized that's the explanation for the 24-second outlier (which didn't take 24 seconds in reality) -- the hardware I'm testing on has no RTC, and ntpd must have stepped the clock 24 seconds during the 1 second test loop. I think ntpd probably explains another anomaly I saw in more testing after that post last night. I tested with a 20uS period instead of 20mS and I kept seeing the loop take ~999810uS. Initially I thought 190ppm of clock error was a bit much to explain away easily, then I realized it's quite likely that ntpd is steering hard to get on-time. (The normal drift rate for the timecounter clock on the system is ~18ppm.) Taking ntpd out of the picture will remove artifacts from its clock steering, but really then what the test program loop is doing is measuring a pair of unsteered clocks against each other. Depending on the hardware, those clocks may derive from a common oscillator or not (in which case they are most assuredly keeping time at slightly different rates, for such is the nature of clocks). With the hardware I'm testing on, I think I can arrange to have eventtimers and timecounters both running from the same underlying oscillator, then we should be truly measuring only the software behavior. > That said, I think one of the reasons the old code worked was that the > previous callout had the equivalent of the C_HARDCLOCK flag set. Thus, when > the timer interrupt fires and we rescheuled for N ticks, it was actually N > ticks - <time since the timer interrupt that triggered this callout> (if that > makes sense?). Now I think what is happening is that that the > callout_reset_sbt() is doing 'delta time t + <time since the timer interrupt>' > and that that accounts for your numbers. I'm lost with this paragraph. I don't know what you mean by "old code" (pre-eventtimer? pre-my-changes?). I do see stuff in the current code that, under some conditions, mixes in data involving the prior timecounter tick and also stuff involving precision values, but I just can't make sense of what it's doing. > In that case, I wonder if the reason > you are seeing such large gaps is that you have Cx states enabled (including > C1E) and that those are making that second factor large enough to account for > the skew. Try setting machdep.idle=spin as a test (this is a better test than > ntpd/ptpd I think). > I'm testing on ARM hardware with no power management enabled other than the idle loop executing a wait-for-interrupt instruction that does not stop the eventtimer and timecounter clocks, and the latency for coming out of that state is on the order of 1uS. > The reason I don't like using C_ABSOLUTE is that in theory that will result in > longer or shorter intervals if time is adjusted via ntp/ptp, whereas > historically EVFILT_TIMER has been CLOCK_UPTIME based instead of > CLOCK_REALTIME. > I don't see anything in the code that implies C_ABSOLUTE uses a different clock. In fact, it's even documented as being absolute time since boot. In callout_reset_sbt_on() if the absolute flag is set the sbt you pass in is used as-is. If the flag is not set, the sbt you pass in is added to either the current uptime or the last tc tick uptime. For getting a truly periodic timer event, this absolute time technique is required. Adding a delta to something that isn't the time the prior was *scheduled* to occur will result in a phase shift in the event occurrances over time (phase relative to uptime top-of-second). -- Ian
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1412353016.12052.82.camel>