Date: Wed, 21 Oct 2015 21:48:50 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Andriy Gapon <avg@FreeBSD.org> Cc: freebsd-hackers <freebsd-hackers@FreeBSD.org>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Jung-uk Kim <jkim@FreeBSD.org> Subject: Re: instability of timekeeping Message-ID: <20151021184850.GX2257@kib.kiev.ua> In-Reply-To: <56274FFC.2000608@FreeBSD.org> References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org> <56274FFC.2000608@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Oct 21, 2015 at 11:42:36AM +0300, Andriy Gapon wrote: > On 20/10/2015 14:05, Andriy Gapon wrote: > > I performed a small observation. With ntpd disabled I ran `ntpdate -d` at 10 > > second intervals in a loop (done via `sleep 10`). It looks like for about 25 > > minutes the time offset between a reference server and my machine was quite > > stable. But then it sort of jumped about 2.5 seconds between two consecutive > > ntpdate invocations. > [snip] > > On 20/10/2015 13:12, Andriy Gapon wrote: > [snip] > >> kern.timecounter.tc.TSC-low.frequency: 1607357461 > >> kern.timecounter.tc.TSC-low.counter: 2457319922 > >> kern.timecounter.tc.TSC-low.mask: 4294967295 > [snip] > > Another observation and a hypothesis. > > I tried time counters other than TSC and I couldn't reproduce the issue with them. > > Another thing that occurred to me is that TSC-low.mask / TSC-low.frequency — 2.7 > seconds. > > >From these observations and reading some code comes my hypothesis. > First, I assume that visible values of TSCs on different cores are not perfectly > synchronized. Second, I think that there can be circumstances where > tc_ticktock() -> tc_windup() can get called on different cores sufficiently > close in time that the later call would see TSC value which is "before" > (sequentially smaller) than the TSC value read earlier (on the other core). In > that case the delta between the readings would be close to TSC-low.mask. > > Right now I do not have any proof that what the hypothesis says is what actually > happens. On the other hand, I do not see anything that would prevent the > hypothesized situation from occurring. > > To add more weight to the hypothesis: > cpu1:invltlb 674384 4 > cpu1:invlrng 453020 3 > cpu1:invlpg 108772578 652 > cpu1:preempt 37435000 224 > cpu1:ast 36757 0 > cpu1:rendezvous 9473 0 > cpu1:hardclock 22267434 133 > As you can see I am currently running workloads that result in a very > significant number of IPIs, especially the page invalidation IPIs. Given that > all (x86) IPIs have the same priority (based on their vector numbers) I think > it's plausible that the hardclock IPI could get arbitrarily delayed. > > I guess I could add a tracepoint to record deltas that are close to a current > timecounter's mask. > > Assuming the hypothesis is correct I see two possible ways to work-around the > problem: > > 1. Increase tsc_shift, so that the cross-CPU TSC differences are smaller (at the > cost of lower resolution). That should reduce the risk of seeing "backwards" > TSC values. Judging from numbers that tools/tools/tscdrift produces I can set > tsc_shift to 7. The resulting resolution should not be worse than that of HPET > or ACPI-fast counters with the benefit of TSC being much faster to read. > > 2. Change the code, so that tc_windup() is always called on the same CPU. E.g. > it could be the BSP or a CPU that receives the actual timer interrupts (as > opposed to the hardclock IPIs). This should help with the timekeeping, but > won't help with the "jitter" in binuptime() and friends. > > 3. In tc_delta() somehow detect and filter out "slightly behind" timecounter > readings. Not sure if this is possible at all. Am I right that the tsc synchronization test passes on your machine ? If yes, you probably cannot read 'slightly behind' timecounter after IPI on other core. Might be, try to change CPUID instruction in the test to MFENCE and see if the test still able to pass. Does the symptom disappear if you switch the eventtimer to LAPIC ? What happens if you turn off usermode gettimeofday() by setting kern.timercounter.fast_gettime to 0 ?
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151021184850.GX2257>