From owner-freebsd-hackers@freebsd.org Wed Oct 28 16:50:13 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A38D1A1FFE0 for ; Wed, 28 Oct 2015 16:50:13 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3E8A11B5E; Wed, 28 Oct 2015 16:50:12 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA10967; Wed, 28 Oct 2015 18:50:04 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZrTvI-000IvY-18; Wed, 28 Oct 2015 18:50:04 +0200 Subject: Re: instability of timekeeping To: Konstantin Belousov , Alexander Motin References: <56261398.60102@FreeBSD.org> <56261FE6.90302@FreeBSD.org> <56274FFC.2000608@FreeBSD.org> <20151021184850.GX2257@kib.kiev.ua> <562F3E2F.2010100@FreeBSD.org> <20151027115810.GU2257@kib.kiev.ua> <562F8109.4050203@FreeBSD.org> <20151027140403.GB2257@kib.kiev.ua> Cc: freebsd-hackers , Poul-Henning Kamp , Jung-uk Kim From: Andriy Gapon X-Enigmail-Draft-Status: N1110 Message-ID: <5630FC3B.2070908@FreeBSD.org> Date: Wed, 28 Oct 2015 18:47:55 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <20151027140403.GB2257@kib.kiev.ua> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Oct 2015 16:50:13 -0000 On 27/10/2015 16:04, Konstantin Belousov wrote: > On Tue, Oct 27, 2015 at 03:50:01PM +0200, Andriy Gapon wrote: >> On 27/10/2015 13:58, Konstantin Belousov wrote: >>> On Tue, Oct 27, 2015 at 11:04:47AM +0200, Andriy Gapon wrote: >>>> And now another observation. I have C1E option enabled in BIOS. It >>>> means that if all cores enter C1 state, then the whole processor >>>> is magically placed into a deep C-state (C3, I think). LAPIC timer >>>> on this CPU model does not run in the deep C-state. So, I had to >>>> disable C1E option to test the LAPIC timer in a useful way. But >>>> before actually testing it I first tried to reproduce the problem. As >>>> you might have already guessed the problem is gone with that option >>>> disabled. Scratching my head to understand the implications of this >>>> observation. >>> >>> Most obvious explanation would be that the latency of wakeup is very large. >>> What is the HPET frequency when the jitter occur ? >>> >> >> kern.timecounter.tc.TSC-low.frequency: 1607351869 >> kern.eventtimer.et.HPET.frequency: 14318180 >> >> Or did you mean the actual rate of timer interrupts? > > Actual rate, and I used the wrong word. Since most likely your eventtimer > is not periodic, I mean the next timer interrupt deadline. > I wasn't sure how to answer your question, so I went ahead and obtained a little bit of debugging information using KTR. For a start, here is a patch that adds couple more trace points: https://people.freebsd.org/~avg/timekeeping-ktr.patch Please note that the patch has a small bug in hardclock_cnt(): + CTR1(KTR_SPARE2, "hardclock_cnt at %d: newticks=%d", + newticks); Two format specifiers but only one argument. I mention this, so that the trace is not misinterpreted. Then, here is a snippet of the trace that seems to be of interest: https://people.freebsd.org/~avg/timekeeping.ktrdump.txt I must admit that I have a hard time interpreting what that trace says. As I understand the KTR timestamps are raw TSC values. So, there is a huge gap between entries #999543 and #999544. It's larger than 2 * 2^32, so even after applying tsc_shift it's still larger that 2^32. I am not sure how the HPET timer could get programmed for such a long delay. Or maybe there is some lower level problem that caused the interrupt to be delayed by (16450821032024 - 16442184278422) / 2 / 1607351869 ≈ 2.687 seconds. That's quite a long time. One thing that I can tell is that my earlier hypothesis seems to be wrong as all raw TSC values are increasing. Grepping for lines where the timer gets reprogrammed I was surprised to see the following: 999543 1 16442184278422 load at 1: next 5042.d25692b3 eq 0 999507 3 16442183843960 load at 3: next 5042.d23e2146 eq 0 999503 2 16442183825488 load at 2: next 5042.d25692b3 eq 0 999472 0 16442183792870 load at 0: next 5042.d23e2146 eq 0 This looks like the next event time has flipped between those two values somehow. Also, the following seemed a little bit strange: 999508 3 16442183860624 ipi at 3: now 5042.d2301033 999467 3 16442183787851 active at 3: now 5042.d2306108 The later even reports a value of 'now' that is smaller. I think that that can be explained by the fact that in one case the value is obtained via sbinuptime() while in the other case it's taken from state->now, which is set when an actual timer interrupt is received (before sending a hardclock IPI). In either case I am going to add a few more trace points in et_start and the HPET timer code and see if I can catch anything interesting there. -- Andriy Gapon