From owner-freebsd-hackers@freebsd.org Tue Oct 20 11:06:57 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2A117A19685 for ; Tue, 20 Oct 2015 11:06:57 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 4A1F314AA; Tue, 20 Oct 2015 11:06:55 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA03152; Tue, 20 Oct 2015 14:06:54 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZoUko-0004zJ-5g; Tue, 20 Oct 2015 14:06:54 +0300 Subject: Re: instability of timekeeping To: freebsd-hackers References: <56261398.60102@FreeBSD.org> From: Andriy Gapon X-Enigmail-Draft-Status: N1110 Message-ID: <56261FE6.90302@FreeBSD.org> Date: Tue, 20 Oct 2015 14:05:10 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <56261398.60102@FreeBSD.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Oct 2015 11:06:57 -0000 I performed a small observation. With ntpd disabled I ran `ntpdate -d` at 10 second intervals in a loop (done via `sleep 10`). It looks like for about 25 minutes the time offset between a reference server and my machine was quite stable. But then it sort of jumped about 2.5 seconds between two consecutive ntpdate invocations. 20 Oct 13:21:02 ntpdate[85157]: ntpdate 4.2.8p3-a (1) Looking for host ntp.time.in.ua and service ntp 62.149.0.30 reversed to ntp.time.in.ua transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) server 62.149.0.30, port 123 stratum 1, precision -20, leap 00, trust 000 refid [GPS], delay 0.03326, dispersion 0.00014 transmitted 4, in filter 4 reference time: d9d09422.65c26838 Tue, Oct 20 2015 13:21:22.397 originate timestamp: d9d0942f.a03ebee7 Tue, Oct 20 2015 13:21:35.625 transmit timestamp: d9d09414.e9ed230c Tue, Oct 20 2015 13:21:08.913 filter delay: 0.03372 0.03491 0.03378 0.03326 0.00000 0.00000 0.00000 0.00000 filter offset: 26.70845 26.70896 26.70834 26.70832 0.000000 0.000000 0.000000 0.000000 delay 0.03326, dispersion 0.00014 offset 26.708320 20 Oct 13:21:08 ntpdate[85157]: step time server 62.149.0.30 offset 26.708320 sec [...] 20 Oct 13:45:20 ntpdate[87088]: ntpdate 4.2.8p3-a (1) Looking for host ntp.time.in.ua and service ntp 62.149.0.30 reversed to ntp.time.in.ua transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) server 62.149.0.30, port 123 stratum 1, precision -20, leap 00, trust 000 refid [GPS], delay 0.03442, dispersion 0.00018 transmitted 4, in filter 4 reference time: d9d099d3.67703742 Tue, Oct 20 2015 13:45:39.404 originate timestamp: d9d099e1.8a1b6c78 Tue, Oct 20 2015 13:45:53.539 transmit timestamp: d9d099c6.d34c1e00 Tue, Oct 20 2015 13:45:26.825 filter delay: 0.03481 0.03442 0.03448 0.03458 0.00000 0.00000 0.00000 0.00000 filter offset: 26.70957 26.70943 26.70913 26.70957 0.000000 0.000000 0.000000 0.000000 delay 0.03442, dispersion 0.00018 offset 26.709437 20 Oct 13:45:26 ntpdate[87088]: step time server 62.149.0.30 offset 26.709437 sec 20 Oct 13:45:36 ntpdate[87094]: ntpdate 4.2.8p3-a (1) Looking for host ntp.time.in.ua and service ntp 62.149.0.30 reversed to ntp.time.in.ua transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) server 62.149.0.30, port 123 stratum 1, precision -20, leap 00, trust 000 refid [GPS], delay 0.03349, dispersion 0.00012 transmitted 4, in filter 4 reference time: d9d099e5.63ead89c Tue, Oct 20 2015 13:45:57.390 originate timestamp: d9d099f4.6364c717 Tue, Oct 20 2015 13:46:12.388 transmit timestamp: d9d099d7.00939943 Tue, Oct 20 2015 13:45:43.002 filter delay: 0.03349 0.03413 0.03419 0.03455 0.00000 0.00000 0.00000 0.00000 filter offset: 29.38105 29.38106 29.38134 29.38149 0.000000 0.000000 0.000000 0.000000 delay 0.03349, dispersion 0.00012 offset 29.381055 20 Oct 13:45:43 ntpdate[87094]: step time server 62.149.0.30 offset 29.381055 sec On 20/10/2015 13:12, Andriy Gapon wrote: > > I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4 > CPU and after that I started noticing problems with the timekeeping. It seems > that from time to time the jitter becomes so high that ntpd goes nuts or stops > synchronizing or panics. > > Here how the current event timer and time counter configurations look (slightly > trimmed): > $ sysctl kern.timecounter > kern.timecounter.tsc_shift: 1 > kern.timecounter.smp_tsc_adjust: 0 > kern.timecounter.smp_tsc: 1 > kern.timecounter.invariant_tsc: 1 > kern.timecounter.fast_gettime: 1 > kern.timecounter.tick: 1 > kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) > dummy(-1000000) > kern.timecounter.hardware: TSC-low > kern.timecounter.alloweddeviation: 5 > kern.timecounter.stepwarnings: 0 > kern.timecounter.tc.TSC-low.quality: 800 > kern.timecounter.tc.TSC-low.frequency: 1607357461 > kern.timecounter.tc.TSC-low.counter: 2457319922 > kern.timecounter.tc.TSC-low.mask: 4294967295 > kern.timecounter.tc.ACPI-fast.quality: 900 > kern.timecounter.tc.HPET.quality: 950 > kern.timecounter.tc.i8254.quality: 0 > $ sysctl kern.eventtimer > kern.eventtimer.periodic: 0 > kern.eventtimer.timer: HPET > kern.eventtimer.idletick: 0 > kern.eventtimer.singlemul: 2 > kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0) > kern.eventtimer.et.RTC.quality: 0 > kern.eventtimer.et.HPET2.quality: 450 > kern.eventtimer.et.HPET1.quality: 450 > kern.eventtimer.et.HPET.quality: 450 > kern.eventtimer.et.HPET.frequency: 14318180 > kern.eventtimer.et.HPET.flags: 3 > kern.eventtimer.et.i8254.quality: 100 > kern.eventtimer.et.LAPIC.quality: 400 > > Please note is that TSC-low time counter is chosen administratively whereas the > event timer configuration is fully automatic. > The previous configuration was produced in the same fashion. > One notable difference is that the previous CPU was 2-core and so two HPET > timers were virtually combined into a single timer with per-CPU capability. In > other words, two HPET timers used two drive two cores. > The newer CPU has four cores, so there are not enough HPET timers to drive each > core independently and thus there is no virtual bundling. Thus, one HPET timer > drives one core and that core forwards the interrupts to other cores via IPIs as > necessary. > > But I am far from sure that the stated difference is actually the source of the > instability. There could be other hardware-related reasons, of course. > > I wonder if there is a good way to analyze / debug this situation to see what > exactly is wrong. For now I am thinking about trying different time counter and > event timer configurations, but I would prefer a more guided "scientific" > approach over a blind trial and error one. > > I would appreciate any help, suggestions, hints. > > The CPUs: > CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU) > Origin="AuthenticAMD" Id=0x100f62 Family=0x10 Model=0x6 Stepping=2 > > Features=0x178bfbff > Features2=0x802009 > AMD Features=0xee500800 > AMD > Features2=0x37ff > SVM: Features=0xf > Revision=1, ASIDs=64 > TSC: P-state invariant > > CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU) > Origin="AuthenticAMD" Id=0x100f43 Family=0x10 Model=0x4 Stepping=3 > > Features=0x178bfbff > Features2=0x802009 > AMD Features=0xee500800 > AMD > Features2=0x37ff > SVM: Features=0xf > Revision=1, ASIDs=64 > TSC: P-state invariant > -- Andriy Gapon