From: Andriy Gapon
To: freebsd-hackers
Subject: instability of timekeeping
Date: Tue, 20 Oct 2015 13:12:40 +0300
Message-ID: <56261398.60102@FreeBSD.org>

I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4
CPU and after that I started noticing problems with the timekeeping.
It seems that from time to time the jitter becomes so high that ntpd goes nuts
or stops synchronizing or panics.

Here is how the current event timer and time counter configurations look
(slightly trimmed):

$ sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: 800
kern.timecounter.tc.TSC-low.frequency: 1607357461
kern.timecounter.tc.TSC-low.counter: 2457319922
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.i8254.quality: 0

$ sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: HPET
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.HPET2.quality: 450
kern.eventtimer.et.HPET1.quality: 450
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.LAPIC.quality: 400

Please note that the TSC-low time counter is chosen administratively, whereas
the event timer configuration is fully automatic.
The previous configuration was produced in the same fashion.

One notable difference is that the previous CPU was 2-core, so two HPET timers
were virtually combined into a single timer with per-CPU capability. In other
words, two HPET timers were used to drive two cores.
The newer CPU has four cores, so there are not enough HPET timers to drive each
core independently, and thus there is no virtual bundling. Instead, one HPET
timer drives one core, and that core forwards the interrupts to the other cores
via IPIs as necessary.
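
To make the "administrative vs. automatic" distinction concrete, here is a minimal Python sketch that ranks the entries of the two "choice" lines above by quality and compares the top entry with what is actually selected. It assumes that automatic selection prefers the highest quality value and that ties go to the first entry listed; the helper names parse_choice() and best() are hypothetical and exist only for this illustration.

```python
import re

def parse_choice(choice):
    """Split a 'NAME(quality) NAME(quality) ...' string into (name, quality) pairs."""
    return [(name, int(q)) for name, q in re.findall(r"([^\s()]+)\((-?\d+)\)", choice)]

def best(choice):
    """Return the name with the highest quality (assumed automatic pick).

    max() keeps the first maximal item, so ties go to the first entry listed.
    """
    return max(parse_choice(choice), key=lambda pair: pair[1])[0]

# Values copied from the sysctl output above.
TC_CHOICE = "TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)"
TC_SELECTED = "TSC-low"
ET_CHOICE = "HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)"
ET_SELECTED = "HPET"

for label, choice, selected in (
    ("timecounter", TC_CHOICE, TC_SELECTED),
    ("eventtimer", ET_CHOICE, ET_SELECTED),
):
    top = best(choice)
    if top == selected:
        note = "matches the highest-quality entry"
    else:
        note = "overridden; highest-quality entry is " + top
    print(label + ": selected " + selected + " (" + note + ")")
```

Under those assumptions the time counter shows up as overridden (HPET has the highest quality, TSC-low is in use), while the event timer matches the highest-quality entry.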

But I am far from sure that the stated difference is actually the source of the
instability. There could be other hardware-related reasons, of course.

I wonder if there is a good way to analyze / debug this situation to see what
exactly is wrong. For now I am thinking about trying different time counter and
event timer configurations, but I would prefer a more guided, "scientific"
approach over blind trial and error.

I would appreciate any help, suggestions, or hints.

The CPUs:

CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f62  Family=0x10  Model=0x6  Stepping=2
  Features=0x178bfbff
  Features2=0x802009
  AMD Features=0xee500800
  AMD Features2=0x37ff
  SVM: Features=0xf Revision=1, ASIDs=64
  TSC: P-state invariant

CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f43  Family=0x10  Model=0x4  Stepping=3
  Features=0x178bfbff
  Features2=0x802009
  AMD Features=0xee500800
  AMD Features2=0x37ff
  SVM: Features=0xf Revision=1, ASIDs=64
  TSC: P-state invariant

-- 
Andriy Gapon