From owner-freebsd-hackers@freebsd.org Tue Dec 13 18:37:00 2016
Message-ID: <1481654211.1889.346.camel@freebsd.org>
Subject: Re: system time instability
From: Ian Lepore
To: Hrant Dadivanyan, Konstantin Belousov
Cc: freebsd-hackers@freebsd.org
Date: Tue, 13 Dec 2016 11:36:51 -0700
Content-Type: text/plain; charset="ISO-8859-1"
List-Id: Technical Discussions relating to FreeBSD

On Tue, 2016-12-13 at 22:17 +0400, Hrant Dadivanyan wrote:
> > On Mon, Dec 12, 2016 at 11:33:59PM +0400, Hrant Dadivanyan wrote:
> > > >
> > > > On Mon, Dec 12, 2016 at 11:04:08PM +0400, Hrant Dadivanyan wrote:
> > > > >
> > > > > Now, when you ask, I start to suspect PPS delivery to uart
> > > > > again - cable and amplifier, but can't understand how the
> > > > > 100ppm error fits into that.
> > > > If you disable PPS sync in ntp config, does the machine keep
> > > > time adequately ?
> > > >
> > > Thanks for reminding - yes, I've tried this as well, the issue
> > > persists.
> > > So uart shouldn't be in charge.
> > >
> This statement seems to be wrong, look below.
>
> > > >
> > > > There might be relatively long pauses when system management
> > > > mode handlers do something in response to hw events.  E.g. if
> > > > you have USB emulation of AT keyboard enabled in BIOS, try to
> > > > disable that.  And update the BIOS.
> > > The USB is switched off in the BIOS. I've removed all changes in
> > > sysctl.conf and nice flag from ntpd, recompiled kernel as following:
> > > include         GENERIC
> > > options         PPS_SYNC
> > > device          pf
> > > device          pflog
> > > and started over. Dmesg is attached.
> >
> > Please show verbose dmesg.
> >
> I've updated BIOS to the latest one. Verbose dmesg is attached.
>
> > >
> > > CPU: Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (2194.55-MHz K8-class CPU)
> > This is relatively old CPU which is known to have some (minor)
> > issues with interaction between power saving and cores.  Try the
> > following OS config:
> > disable deep C states, allow only C1 (there might be some tweaks in
> > BIOS, if possible, disable the Cn, n > 1, there too);
> Have never touched Cx states on servers, it was disabled in BIOS and
> sysctl shows C1 as lowest.
> Now I've enabled it in BIOS, but didn't touch in OS:
> dev.cpu.1.cx_usage: 100.00% last 31000us
> dev.cpu.1.cx_lowest: C1
> dev.cpu.1.cx_supported: C1/1/0
> dev.cpu.0.cx_usage: 100.00% last 5569us
> dev.cpu.0.cx_lowest: C1
> dev.cpu.0.cx_supported: C1/1/0
> dev.cpu.0.freq_levels: 2200/35000 2000/31000 1800/27000 1600/23000 1400/19000 1200/16000
> dev.cpu.0.freq: 2200
> Is this correct ?
> >
> > use LAPIC for event timer (not HPET);
> Have disabled HPET in BIOS:
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: LAPIC
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.et.i8254.frequency: 1193182
> kern.eventtimer.et.i8254.flags: 1
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.et.RTC.frequency: 32768
> kern.eventtimer.et.RTC.flags: 17
> kern.eventtimer.et.LAPIC.quality: 400
> kern.eventtimer.et.LAPIC.frequency: 99751860
> kern.eventtimer.et.LAPIC.flags: 15
> >
> > re-check that you use RDTSC for the timecounter;
> kern.timecounter.tsc_shift: 1
> kern.timecounter.smp_tsc_adjust: 0
> kern.timecounter.smp_tsc: 1
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) dummy(-1000000)
> kern.timecounter.hardware: TSC-low
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.TSC-low.quality: 1000
> kern.timecounter.tc.TSC-low.frequency: 1097249250
> kern.timecounter.tc.TSC-low.counter: 1335765171
> kern.timecounter.tc.TSC-low.mask: 4294967295
> kern.timecounter.tc.ACPI-fast.quality: 900
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.ACPI-fast.counter: 6343821
> kern.timecounter.tc.ACPI-fast.mask: 16777215
> kern.timecounter.tc.i8254.quality: 0
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.i8254.counter: 2701
> kern.timecounter.tc.i8254.mask: 65535
> >
> > do not enable powerd.
> >
> Never did on servers.
>
> > You might also try the stable/11 kernel, which has more changes WRT
> > C-states handling and PPS/ntp locking.
> >
> The server did run for almost a day without PPS and looks stable. I
> start to believe, to my shame, that I did a mistake when testing this
> previously.
> Then the whole post is wrong and cable seems to be most suspected
> part again.
> Even now it's hard to understand this wrong behaviour, but anyway ...
>
> Just replaced the cable with shielded one where each pair has
> separate shield, used dedicated pair for PPS and ground; grounded the
> shields.
>
> Thank you Konstantin, thank you Ian !
> Hrant
>

A bad PPS signal could definitely lead to frequency trouble, if the way
the signal is bad involves ringing, or the electrical level floating
around the cutoff points for detecting low vs. high level -- you'd get
false pulses, and some of them would be close enough to the time of the
real pulse that they would make it through the spike/median filters in
ntpd.  An early or late pulse looks like a phase step, and several
consistent-enough phase steps in the same polling period look like a
frequency step.

You mentioned using a 74LS245 bus driver... that can lead to ringing if
the load is light; maybe the RS-232 port on this new hardware has a much
higher input impedance than your old system.  It might be worth adding a
series resistor at the computer end to soak up reflections; something in
the 30-100 ohm range should work.

-- Ian
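
To put rough numbers on the phase-step vs. frequency-step point above,
here is a toy sketch in Python.  It is not ntpd's actual filter code: the
plain median, the 16-second polling period, the 1.6 ms pulse error, and
the names filtered_offset and apparent_freq_ppm are all assumptions
chosen for illustration.  It only shows that a few mistimed pulses are
rejected by a median, while a consistent majority shifts the filtered
offset, and that a shift of that size over one polling period is on the
order of 100 ppm.

```python
from statistics import median

# Toy model only: NOT ntpd's real filter. All values below are assumed.
POLL_S = 16        # assumed polling period in seconds (one pulse per second)
SHIFT_S = 1.6e-3   # assumed timing error of a false (early/late) pulse


def filtered_offset(offsets_s):
    """Return the median of the per-pulse offsets (seconds) in one period."""
    return median(offsets_s)


def apparent_freq_ppm(prev_offset_s, curr_offset_s, poll_s):
    """Express the offset change between two periods as a rate in ppm."""
    return (curr_offset_s - prev_offset_s) / poll_s * 1e6


clean = [0.0] * POLL_S                   # every pulse on time
few_bad = [0.0] * 13 + [SHIFT_S] * 3     # a few false pulses
many_bad = [0.0] * 6 + [SHIFT_S] * 10    # false pulses are the majority

for name, samples in (("few bad", few_bad), ("many bad", many_bad)):
    ppm = apparent_freq_ppm(filtered_offset(clean),
                            filtered_offset(samples), POLL_S)
    print(f"{name}: filtered offset {filtered_offset(samples) * 1e3:.2f} ms, "
          f"apparent rate {ppm:.1f} ppm")
```

With only a few bad pulses the median stays at zero and no rate error
shows up.  Once the mistimed pulses are consistent enough to dominate
the period, the whole offset moves, and spread over the assumed 16
seconds that comes to about 100 ppm.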