Date: Mon, 12 Dec 2016 23:04:08 +0400 (AMT)
From: Hrant Dadivanyan <hrant@dadivanyan.net>
To: Ian Lepore <ian@freebsd.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: system time instability
Message-ID: <E1cGVtQ-000Acm-7c@pandora.amnic.net>
In-Reply-To: <1481556581.1889.322.camel@freebsd.org>
> On Mon, 2016-12-12 at 17:23 +0400, Hrant Dadivanyan wrote:
> > Hello,
> >
> > After upgrade of stratum 1 ntp server hardware from a Via EPIA Mini-ITX
> > to Supermicro PDSBM-LN2 and OS from 8.4/i386 to 10.3-RELEASE-p12/amd64
> > it starts to work unstable. Most of the time it keeps time pretty well
> > with offset less than 1-2 us, but once a few hours pll frequency jumps,
> > then clock drifts. After passing calibration interval time (256s)
> > frequency returns back to normal, then, after appropriate time, clock
> > stabilizes again. Excerpt from loopstats:
> > 57734 37624.525 -0.000000955 0.120 0.000000588 0.000211 4
> > 57734 37640.526 0.000000319 0.120 0.000000506 0.000198 4
> > 57734 37656.526 -0.000000789 0.120 0.001081214 0.000185 4
> > 57734 37672.526 -0.000398921 100.120 0.000154630 35.355339 4
> > 57734 37688.526 -0.001941140 100.120 0.000188374 33.071891 4
> > 57734 37704.525 -0.003389196 100.120 0.000177488 30.935922 4
> > 57734 37720.525 -0.004745689 100.120 0.000166147 28.937905 4
> > 57734 37736.525 -0.006022007 100.120 0.000156269 27.068931 4
> > 57734 37752.526 -0.007220430 100.120 0.000146663 25.320667 4
> > 57734 37768.526 -0.008343331 100.120 0.000137805 23.685315 4
> > 57734 37784.525 -0.009399651 100.120 0.000129406 22.155583 4
> > 57734 37800.525 -0.010391390 100.120 0.000121937 20.724651 4
> > 57734 37816.526 -0.011320293 100.120 0.000114053 19.386136 4
> > 57734 37832.526 -0.012194902 100.120 0.000107191 18.134069 4
> > 57734 37848.526 -0.013013037 100.120 0.000100035 16.962869 4
> > 57734 37864.526 -0.013783932 100.120 0.000094497 15.867311 4
> > 57734 37880.526 -0.014507271 100.120 0.000088691 14.842510 4
> > 57734 37896.525 -0.015184384 100.120 0.000083266 13.883897 4
> > 57734 37912.526 -0.015822296 100.120 0.000078249 12.987196 4
> > 57734 37928.525 -0.016119704 0.122 0.000103405 37.383615 4
> > 57734 37944.526 -0.015132723 0.122 0.000120509 34.969170 4
> > 57734 37960.526 -0.014207941 0.122 0.000113355 32.710663 4
> > 57734 37976.525 -0.013339661 0.122 0.000107051 30.598023 4
> > [snip]
> > 57734 40296.525 -0.000000337 0.122 0.000001621 0.002136 4
> > 57734 40312.526 -0.000000980 0.122 0.000001635 0.001998 4
> >
> > The change in pll frequency is usually 100ppm, but not always. For
> > today, for example, it's 29ppm once, 69.3ppm once and 100ppm three
> > times.
> >
> > Had tried three available timecounters: TSC-low, ACPI-fast, HPET. Had
> > changed eventtimer from HPET to LAPIC, kern.eventtimer.periodic from
> > 0 to 1.
> > All the changes are followed by service ntpd restart.
> > Also tried to change kern.hz from 1000 to 100.
> > Had even tried 11.0 on other, but the exactly same board. The original
> > board has OCXO instead of quartz, but reconnecting the original quartz
> > doesn't help.
> >
> > Didn't try another hardware and/or OS yet, the server isn't easy
> > reachable, but, in lack of better ideas, will definitely try.
> >
> >
> > Kernel has stripped all unused drivers and options plus PPS_SYNC, then
> > FFCLOCK added.
> > All the additions:
> > options IPSEC
> > options GEOM_ELI
> > options PPS_SYNC
> > options FFCLOCK
> >
> > device crypto
> > device enc
> > device pf
> > device pflog
> > device smbus
> > device ichsmb
> > device smb
> > device coretemp
> > device cpuctl
> > device nvram
> > device smbios
> > device ipmi
> > device aesni
> >
> > The relevant part of ntp.conf:
> > fudge 127.127.20.0 time2 0.6 flag1 1 flag2 0 flag3 1
> > server 127.127.20.0 mode 2 minpoll 4 prefer
> > server <external_server> minpoll 8 iburst
> > restrict default limited kod nomodify notrap nopeer noquery
> >
> > rc.conf:
> > ntpd_program="/usr/local/sbin/ntpd"
> > ntpd_config="/etc/ntpd.conf"
> > ntpd_flags="-N -p /var/run/ntpd.pid -f /var/db/ntpd.drift"
> > ntpd_sync_on_start="YES"
> >
> > sysctl.conf (this change is also seems irrelevant, rebooting without
> > this frequency correction changes nothing in the behaviour):
> > machdep.tsc_freq=2194498500 # pll freq offset change from 21.678 to 0.120ppm
> >
> > $ sysctl kern.hz kern.timecounter kern.eventtimer
> > kern.hz: 1000
> > kern.timecounter.tsc_shift: 1
> > kern.timecounter.smp_tsc_adjust: 0
> > kern.timecounter.smp_tsc: 1
> > kern.timecounter.invariant_tsc: 1
> > kern.timecounter.fast_gettime: 1
> > kern.timecounter.tick: 1
> > kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) HPET(950) dummy(-1000000)
> > kern.timecounter.hardware: TSC-low
> > kern.timecounter.alloweddeviation: 5
> > kern.timecounter.stepwarnings: 0
> > kern.timecounter.tc.TSC-low.quality: 1000
> > kern.timecounter.tc.TSC-low.frequency: 1097249250
> > kern.timecounter.tc.TSC-low.counter: 2359573202
> > kern.timecounter.tc.TSC-low.mask: 4294967295
> > kern.timecounter.tc.ACPI-fast.quality: 900
> > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > kern.timecounter.tc.ACPI-fast.counter: 9238615
> > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > kern.timecounter.tc.i8254.quality: 0
> > kern.timecounter.tc.i8254.frequency: 1193182
> > kern.timecounter.tc.i8254.counter: 9906
> > kern.timecounter.tc.i8254.mask: 65535
> > kern.timecounter.tc.HPET.quality: 950
> > kern.timecounter.tc.HPET.frequency: 14318180
> > kern.timecounter.tc.HPET.counter: 2305610093
> > kern.timecounter.tc.HPET.mask: 4294967295
> > kern.eventtimer.periodic: 0
> > kern.eventtimer.timer: HPET
> > kern.eventtimer.idletick: 0
> > kern.eventtimer.singlemul: 2
> > kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) LAPIC(400) i8254(100) RTC(0)
> > kern.eventtimer.et.i8254.quality: 100
> > kern.eventtimer.et.i8254.frequency: 1193182
> > kern.eventtimer.et.i8254.flags: 1
> > kern.eventtimer.et.RTC.quality: 0
> > kern.eventtimer.et.RTC.frequency: 32768
> > kern.eventtimer.et.RTC.flags: 17
> > kern.eventtimer.et.HPET2.quality: 440
> > kern.eventtimer.et.HPET2.frequency: 14318180
> > kern.eventtimer.et.HPET2.flags: 3
> > kern.eventtimer.et.HPET1.quality: 440
> > kern.eventtimer.et.HPET1.frequency: 14318180
> > kern.eventtimer.et.HPET1.flags: 3
> > kern.eventtimer.et.HPET.quality: 450
> > kern.eventtimer.et.HPET.frequency: 14318180
> > kern.eventtimer.et.HPET.flags: 3
> > kern.eventtimer.et.LAPIC.quality: 400
> > kern.eventtimer.et.LAPIC.frequency: 99749970
> > kern.eventtimer.et.LAPIC.flags: 15
> > $
> >
> > Any hints ?
> >
> > Thank you,
> > Hrant
> >
>
> Very strange, I've never seen behavior like that. You're using ntpd
> from ports, is it the latest version?
>

Yes, it's 4.2.8p9_1 from ports.

> How are you feeding the PPS signal to the system? Do you know how wide
> the PPS pulse is? I'm wondering if the driver is occasionally missing
> an edge of a narrow pulse, although an occasional bad measurement
> should get weeded out by ntpd's refclock median filter. If the pulse
> is wider than a few microseconds the whole theory falls apart anyway.
>

Pulse width is 100 ms, receiver is Garmin GPS 18x LVC. Actually I've
replaced the receiver as well.
The cable is quite long, 12-13 meters, and there were badformat (I guess
CRC) errors when it was set up 4-5 years ago. I used CAT5 cable with the
PPS and Rx wires each twisted with ground, and a 74LS245 bus driver close
to the GPS receiver to boost the signal. It's not a real amplifier, but
it has worked fine there for years and those errors are gone.
There are also a few of these per hour in the logs:
  kernel reports TIME_ERROR: 0x2307: PPS Time Sync wanted but PPS Jitter exceeded
so it looks like the signal is not okay anyway.

> Anyway, I'm a bit focused on the PPS because there were changes to the
> serial (uart) PPS capture between 8.4 and 10.x, and I'm responsible for
> some of them. :)
>

Now that you ask, I'm starting to suspect PPS delivery to the uart again
(cable and amplifier), but I can't understand how the 100ppm error fits
into that.

Thank you,
Hrant

> -- Ian
>

-- 
Hrant Dadivanyan (aka Ran d'Adi)              hrant(at)dadivanyan.net
/* "Feci quod potui, faciant meliora potentes." */    ran(at)psg.com
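
For reference, the status word in the TIME_ERROR log line above (0x2307)
can be split into its individual kernel clock status flags. The sketch
below is not part of the original message. It assumes the STA_* bit values
from FreeBSD's <sys/timex.h>, and the helper name decode_status is
hypothetical, so check the table against the header on the running system
before relying on it.

```python
# Assumed STA_* bit values, as defined in FreeBSD's <sys/timex.h>.
# Verify against the header on the target system.
STA_BITS = {
    0x0001: "STA_PLL",        # PLL updates enabled
    0x0002: "STA_PPSFREQ",    # PPS frequency discipline enabled
    0x0004: "STA_PPSTIME",    # PPS time discipline enabled
    0x0008: "STA_FLL",        # FLL mode selected
    0x0010: "STA_INS",        # insert leap second
    0x0020: "STA_DEL",        # delete leap second
    0x0040: "STA_UNSYNC",     # clock unsynchronized
    0x0080: "STA_FREQHOLD",   # frequency hold
    0x0100: "STA_PPSSIGNAL",  # PPS signal present
    0x0200: "STA_PPSJITTER",  # PPS signal jitter exceeded
    0x0400: "STA_PPSWANDER",  # PPS signal wander exceeded
    0x0800: "STA_PPSERROR",   # PPS signal calibration error
    0x1000: "STA_CLOCKERR",   # clock hardware fault
    0x2000: "STA_NANO",       # nanosecond resolution
    0x4000: "STA_MODE",       # FLL (set) or PLL (clear) mode
    0x8000: "STA_CLK",        # clock source selector
}


def decode_status(status):
    """Return the names of all flags set in a status word, lowest bit first."""
    return [name for bit, name in sorted(STA_BITS.items()) if status & bit]


if __name__ == "__main__":
    for name in decode_status(0x2307):
        print(name)
```

Under these assumed values, 0x2307 has STA_PPSTIME and STA_PPSJITTER set
together, which matches the text of the kernel message: PPS time sync is
requested while the PPS jitter limit is exceeded.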
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E1cGVtQ-000Acm-7c>