Date:      Mon, 12 Dec 2016 08:29:41 -0700
From:      Ian Lepore <ian@freebsd.org>
To:        Hrant Dadivanyan <hrant@dadivanyan.net>, freebsd-hackers@freebsd.org
Subject:   Re: system time instability
Message-ID:  <1481556581.1889.322.camel@freebsd.org>
In-Reply-To: <E1cGQaC-0005ZI-Mh@pandora.amnic.net>
References:  <E1cGQaC-0005ZI-Mh@pandora.amnic.net>

On Mon, 2016-12-12 at 17:23 +0400, Hrant Dadivanyan wrote:
> Hello,
> 
> After upgrading a stratum 1 ntp server's hardware from a Via EPIA
> Mini-ITX to a Supermicro PDSBM-LN2, and its OS from 8.4/i386 to
> 10.3-RELEASE-p12/amd64, it started to behave unstably. Most of the
> time it keeps time pretty well, with an offset of less than 1-2 us,
> but once every few hours the pll frequency jumps and the clock
> drifts. After the calibration interval (256s) has passed, the
> frequency returns to normal and, after some time, the clock
> stabilizes again. Excerpt from loopstats:
> 57734 37624.525 -0.000000955 0.120 0.000000588 0.000211 4
> 57734 37640.526 0.000000319 0.120 0.000000506 0.000198 4
> 57734 37656.526 -0.000000789 0.120 0.001081214 0.000185 4
> 57734 37672.526 -0.000398921 100.120 0.000154630 35.355339 4
> 57734 37688.526 -0.001941140 100.120 0.000188374 33.071891 4
> 57734 37704.525 -0.003389196 100.120 0.000177488 30.935922 4
> 57734 37720.525 -0.004745689 100.120 0.000166147 28.937905 4
> 57734 37736.525 -0.006022007 100.120 0.000156269 27.068931 4
> 57734 37752.526 -0.007220430 100.120 0.000146663 25.320667 4
> 57734 37768.526 -0.008343331 100.120 0.000137805 23.685315 4
> 57734 37784.525 -0.009399651 100.120 0.000129406 22.155583 4
> 57734 37800.525 -0.010391390 100.120 0.000121937 20.724651 4
> 57734 37816.526 -0.011320293 100.120 0.000114053 19.386136 4
> 57734 37832.526 -0.012194902 100.120 0.000107191 18.134069 4
> 57734 37848.526 -0.013013037 100.120 0.000100035 16.962869 4
> 57734 37864.526 -0.013783932 100.120 0.000094497 15.867311 4
> 57734 37880.526 -0.014507271 100.120 0.000088691 14.842510 4
> 57734 37896.525 -0.015184384 100.120 0.000083266 13.883897 4
> 57734 37912.526 -0.015822296 100.120 0.000078249 12.987196 4
> 57734 37928.525 -0.016119704 0.122 0.000103405 37.383615 4
> 57734 37944.526 -0.015132723 0.122 0.000120509 34.969170 4
> 57734 37960.526 -0.014207941 0.122 0.000113355 32.710663 4
> 57734 37976.525 -0.013339661 0.122 0.000107051 30.598023 4
>  [snip]
> 57734 40296.525 -0.000000337 0.122 0.000001621 0.002136 4
> 57734 40312.526 -0.000000980 0.122 0.000001635 0.001998 4
> 
> The change in pll frequency is usually 100ppm, but not always. Today,
> for example, it was 29ppm once, 69.3ppm once and 100ppm three times.
> 
> I have tried the three available timecounters: TSC-low, ACPI-fast and
> HPET. I have changed the eventtimer from HPET to LAPIC, and
> kern.eventtimer.periodic from 0 to 1. Each change was followed by
> a service ntpd restart. I also tried changing kern.hz from 1000 to
> 100. I even tried 11.0 on another board of exactly the same model.
> The original board has an OCXO instead of the quartz crystal, but
> reconnecting the original quartz doesn't help.
> 
> I haven't tried other hardware and/or another OS yet, as the server
> isn't easily reachable, but for lack of better ideas I definitely
> will.
> 
> 
> The kernel has all unused drivers and options stripped, with PPS_SYNC
> and then FFCLOCK added. All the additions:
> options         IPSEC
> options         GEOM_ELI
> options         PPS_SYNC
> options         FFCLOCK
> 
> device          crypto
> device          enc
> device          pf
> device          pflog
> device          smbus
> device          ichsmb
> device          smb
> device          coretemp
> device          cpuctl
> device          nvram
> device          smbios
> device          ipmi
> device          aesni
> 
> The relevant part of ntp.conf:
> fudge  127.127.20.0 time2 0.6 flag1 1 flag2 0 flag3 1
> server 127.127.20.0 mode 2 minpoll 4 prefer
> server <external_server>   minpoll 8 iburst
> restrict default limited kod nomodify notrap nopeer noquery
> 
> rc.conf:
> ntpd_program="/usr/local/sbin/ntpd"
> ntpd_config="/etc/ntpd.conf"
> ntpd_flags="-N -p /var/run/ntpd.pid -f /var/db/ntpd.drift"
> ntpd_sync_on_start="YES"
> 
> sysctl.conf (this change also seems irrelevant; rebooting without
> this frequency correction changes nothing in the behaviour):
> machdep.tsc_freq=2194498500     # pll freq offset change from 21.678 to 0.120ppm
> 
> $ sysctl kern.hz kern.timecounter kern.eventtimer
> kern.hz: 1000
> kern.timecounter.tsc_shift: 1
> kern.timecounter.smp_tsc_adjust: 0
> kern.timecounter.smp_tsc: 1
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) HPET(950) dummy(-1000000)
> kern.timecounter.hardware: TSC-low
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.TSC-low.quality: 1000
> kern.timecounter.tc.TSC-low.frequency: 1097249250
> kern.timecounter.tc.TSC-low.counter: 2359573202
> kern.timecounter.tc.TSC-low.mask: 4294967295
> kern.timecounter.tc.ACPI-fast.quality: 900
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.ACPI-fast.counter: 9238615
> kern.timecounter.tc.ACPI-fast.mask: 16777215
> kern.timecounter.tc.i8254.quality: 0
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.i8254.counter: 9906
> kern.timecounter.tc.i8254.mask: 65535
> kern.timecounter.tc.HPET.quality: 950
> kern.timecounter.tc.HPET.frequency: 14318180
> kern.timecounter.tc.HPET.counter: 2305610093
> kern.timecounter.tc.HPET.mask: 4294967295
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: HPET
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) LAPIC(400) i8254(100) RTC(0)
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.et.i8254.frequency: 1193182
> kern.eventtimer.et.i8254.flags: 1
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.et.RTC.frequency: 32768
> kern.eventtimer.et.RTC.flags: 17
> kern.eventtimer.et.HPET2.quality: 440
> kern.eventtimer.et.HPET2.frequency: 14318180
> kern.eventtimer.et.HPET2.flags: 3
> kern.eventtimer.et.HPET1.quality: 440
> kern.eventtimer.et.HPET1.frequency: 14318180
> kern.eventtimer.et.HPET1.flags: 3
> kern.eventtimer.et.HPET.quality: 450
> kern.eventtimer.et.HPET.frequency: 14318180
> kern.eventtimer.et.HPET.flags: 3
> kern.eventtimer.et.LAPIC.quality: 400
> kern.eventtimer.et.LAPIC.frequency: 99749970
> kern.eventtimer.et.LAPIC.flags: 15
> $ 
> 
> Any hints?
> 
> Thank you,
> Hrant
> 

Very strange, I've never seen behavior like that.  You're using ntpd
from ports, is it the latest version?
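
To spot these events in a longer loopstats file, something like the
following Python sketch might help. It assumes the usual loopstats
column order (MJD, seconds past midnight, offset in s, frequency in
ppm, jitter in s, wander in ppm, time constant). The names LoopSample,
parse_loopstats and find_freq_steps, and the 1 ppm threshold, are
hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Tuple


class LoopSample(NamedTuple):
    mjd: int            # modified Julian day
    seconds: float      # seconds past midnight (UTC)
    offset_s: float     # clock offset, seconds
    freq_ppm: float     # frequency correction, ppm
    jitter_s: float     # jitter, seconds
    wander_ppm: float   # frequency wander, ppm
    tc: int             # loop time constant (log2 seconds)


def parse_loopstats(text: str) -> List[LoopSample]:
    """Parse loopstats rows, tolerating a leading '> ' mail quote prefix."""
    samples = []
    for raw in text.splitlines():
        fields = raw.lstrip("> ").split()
        if len(fields) != 7:
            continue  # skip blank lines, '[snip]' markers, etc.
        try:
            samples.append(LoopSample(
                int(fields[0]), float(fields[1]), float(fields[2]),
                float(fields[3]), float(fields[4]), float(fields[5]),
                int(fields[6])))
        except ValueError:
            continue  # not a numeric row
    return samples


def find_freq_steps(samples: List[LoopSample],
                    threshold_ppm: float = 1.0) -> List[Tuple[float, float]]:
    """Return (seconds, delta_ppm) wherever frequency moves by >= threshold."""
    steps = []
    for prev, cur in zip(samples, samples[1:]):
        delta = cur.freq_ppm - prev.freq_ppm
        if abs(delta) >= threshold_ppm:
            steps.append((cur.seconds, delta))
    return steps


# Four rows copied from the quoted excerpt (the rows around both steps).
SAMPLE = """\
> 57734 37656.526 -0.000000789 0.120 0.001081214 0.000185 4
> 57734 37672.526 -0.000398921 100.120 0.000154630 35.355339 4
> 57734 37912.526 -0.015822296 100.120 0.000078249 12.987196 4
> 57734 37928.525 -0.016119704 0.122 0.000103405 37.383615 4
"""

if __name__ == "__main__":
    for when, delta in find_freq_steps(parse_loopstats(SAMPLE)):
        print(f"{when:10.3f}s  frequency step {delta:+.3f} ppm")
```

On those four rows the up and down steps are about 256 s apart, which
matches the calibration interval mentioned in the report.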

How are you feeding the PPS signal to the system?  Do you know how wide
the PPS pulse is?  I'm wondering if the driver is occasionally missing
an edge of a narrow pulse, although an occasional bad measurement
should get weeded out by ntpd's refclock median filter.  If the pulse
is wider than a few microseconds the whole theory falls apart anyway.
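
To illustrate why a single bad edge should normally be harmless, here
is a rough Python sketch of a median-style trimming filter. It is a
simplified illustration of the idea, not ntpd's actual code: the
function name trimmed_median_mean, the 0.6 keep fraction and the sample
values are all assumptions made up for the example.

```python
from typing import Sequence


def trimmed_median_mean(samples: Sequence[float],
                        keep_fraction: float = 0.6) -> float:
    """Discard the samples furthest from the median, average the rest.

    Repeatedly drops whichever end of the sorted window lies further
    from the window's median until roughly keep_fraction of the
    samples remain, then returns the mean of the survivors.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    lo, hi = 0, len(ordered)              # window is ordered[lo:hi]
    keep = max(1, int(len(ordered) * keep_fraction))
    while hi - lo > keep:
        median = ordered[(lo + hi) // 2]
        if median - ordered[lo] > ordered[hi - 1] - median:
            lo += 1                       # low end is the outlier
        else:
            hi -= 1                       # high end is the outlier
    window = ordered[lo:hi]
    return sum(window) / len(window)


# Hypothetical offsets in seconds: nine sub-microsecond readings and
# one 1 ms outlier standing in for a mis-captured PPS edge.
offsets = [1e-6, -1e-6, 0.5e-6, -0.5e-6, 0.0, 0.0,
           0.2e-6, -0.2e-6, 0.1e-6, 1e-3]

if __name__ == "__main__":
    plain = sum(offsets) / len(offsets)
    print(f"plain mean:    {plain:.9f} s")
    print(f"filtered mean: {trimmed_median_mean(offsets):.9f} s")
```

With one outlier in ten samples the plain mean is pulled to roughly
100 us, while the filtered value stays well under a microsecond.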

I'm a bit focused on the PPS because there were changes to the
serial (uart) PPS capture between 8.4 and 10.x, and I'm responsible for
some of them. :)

-- Ian



