Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 20 Oct 2015 14:05:10 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        freebsd-hackers <freebsd-hackers@FreeBSD.org>
Subject:   Re: instability of timekeeping
Message-ID:  <56261FE6.90302@FreeBSD.org>
In-Reply-To: <56261398.60102@FreeBSD.org>
References:  <56261398.60102@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

I performed a small observation.  With ntpd disabled I ran `ntpdate -d` at 10
second intervals in a loop (done via `sleep 10`).  It looks like for about 25
minutes the time offset between a reference server and my machine was quite
stable. But then it sort of jumped about 2.5 seconds between two consecutive
ntpdate invocations.

20 Oct 13:21:02 ntpdate[85157]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03326, dispersion 0.00014
transmitted 4, in filter 4
reference time:    d9d09422.65c26838  Tue, Oct 20 2015 13:21:22.397
originate timestamp: d9d0942f.a03ebee7  Tue, Oct 20 2015 13:21:35.625
transmit timestamp:  d9d09414.e9ed230c  Tue, Oct 20 2015 13:21:08.913
filter delay:  0.03372  0.03491  0.03378  0.03326
         0.00000  0.00000  0.00000  0.00000
filter offset: 26.70845 26.70896 26.70834 26.70832
         0.000000 0.000000 0.000000 0.000000
delay 0.03326, dispersion 0.00014
offset 26.708320

20 Oct 13:21:08 ntpdate[85157]: step time server 62.149.0.30 offset 26.708320 sec

[...]

20 Oct 13:45:20 ntpdate[87088]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03442, dispersion 0.00018
transmitted 4, in filter 4
reference time:    d9d099d3.67703742  Tue, Oct 20 2015 13:45:39.404
originate timestamp: d9d099e1.8a1b6c78  Tue, Oct 20 2015 13:45:53.539
transmit timestamp:  d9d099c6.d34c1e00  Tue, Oct 20 2015 13:45:26.825
filter delay:  0.03481  0.03442  0.03448  0.03458
         0.00000  0.00000  0.00000  0.00000
filter offset: 26.70957 26.70943 26.70913 26.70957
         0.000000 0.000000 0.000000 0.000000
delay 0.03442, dispersion 0.00018
offset 26.709437

20 Oct 13:45:26 ntpdate[87088]: step time server 62.149.0.30 offset 26.709437 sec

20 Oct 13:45:36 ntpdate[87094]: ntpdate 4.2.8p3-a (1)
Looking for host ntp.time.in.ua and service ntp
62.149.0.30 reversed to ntp.time.in.ua
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
transmit(62.149.0.30)
receive(62.149.0.30)
server 62.149.0.30, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.03349, dispersion 0.00012
transmitted 4, in filter 4
reference time:    d9d099e5.63ead89c  Tue, Oct 20 2015 13:45:57.390
originate timestamp: d9d099f4.6364c717  Tue, Oct 20 2015 13:46:12.388
transmit timestamp:  d9d099d7.00939943  Tue, Oct 20 2015 13:45:43.002
filter delay:  0.03349  0.03413  0.03419  0.03455
         0.00000  0.00000  0.00000  0.00000
filter offset: 29.38105 29.38106 29.38134 29.38149
         0.000000 0.000000 0.000000 0.000000
delay 0.03349, dispersion 0.00012
offset 29.381055

20 Oct 13:45:43 ntpdate[87094]: step time server 62.149.0.30 offset 29.381055 sec

On 20/10/2015 13:12, Andriy Gapon wrote:
> 
> I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4
> CPU and after that I started noticing problems with the timekeeping.  It seems
> that from time to time the jitter becomes so high that ntpd goes nuts or stops
> synchronizing or panics.
> 
> Here how the current event timer and time counter configurations look (slightly
> trimmed):
> $ sysctl kern.timecounter
> kern.timecounter.tsc_shift: 1
> kern.timecounter.smp_tsc_adjust: 0
> kern.timecounter.smp_tsc: 1
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0)
> dummy(-1000000)
> kern.timecounter.hardware: TSC-low
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.TSC-low.quality: 800
> kern.timecounter.tc.TSC-low.frequency: 1607357461
> kern.timecounter.tc.TSC-low.counter: 2457319922
> kern.timecounter.tc.TSC-low.mask: 4294967295
> kern.timecounter.tc.ACPI-fast.quality: 900
> kern.timecounter.tc.HPET.quality: 950
> kern.timecounter.tc.i8254.quality: 0
> $ sysctl kern.eventtimer
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: HPET
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0)
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.et.HPET2.quality: 450
> kern.eventtimer.et.HPET1.quality: 450
> kern.eventtimer.et.HPET.quality: 450
> kern.eventtimer.et.HPET.frequency: 14318180
> kern.eventtimer.et.HPET.flags: 3
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.et.LAPIC.quality: 400
> 
> Please note is that TSC-low time counter is chosen administratively whereas the
> event timer configuration is fully automatic.
> The previous configuration was produced in the same fashion.
> One notable difference is that the previous CPU was 2-core and so two HPET
> timers were virtually combined into a single timer with per-CPU capability.  In
> other words, two HPET timers used two drive two cores.
> The newer CPU has four cores, so there are not enough HPET timers to drive each
> core independently and thus there is no virtual bundling.  Thus, one HPET timer
> drives one core and that core forwards the interrupts to other cores via IPIs as
> necessary.
> 
> But I am far from sure that the stated difference is actually the source of the
> instability.  There could be other hardware-related reasons, of course.
> 
> I wonder if there is a good way to analyze / debug this situation to see what
> exactly is wrong.  For now I am thinking about trying different time counter and
> event timer configurations, but I would prefer a more guided "scientific"
> approach over a blind trial and error one.
> 
> I would appreciate any help, suggestions, hints.
> 
> The CPUs:
> CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x100f62  Family=0x10  Model=0x6  Stepping=2
> 
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x802009<SSE3,MON,CX16,POPCNT>
>   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD
> Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
>   SVM: Features=0xf<NP,LbrVirt,SVML,NRIPS>
> Revision=1, ASIDs=64
>   TSC: P-state invariant
> 
> CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x100f43  Family=0x10  Model=0x4  Stepping=3
> 
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x802009<SSE3,MON,CX16,POPCNT>
>   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD
> Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
>   SVM: Features=0xf<NP,LbrVirt,SVML,NRIPS>
> Revision=1, ASIDs=64
>   TSC: P-state invariant
> 


-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56261FE6.90302>