Date: Tue, 20 Oct 2015 14:05:10 +0300 From: Andriy Gapon <avg@FreeBSD.org> To: freebsd-hackers <freebsd-hackers@FreeBSD.org> Subject: Re: instability of timekeeping Message-ID: <56261FE6.90302@FreeBSD.org> In-Reply-To: <56261398.60102@FreeBSD.org> References: <56261398.60102@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
I performed a small observation. With ntpd disabled I ran `ntpdate -d` at 10 second intervals in a loop (done via `sleep 10`). It looks like for about 25 minutes the time offset between a reference server and my machine was quite stable. But then it sort of jumped about 2.5 seconds between two consecutive ntpdate invocations. 20 Oct 13:21:02 ntpdate[85157]: ntpdate 4.2.8p3-a (1) Looking for host ntp.time.in.ua and service ntp 62.149.0.30 reversed to ntp.time.in.ua transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) server 62.149.0.30, port 123 stratum 1, precision -20, leap 00, trust 000 refid [GPS], delay 0.03326, dispersion 0.00014 transmitted 4, in filter 4 reference time: d9d09422.65c26838 Tue, Oct 20 2015 13:21:22.397 originate timestamp: d9d0942f.a03ebee7 Tue, Oct 20 2015 13:21:35.625 transmit timestamp: d9d09414.e9ed230c Tue, Oct 20 2015 13:21:08.913 filter delay: 0.03372 0.03491 0.03378 0.03326 0.00000 0.00000 0.00000 0.00000 filter offset: 26.70845 26.70896 26.70834 26.70832 0.000000 0.000000 0.000000 0.000000 delay 0.03326, dispersion 0.00014 offset 26.708320 20 Oct 13:21:08 ntpdate[85157]: step time server 62.149.0.30 offset 26.708320 sec [...] 20 Oct 13:45:20 ntpdate[87088]: ntpdate 4.2.8p3-a (1) Looking for host ntp.time.in.ua and service ntp 62.149.0.30 reversed to ntp.time.in.ua transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) server 62.149.0.30, port 123 stratum 1, precision -20, leap 00, trust 000 refid [GPS], delay 0.03442, dispersion 0.00018 transmitted 4, in filter 4 reference time: d9d099d3.67703742 Tue, Oct 20 2015 13:45:39.404 originate timestamp: d9d099e1.8a1b6c78 Tue, Oct 20 2015 13:45:53.539 transmit timestamp: d9d099c6.d34c1e00 Tue, Oct 20 2015 13:45:26.825 filter delay: 0.03481 0.03442 0.03448 0.03458 0.00000 0.00000 0.00000 0.00000 filter offset: 26.70957 26.70943 26.70913 26.70957 0.000000 0.000000 0.000000 0.000000 delay 0.03442, dispersion 0.00018 offset 26.709437 20 Oct 13:45:26 ntpdate[87088]: step time server 62.149.0.30 offset 26.709437 sec 20 Oct 13:45:36 ntpdate[87094]: ntpdate 4.2.8p3-a (1) Looking for host ntp.time.in.ua and service ntp 62.149.0.30 reversed to ntp.time.in.ua transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) transmit(62.149.0.30) receive(62.149.0.30) server 62.149.0.30, port 123 stratum 1, precision -20, leap 00, trust 000 refid [GPS], delay 0.03349, dispersion 0.00012 transmitted 4, in filter 4 reference time: d9d099e5.63ead89c Tue, Oct 20 2015 13:45:57.390 originate timestamp: d9d099f4.6364c717 Tue, Oct 20 2015 13:46:12.388 transmit timestamp: d9d099d7.00939943 Tue, Oct 20 2015 13:45:43.002 filter delay: 0.03349 0.03413 0.03419 0.03455 0.00000 0.00000 0.00000 0.00000 filter offset: 29.38105 29.38106 29.38134 29.38149 0.000000 0.000000 0.000000 0.000000 delay 0.03349, dispersion 0.00012 offset 29.381055 20 Oct 13:45:43 ntpdate[87094]: step time server 62.149.0.30 offset 29.381055 sec On 20/10/2015 13:12, Andriy Gapon wrote: > > I recently replaced a 2-core Athlon II X2 CPU with a same-family Phenom II X4 > CPU and after that I started noticing problems with the timekeeping. It seems > that from time to time the jitter becomes so high that ntpd goes nuts or stops > synchronizing or panics. > > Here how the current event timer and time counter configurations look (slightly > trimmed): > $ sysctl kern.timecounter > kern.timecounter.tsc_shift: 1 > kern.timecounter.smp_tsc_adjust: 0 > kern.timecounter.smp_tsc: 1 > kern.timecounter.invariant_tsc: 1 > kern.timecounter.fast_gettime: 1 > kern.timecounter.tick: 1 > kern.timecounter.choice: TSC-low(800) ACPI-fast(900) HPET(950) i8254(0) > dummy(-1000000) > kern.timecounter.hardware: TSC-low > kern.timecounter.alloweddeviation: 5 > kern.timecounter.stepwarnings: 0 > kern.timecounter.tc.TSC-low.quality: 800 > kern.timecounter.tc.TSC-low.frequency: 1607357461 > kern.timecounter.tc.TSC-low.counter: 2457319922 > kern.timecounter.tc.TSC-low.mask: 4294967295 > kern.timecounter.tc.ACPI-fast.quality: 900 > kern.timecounter.tc.HPET.quality: 950 > kern.timecounter.tc.i8254.quality: 0 > $ sysctl kern.eventtimer > kern.eventtimer.periodic: 0 > kern.eventtimer.timer: HPET > kern.eventtimer.idletick: 0 > kern.eventtimer.singlemul: 2 > kern.eventtimer.choice: HPET(450) HPET1(450) HPET2(450) LAPIC(400) i8254(100) RTC(0) > kern.eventtimer.et.RTC.quality: 0 > kern.eventtimer.et.HPET2.quality: 450 > kern.eventtimer.et.HPET1.quality: 450 > kern.eventtimer.et.HPET.quality: 450 > kern.eventtimer.et.HPET.frequency: 14318180 > kern.eventtimer.et.HPET.flags: 3 > kern.eventtimer.et.i8254.quality: 100 > kern.eventtimer.et.LAPIC.quality: 400 > > Please note is that TSC-low time counter is chosen administratively whereas the > event timer configuration is fully automatic. > The previous configuration was produced in the same fashion. > One notable difference is that the previous CPU was 2-core and so two HPET > timers were virtually combined into a single timer with per-CPU capability. In > other words, two HPET timers used two drive two cores. > The newer CPU has four cores, so there are not enough HPET timers to drive each > core independently and thus there is no virtual bundling. Thus, one HPET timer > drives one core and that core forwards the interrupts to other cores via IPIs as > necessary. > > But I am far from sure that the stated difference is actually the source of the > instability. There could be other hardware-related reasons, of course. > > I wonder if there is a good way to analyze / debug this situation to see what > exactly is wrong. For now I am thinking about trying different time counter and > event timer configurations, but I would prefer a more guided "scientific" > approach over a blind trial and error one. > > I would appreciate any help, suggestions, hints. > > The CPUs: > CPU: AMD Athlon(tm) II X2 250 Processor (3013.79-MHz K8-class CPU) > Origin="AuthenticAMD" Id=0x100f62 Family=0x10 Model=0x6 Stepping=2 > > Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> > Features2=0x802009<SSE3,MON,CX16,POPCNT> > AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!> > AMD > Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT> > SVM: Features=0xf<NP,LbrVirt,SVML,NRIPS> > Revision=1, ASIDs=64 > TSC: P-state invariant > > CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU) > Origin="AuthenticAMD" Id=0x100f43 Family=0x10 Model=0x4 Stepping=3 > > Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> > Features2=0x802009<SSE3,MON,CX16,POPCNT> > AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!> > AMD > Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT> > SVM: Features=0xf<NP,LbrVirt,SVML,NRIPS> > Revision=1, ASIDs=64 > TSC: P-state invariant > -- Andriy Gapon
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56261FE6.90302>