Date: Thu, 11 Oct 2012 15:54:43 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: freebsd-stable@FreeBSD.org
Subject: Re: time keeps on slipping... slipping...
Message-ID: <5076C193.70405@FreeBSD.org>
In-Reply-To: <20121011063030.GK1967@funkthat.com>
References: <20121008040239.GE1967@funkthat.com> <5075F9F7.1040007@FreeBSD.org> <20121011063030.GK1967@funkthat.com>

On 11.10.2012 09:30, John-Mark Gurney wrote:
> Alexander Motin wrote this message on Thu, Oct 11, 2012 at 01:43 +0300:
>> On 08.10.2012 07:02, John-Mark Gurney wrote:
>>> I recently put together a new machine w/ a SuperMicro H8SCM and an
>>> AMD Opteron 4228 HE... I'm having an issue where the clock on the
>>> machine skips around... The weird part is that it's very sudden when
>>> it happens... ntp sometimes brings it back, but it can't when the clock
>>> gets too far ahead (1000 seconds), ntp dies...
>>>
>>> In order to catch it happening, I ran a sleep 60 loop fetching time
>>> from another server that keeps time correctly via:
>>> while sleep 60; do echo -n h2:; nc h2 13; date; ntpdate h2.funkthat.com;
>>> done
>>>
>>> here are some snippets:
>>> h2:Sun Oct 7 17:12:54 2012^M
>>> Sun Oct 7 17:12:54 PDT 2012
>>> 7 Oct 17:12:54 ntpdate[31036]: the NTP socket is in use, exiting
>>> h2:Sun Oct 7 17:13:48 2012^M
>>> Sun Oct 7 17:20:21 PDT 2012
>>> 7 Oct 17:20:21 ntpdate[31045]: the NTP socket is in use, exiting
>>>
>>> but then ntp brings it back in sync:
>>> h2:Sun Oct 7 17:28:49 2012^M
>>> Sun Oct 7 17:35:21 PDT 2012
>>> 7 Oct 17:35:21 ntpdate[31164]: the NTP socket is in use, exiting
>>> h2:Sun Oct 7 17:29:49 2012^M
>>> Sun Oct 7 17:29:49 PDT 2012
>>> 7 Oct 17:29:49 ntpdate[31170]: the NTP socket is in use, exiting
>>>
>>> It happens pretty often:
>>> Oct 7 00:19:13 gold ntpd[3721]: time reset -785.347912 s
>>> Oct 7 00:46:37 gold ntpd[3721]: time reset -392.673256 s
>>> Oct 7 01:04:24 gold ntpd[3721]: time reset -785.346533 s
>>> Oct 7 15:00:59 gold ntpd[3721]: time reset -392.681720 s
>>> Oct 7 16:32:11 gold ntpd[3721]: time reset -392.671268 s
>>> Oct 7 17:29:29 gold ntpd[3721]: time reset -392.671752 s
>>> Oct 7 18:04:37 gold ntpd[3721]: time reset -785.346987 s
>>>
>>> but as you can see above, the time slip happens abruptly.. looks like
>>> a rounding error or something...
>>>
>>> I'm now reducing the sleep to 5 seconds...
>>> but as you can see the sleep
>>> ends a few seconds early and local time suddenly jumped forward 6
>>> minutes 33 seconds...
>>>
>>> $ sysctl kern.timecounter
>>> kern.timecounter.fast_gettime: 1
>>> kern.timecounter.tick: 1
>>> kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) HPET(950) i8254(0)
>>> dummy(-1000000)
>>> kern.timecounter.hardware: TSC-low
>>> kern.timecounter.stepwarnings: 0
>>> kern.timecounter.tc.i8254.mask: 65535
>>> kern.timecounter.tc.i8254.counter: 11598
>>> kern.timecounter.tc.i8254.frequency: 1193182
>>> kern.timecounter.tc.i8254.quality: 0
>>> kern.timecounter.tc.HPET.mask: 4294967295
>>> kern.timecounter.tc.HPET.counter: 3257069245
>>> kern.timecounter.tc.HPET.frequency: 14318180
>>> kern.timecounter.tc.HPET.quality: 950
>>> kern.timecounter.tc.ACPI-safe.mask: 16777215
>>> kern.timecounter.tc.ACPI-safe.counter: 4219134510
>>> kern.timecounter.tc.ACPI-safe.frequency: 3579545
>>> kern.timecounter.tc.ACPI-safe.quality: 850
>>> kern.timecounter.tc.TSC-low.mask: 4294967295
>>> kern.timecounter.tc.TSC-low.counter: 2854866610
>>> kern.timecounter.tc.TSC-low.frequency: 10937740
>>> kern.timecounter.tc.TSC-low.quality: 1000
>>> kern.timecounter.smp_tsc: 1
>>> kern.timecounter.invariant_tsc: 1
>>> $ sysctl kern.eventtimer
>>> kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
>>> kern.eventtimer.et.LAPIC.flags: 15
>>> kern.eventtimer.et.LAPIC.frequency: 100002217
>>> kern.eventtimer.et.LAPIC.quality: 400
>>> kern.eventtimer.et.i8254.flags: 1
>>> kern.eventtimer.et.i8254.frequency: 1193182
>>> kern.eventtimer.et.i8254.quality: 100
>>> kern.eventtimer.et.RTC.flags: 17
>>> kern.eventtimer.et.RTC.frequency: 32768
>>> kern.eventtimer.et.RTC.quality: 0
>>> kern.eventtimer.periodic: 0
>>> kern.eventtimer.timer: LAPIC
>>> kern.eventtimer.activetick: 1
>>> kern.eventtimer.idletick: 0
>>> kern.eventtimer.singlemul: 2
>>>
>>> I have switched my timecounter to HPET to see if things are different...
>>>
>>> Any clues?
>>
>> The switch to HPET you mentioned could tell a lot about the problem.
>> Switching the event timer may also be interesting.
>
> Since I switched to HPET, it hasn't happened at all in the last 3 days..

That probably points to some problem with the TSC timecounter. What is
strange to me is the jump size of 5 minutes. The TSC timecounter should
overflow every few seconds, so a single jump should be only that big.

> Should I try switching back to TSC and switching event timer? do you
> need any other info, or want me to try anything else?

You may try that, to be sure the event timers are not related to this
case.

> Oh, forgot to include the specific processor info in my previous
> email:
> CPU: AMD Opteron(tm) Processor 4228 HE (2800.05-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x600f12 Family = 0x15 Model = 0x1 Stepping = 2
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX>
> AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
> AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,<b23>,<b24>>
> TSC: P-state invariant, performance statistics

Unfortunately, I don't know the specifics of AMD processors. Maybe
jkim@ or avg@ will remember something. As far as I know, the kernel
should block entering sleep states on AMD CPUs when the LAPIC event
timer is used (the default). In that case I would expect the TSC to
work fine as well. But I don't know what other possible sources of
asynchronicity there may be.

-- 
Alexander Motin
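
A quick arithmetic check on the numbers quoted above, offered as a sketch
rather than a diagnosis: assuming a timecounter wraps after mask + 1
counts, its wrap period is (mask + 1) / frequency. For TSC-low (mask
4294967295, frequency 10937740) that comes to roughly 392.67 seconds,
which is close to the ntpd resets of -392.67 s and -785.35 s and to the
reported jump of 6 minutes 33 seconds. The Python below only does that
division; the name wrap_period and the dictionary layout are made up for
illustration, and the values are copied from the sysctl and ntpd output
in this thread.

```python
# Hypothetical helper: wrap period of a timecounter in seconds,
# assuming the counter rolls over after (mask + 1) counts.
def wrap_period(mask, frequency):
    return (mask + 1) / frequency

# (mask, frequency) pairs copied from the quoted kern.timecounter output.
timecounters = {
    "TSC-low": (4294967295, 10937740),
    "HPET": (4294967295, 14318180),
    "ACPI-safe": (16777215, 3579545),
    "i8254": (65535, 1193182),
}

# Magnitudes of the ntpd "time reset" lines from the quoted log.
resets = [
    785.347912, 392.673256, 785.346533, 392.681720,
    392.671268, 392.671752, 785.346987,
]

for name, (mask, freq) in timecounters.items():
    print(f"{name:10s} wraps every {wrap_period(mask, freq):.3f} s")

tsc_period = wrap_period(*timecounters["TSC-low"])

# Express each reset as a multiple of the TSC-low wrap period.
for r in resets:
    print(f"reset {r:.6f} s = {r / tsc_period:.4f} TSC-low wrap periods")
```

If the ratios come out near 1 and 2, that is consistent with each jump
being one or two TSC-low wrap periods, though it does not say why a wrap
would be miscounted.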