Date: Mon, 27 Sep 2010 22:11:54 -0700 From: Jeremy Chadwick <freebsd@jdc.parodius.com> To: Don Lewis <truckman@FreeBSD.org> Cc: stable@FreeBSD.org Subject: Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime Message-ID: <20100928051154.GA73859@icarus.home.lan> In-Reply-To: <201009280425.o8S4PA0d058131@gw.catspoiler.org> References: <201009280425.o8S4PA0d058131@gw.catspoiler.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Sep 27, 2010 at 09:25:10PM -0700, Don Lewis wrote: > CPU time accounting is broken on one of my machines running 8-STABLE. I > ran a test with a simple program that just loops and consumes CPU time: > > % time ./a.out > 94.544u 0.000s 19:14.10 8.1% 62+2054k 0+0io 0pf+0w > > The display in top shows the process with WCPU at 100%, but TIME > increments very slowly. > > Several hours after booting, I got a bunch of "calcru: runtime went > backwards" messages, but they stopped right away and never appeared > again. > > Aug 23 13:40:07 scratch ntpd[1159]: ntpd 4.2.4p5-a (1) > Aug 23 13:43:18 scratch ntpd[1160]: kernel time sync status change 2001 > Aug 23 18:05:57 scratch dbus-daemon: [system] Reloaded configuration > Aug 23 18:06:16 scratch dbus-daemon: [system] Reloaded configuration > Aug 23 18:12:40 scratch ntpd[1160]: time reset +18.059948 s > [snip] > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6836685136 usec to 5425839798 usec for pid 1526 (csh) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 4747 usec to 2403 usec for pid 1519 (csh) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 5265 usec to 2594 usec for pid 1494 (hald-addon-mouse-sy) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 7818 usec to 3734 usec for pid 1488 (console-kit-daemon) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 977 usec to 459 usec for pid 1480 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 958 usec to 450 usec for pid 1479 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 957 usec to 449 usec for pid 1478 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 952 usec to 447 usec for pid 1477 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 959 usec to 450 usec for pid 1476 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 975 usec to 458 usec for pid 1475 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1026 usec to 482 usec for pid 1474 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1333 usec to 626 usec for pid 1473 (getty) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 2469 usec to 1160 usec for pid 1440 (inetd) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 719 usec to 690 usec for pid 1402 (sshd) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 120486 usec to 56770 usec for pid 1360 (cupsd) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6204 usec to 2914 usec for pid 1289 (dbus-daemon) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 179 usec to 84 usec for pid 1265 (moused) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 22156 usec to 10407 usec for pid 1041 (nfsd) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1292 usec to 607 usec for pid 1032 (mountd) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 8801 usec to 4134 usec for pid 664 (devd) > Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 19 usec to 9 usec for pid 9 (sctp_iterator) > > > If I reboot and run the test again, the CPU time accounting seems to be > working correctly. > % time ./a.out > 1144.226u 0.000s 19:06.62 99.7% 5+168k 0+0io 0pf+0w > > > I'm not sure how long this problem has been present. I do remember > seeing the calcru messages with an August 23rd kernel. I have not seen > the calcru messages when running -CURRENT on the same hardware. I also > have not seen this same problem on my other Athlon 64 box running the > August 23rd kernel. > > Before reboot: > # sysctl kern.timecounter > kern.timecounter.tick: 1 > kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-1000000) > kern.timecounter.hardware: ACPI-fast > kern.timecounter.stepwarnings: 0 > kern.timecounter.tc.i8254.mask: 4294967295 > kern.timecounter.tc.i8254.counter: 3534 > kern.timecounter.tc.i8254.frequency: 1193182 > kern.timecounter.tc.i8254.quality: 0 > kern.timecounter.tc.ACPI-fast.mask: 16777215 > kern.timecounter.tc.ACPI-fast.counter: 8685335 > kern.timecounter.tc.ACPI-fast.frequency: 3579545 > kern.timecounter.tc.ACPI-fast.quality: 1000 > kern.timecounter.tc.TSC.mask: 4294967295 > kern.timecounter.tc.TSC.counter: 2204228369 > kern.timecounter.tc.TSC.frequency: 2500018183 > kern.timecounter.tc.TSC.quality: 800 > kern.timecounter.invariant_tsc: 0 > > After reboot: > % sysctl kern.timecounter > kern.timecounter.tick: 1 > kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-1000000) > kern.timecounter.hardware: ACPI-fast > kern.timecounter.stepwarnings: 0 > kern.timecounter.tc.i8254.mask: 4294967295 > kern.timecounter.tc.i8254.counter: 2241 > kern.timecounter.tc.i8254.frequency: 1193182 > kern.timecounter.tc.i8254.quality: 0 > kern.timecounter.tc.ACPI-fast.mask: 16777215 > kern.timecounter.tc.ACPI-fast.counter: 4636239 > kern.timecounter.tc.ACPI-fast.frequency: 3579545 > kern.timecounter.tc.ACPI-fast.quality: 1000 > kern.timecounter.tc.TSC.mask: 4294967295 > kern.timecounter.tc.TSC.counter: 1429996208 > kern.timecounter.tc.TSC.frequency: 2500020459 > kern.timecounter.tc.TSC.quality: 800 > kern.timecounter.invariant_tsc: 0 > > > > Here's my kernel config file (uni-processor i386 kernel running on an > Athlon 64 x2 CPU): > include GENERIC > > nocpu I486_CPU > nocpu I586_CPU > > nooptions SCHED_4BSD # 4BSD scheduler > options SCHED_ULE > > # Debugging for use in -current > options KDB # Enable kernel debugger support. > options DDB # Support DDB. > options GDB # Support remote GDB. > > nooptions SMP # Symmetric MultiProcessor Kernel > nodevice apic # I/O APIC > > nodevice atapicd # ATAPI CDROM drives > device atapicam # emulate ATAPI devices as SCSI ditto via CAM > > > /var/run/dmesg.boot: > Copyright (c) 1992-2010 The FreeBSD Project. > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > The Regents of the University of California. All rights reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation. > FreeBSD 8.1-STABLE #6: Thu Sep 23 16:03:29 PDT 2010 > dl@scratch.catspoiler.org:/usr/obj/usr/src/sys/GENERICDDB i386 > Timecounter "i8254" frequency 1193182 Hz quality 0 > CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (2500.02-MHz 686-class CPU) > Origin = "AuthenticAMD" Id = 0x60fb1 Family = f Model = 6b Stepping = 1 > Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> > Features2=0x2001<SSE3,CX16> > AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!> > AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch> > real memory = 4294967296 (4096 MB) > avail memory = 3607351296 (3440 MB) > kbd1 at kbdmux0 > ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20100331/tbfadt-655) > acpi0: <Nvidia AWRDACPI> on motherboard > acpi0: [ITHREAD] > acpi0: Power Button (fixed) > acpi0: reservation of 0, a0000 (3) failed > acpi0: reservation of 100000, dbdf0000 (3) failed > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 > cpu0: <ACPI CPU> on acpi0 > acpi_hpet0: <High Precision Event Timer> iomem 0xfeff0000-0xfeff03ff on acpi0 > device_attach: acpi_hpet0 attach returned 12 > acpi_button0: <Power Button> on acpi0 > pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 > pci0: <ACPI PCI bus> on pcib0 > pci0: <memory, RAM> at device 0.0 (no driver attached) > isab0: <PCI-ISA bridge> at device 1.0 on pci0 > isa0: <ISA bus> on isab0 > pci0: <serial bus, SMBus> at device 1.1 (no driver attached) > pci0: <memory, RAM> at device 1.2 (no driver attached) > ohci0: <OHCI (generic) USB controller> mem 0xfe02f000-0xfe02ffff irq 10 at device 2.0 on pci0 > ohci0: [ITHREAD] > usbus0: <OHCI (generic) USB controller> on ohci0 > ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfe02e000-0xfe02e0ff irq 11 at device 2.1 on pci0 > ehci0: [ITHREAD] > usbus1: EHCI version 1.0 > usbus1: <EHCI (generic) USB 2.0 controller> on ehci0 > ohci1: <OHCI (generic) USB controller> mem 0xfe02d000-0xfe02dfff irq 5 at device 4.0 on pci0 > ohci1: [ITHREAD] > usbus2: <OHCI (generic) USB controller> on ohci1 > ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfe02c000-0xfe02c0ff irq 10 at device 4.1 on pci0 > ehci1: [ITHREAD] > usbus3: EHCI version 1.0 > usbus3: <EHCI (generic) USB 2.0 controller> on ehci1 > atapci0: <nVidia nForce MCP67 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0 > ata0: <ATA channel 0> on atapci0 > ata0: [ITHREAD] > ata1: <ATA channel 1> on atapci0 > ata1: [ITHREAD] > pci0: <multimedia, HDA> at device 7.0 (no driver attached) > pcib1: <ACPI PCI-PCI bridge> at device 8.0 on pci0 > pci1: <ACPI PCI bus> on pcib1 > fwohci0: <Texas Instruments TSB43AB22/A> mem 0xfd0ff000-0xfd0ff7ff,0xfd0f8000-0xfd0fbfff irq 11 at device 7.0 on pci1 > fwohci0: [ITHREAD] > fwohci0: OHCI version 1.10 (ROM=1) > fwohci0: No. of Isochronous channels is 4. > fwohci0: EUI64 00:50:8d:00:00:99:f0:69 > fwohci0: Phy 1394a available S400, 2 ports. > fwohci0: Link S400, max_rec 2048 bytes. > firewire0: <IEEE1394(FireWire) bus> on fwohci0 > dcons_crom0: <dcons configuration ROM> on firewire0 > dcons_crom0: bus_addr 0x1090000 > fwe0: <Ethernet over FireWire> on firewire0 > if_fwe0: Fake Ethernet address: 02:50:8d:99:f0:69 > fwe0: Ethernet address: 02:50:8d:99:f0:69 > fwip0: <IP over FireWire> on firewire0 > fwip0: Firewire address: 00:50:8d:00:00:99:f0:69 @ 0xfffe00000000, S400, maxrec 2048 > fwohci0: Initiate bus reset > fwohci0: fwohci_intr_core: BUS reset > fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode > ahc0: <Adaptec 19160B Ultra160 SCSI adapter> port 0xcc00-0xccff mem 0xfd0fe000-0xfd0fefff irq 11 at device 9.0 on pci1 > ahc0: [ITHREAD] > aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs > atapci1: <nVidia nForce MCP67 SATA300 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xdc00-0xdc0f mem 0xfe026000-0xfe027fff irq 10 at device 9.0 on pci0 > atapci1: [ITHREAD] > atapci1: AHCI v1.10 controller with 4 3Gbps ports, PM supported > ata2: <ATA channel 0> on atapci1 > ata2: [ITHREAD] > ata3: <ATA channel 1> on atapci1 > ata3: [ITHREAD] > ata4: <ATA channel 2> on atapci1 > ata4: [ITHREAD] > ata5: <ATA channel 3> on atapci1 > ata5: [ITHREAD] > nfe0: <NVIDIA nForce MCP67 Networking Adapter> port 0xd800-0xd807 mem 0xfe02b000-0xfe02bfff,0xfe02a000-0xfe02a0ff,0xfe029000-0xfe02900f irq 15 at device 10.0 on pci0 > miibus0: <MII bus> on nfe0 > e1000phy0: <Marvell 88E1116 Gigabit PHY> PHY 1 on miibus0 > e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto > nfe0: Ethernet address: 00:50:8d:9f:6d:e3 > nfe0: [FILTER] > pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0 > pci2: <ACPI PCI bus> on pcib2 > pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0 > pci3: <ACPI PCI bus> on pcib3 > atapci2: <SiI 3132 SATA300 controller> port 0xac00-0xac7f mem 0xfdcff000-0xfdcff07f,0xfdcf8000-0xfdcfbfff irq 5 at device 0.0 on pci3 > atapci2: [ITHREAD] > ata6: <ATA channel 0> on atapci2 > ata6: [ITHREAD] > ata7: <ATA channel 1> on atapci2 > ata7: [ITHREAD] > pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0 > pci4: <ACPI PCI bus> on pcib4 > pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0 > pci5: <ACPI PCI bus> on pcib5 > pcib6: <ACPI PCI-PCI bridge> at device 15.0 on pci0 > pci6: <ACPI PCI bus> on pcib6 > pcib7: <ACPI PCI-PCI bridge> at device 16.0 on pci0 > pci7: <ACPI PCI bus> on pcib7 > pcib8: <ACPI PCI-PCI bridge> at device 17.0 on pci0 > pci8: <ACPI PCI bus> on pcib8 > vgapci0: <VGA-compatible display> mem 0xfb000000-0xfbffffff,0xe0000000-0xefffffff,0xfc000000-0xfcffffff irq 10 at device 18.0 on pci0 > acpi_tz0: <Thermal Zone> on acpi0 > atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0 > fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 > fdc0: [FILTER] > atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 > atkbd0: <AT Keyboard> irq 1 on atkbdc0 > kbd0 at atkbd0 > atkbd0: [GIANT-LOCKED] > atkbd0: [ITHREAD] > psm0: <PS/2 Mouse> irq 12 on atkbdc0 > psm0: [GIANT-LOCKED] > psm0: [ITHREAD] > psm0: model MouseMan+, device ID 0 > pmtimer0 on isa0 > sc0: <System console> at flags 0x100 on isa0 > sc0: VGA <16 virtual consoles, flags=0x300> > vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 > ppc0: parallel port not found. > powernow0: <PowerNow! K8> on cpu0 > Timecounter "TSC" frequency 2500018183 Hz quality 800 > Timecounters tick every 1.000 msec > firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me) > firewire0: bus manager 0 > usbus0: 12Mbps Full Speed USB v1.0 > usbus1: 480Mbps High Speed USB v2.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > ugen0.1: <nVidia> at usbus0 > uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 > ugen1.1: <nVidia> at usbus1 > uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1 > ugen2.1: <nVidia> at usbus2 > uhub2: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2 > ugen3.1: <nVidia> at usbus3 > uhub3: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3 > uhub0: 6 ports with 6 removable, self powered > uhub2: 6 ports with 6 removable, self powered > ad6: 476940MB <SAMSUNG HD501LJ CR100-13> at ata3-master UDMA100 SATA 3Gb/s > uhub1: 6 ports with 6 removable, self powered > uhub3: 6 ports with 6 removable, self powered > unknown: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 sks=0x40 0x00 0x01 > cd0 at ata0 bus 0 scbus1 target 0 lun 0 > cd0: <TOSHIBA CD-ROM XM-6602B 1017> Removable CD-ROM SCSI-0 device > cd0: 3.300MB/s transfers > cd0: cd present [138590 x 2048 byte records] > da0 at ahc0 bus 0 scbus0 target 0 lun 0 > da0: <SEAGATE ST336706LW 010A> Fixed Direct Access SCSI-3 device > da0: 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit) > da0: Command Queueing enabled > da0: 35003MB (71687370 512 byte sectors: 255H 63S/T 4462C) > Trying to mount root from ufs:/dev/ad6s1a > nfe0: link state changed to UP Do you have something called Cool'n'Quiet enabled in your BIOS? Solely as a test, try disabling it. If this solves the problem, add "device cpufreq" to your kernel configuration, buildkernel/installkernel, and re-enable the option in your BIOS. It's worth a shot anyway. Some reference threads which are old and may be centralised around Intel CPUs but I imagine the same problem can happen on any platform which adjusts clock frequencies. http://lists.freebsd.org/pipermail/freebsd-questions/2006-October/133253.html http://www.mail-archive.com/freebsd-stable@freebsd.org/msg95530.html -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100928051154.GA73859>