From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 18 12:18:07 2006 Return-Path: X-Original-To: freebsd-hackers@freebsd.org Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 380F716A4DA for ; Tue, 18 Jul 2006 12:18:07 +0000 (UTC) (envelope-from GMcCaughan@synaptics-uk.com) Received: from mx2.synaptics-uk.com (mx2.synaptics-uk.com [194.203.111.209]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6C0AA43D49 for ; Tue, 18 Jul 2006 12:18:06 +0000 (GMT) (envelope-from GMcCaughan@synaptics-uk.com) Received: from firewall.synaptics-uk.com ([194.203.111.212] helo=ukexchange2k.synaptics-inc.local) by mx2.synaptics-uk.com with esmtp (Exim 4.20) id 1G2ozG-0008EA-4D for freebsd-hackers@freebsd.org; Tue, 18 Jul 2006 13:47:42 +0100 Received: from lists.synaptics-uk.com ([172.20.11.6]) by ukexchange2k.synaptics-inc.local with Microsoft SMTPSVC(5.0.2195.6713); Tue, 18 Jul 2006 13:17:34 +0100 Received: from [172.20.11.5] (unknown [172.20.11.5]) by lists.synaptics-uk.com (Postfix) with ESMTP id DE46A17011 for ; Tue, 18 Jul 2006 12:57:56 +0100 (BST) From: Gareth McCaughan To: freebsd-hackers@freebsd.org Date: Tue, 18 Jul 2006 13:17:32 +0100 User-Agent: KMail/1.9.1 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200607181317.33416.gmccaughan@synaptics-uk.com> X-OriginalArrivalTime: 18 Jul 2006 12:17:34.0777 (UTC) FILETIME=[269AD290:01C6AA64] X-Mailman-Approved-At: Tue, 18 Jul 2006 12:27:11 +0000 Subject: "swiN: clock sio" process taking 75% CPU X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jul 2006 12:18:07 -0000 (I've asked this question on -questions and -stable, with no success; hence I'm taking it to the assembled wizardry of -hackers. A bit of googling suggests that I'm far from the first person to have had a similar problem, though it seems to be worse for me than for the others I've found.) I have a box running 6-STABLE, cvsupped last week. Until recently it was running 5.something and showed the same peculiar behaviour as I'm about to describe. Further back, it used to run 4.x, and I don't recall anything like this happening then. About 6 minutes after booting (on three occasions, but I don't guarantee this doesn't vary), a process (well, a kernel interrupt thread, I guess) that appears in the output of "ps" as "[swi4: clock sio]" begins to use about 3/4 of the machine's CPU. I think it does so more or less instantaneously. It continues to do so indefinitely, so far as I can tell. I'm not aware of anything specific that triggers this, though I suppose there must *be* something. It happens apparently spontaneously, on a lightly loaded machine. Those cycles are genuinely being consumed; other processes run much more slowly than they "should", and take much more wall time than CPU time. I've tried diddling my kernel's HZ value; the behaviour with HZ=100 and with HZ=1000 is the same, so far as I'm able to tell. I've no idea whether it might be relevant, but I have option DEVICE_POLLING turned on; toggling sysctl kern.polling.enable doesn't seem to make any difference. The machine is a very uninteresting single-CPU Athlon box, clocked at 1.6GHz, several years old. Here's its dmesg output, with a few uninteresting bits of information leakage elided. ---------- dmesg output begins ---------- Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 6.1-STABLE #3: Thu Jul 13 16:46:07 BST 2006 root@xxxxxxx.xxxxxxxx.com:/usr/obj/usr/src/sys/xxxxxxxx ACPI APIC Table: Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(tm) XP 1900+ (1599.73-MHz 686-class CPU) Origin = "AuthenticAMD" Id = 0x662 Stepping = 2 Features=0x383fbff AMD Features=0xc0480800 real memory = 536805376 (511 MB) avail memory = 515944448 (492 MB) ioapic0 irqs 0-23 on motherboard netsmb_dev: loaded acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 cpu0: on acpi0 acpi_throttle0: on cpu0 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 agp0: mem 0xe0000000-0xe3ffffff at device 0.0 on pci0 pcib1: at device 1.0 on pci0 pci1: on pcib1 pci1: at device 0.0 (no driver attached) xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xdfffff80-0xdfffffff irq 17 at device 6.0 on pci0 miibus0: on xl0 ukphy0: on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto xl0: Ethernet address: 00:04:75:7e:54:fd isab0: at device 17.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 17.1 on pci0 ata0: on atapci0 ata1: on atapci0 uhci0: port 0xdc00-0xdc1f irq 5 at device 17.2 on pci0 uhci0: [GIANT-LOCKED] usb0: on uhci0 usb0: USB revision 1.0 uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xe000-0xe01f irq 5 at device 17.3 on pci0 uhci1: [GIANT-LOCKED] usb1: on uhci1 usb1: USB revision 1.0 uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xe400-0xe41f irq 5 at device 17.4 on pci0 uhci2: [GIANT-LOCKED] usb2: on uhci2 usb2: USB revision 1.0 uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub2: 2 ports with 2 removable, self powered pci0: at device 17.5 (no driver attached) acpi_button1: on acpi0 fdc0: port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FAST] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A ppc0: port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/9 bytes threshold ppbus0: on ppc0 plip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: model Generic PS/2 mouse, device ID 0 pmtimer0 on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounter "TSC" frequency 1599730245 Hz quality 800 Timecounters tick every 10.000 msec ad0: 38166MB at ata0-master UDMA100 acd0: DVDROM at ata1-master UDMA33 Trying to mount root from ufs:/dev/ad0s1a ---------- dmesg output ends ---------- vmstat -i produces pretty normal-looking numbers (this is while running with HZ=100): | interrupt total rate | irq1: atkbd0 3 0 | irq6: fdc0 10 0 | irq14: ata0 2913 1 | irq15: ata1 47 0 | irq17: xl0 7342 4 | cpu0: timer 302649 199 | Total 312964 206 A little more information, in case it's useful: | $ echo; sysctl debug | egrep to_ | debug.to_avg_mpcalls: 2890 | debug.to_avg_mtxcalls: 0 | debug.to_avg_gcalls: 768 | debug.to_avg_depth: 3815 That's with HZ = 100. Here are some numbers from a message in freebsd-ia64, from Marcel Moolenaar, in 2004-07, to someone seeing symptoms like mine. They're meant to be typical healthy numbers. Mine above look somewhat worse, but not insanely so; surely not enough to explain the difference between using 0.3% cpu and using 75%. Marcel also had HZ=100. | % sysctl debug | grep to_avg | debug.to_avg_depth: 2500 | debug.to_avg_gcalls: 1003 | debug.to_avg_mpcalls: 1255 I would be grateful for any insight into what's going wrong and how (if at all) it can be fixed or worked around. I'm not subscribed to -hackers, so would prefer to be cc'ed, but I'll check the web archives and should therefore see any responses that go only to the list. I can diddle fairly freely with the machine if that proves useful, and don't mind tweaking internals if it's diagnostically helpful, but I'm not a FreeBSD kernel expert and may therefore need some help identifying what to tweak and how to interpret the results. Thanks in advance! -- Gareth McCaughan