Date: Sun, 23 Aug 2009 23:48:28 GMT
From: Terrence Koeman <root@mediamonks.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: i386/138117: spin lock held too long
Message-ID: <200908232348.n7NNmSc1033699@www.freebsd.org>
Resent-Message-ID: <200908232350.n7NNo13E024802@freefall.freebsd.org>
>Number:         138117
>Category:       i386
>Synopsis:       spin lock held too long
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Aug 23 23:50:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Terrence Koeman
>Release:        7.2-STABLE
>Organization:
>Environment:
FreeBSD persephone 7.2-STABLE FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009 terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE i386
>Description:
When rebooting, the kernel panics with 'spin lock held too long' (full text below). This happens on every reboot, not just occasionally as I've seen it reported by others.

I tried the following:
- Removed PAE (reverted to GENERIC); this decreased the panics from every reboot to about 50% of reboots.
- Changed SCHED_ULE to SCHED_4BSD; this did not change anything.
- Added KDB_UNATTENDED, but the server hangs on the panic.

The server also hangs on boot: after the 10s countdown it hangs for about 2-3 minutes before either booting normally or (1 out of 10 boots) hanging forever (apparently; I've let it sit for hours). I did notice that when the server is in the hung state on booting, I can sometimes make it continue to boot by pressing and holding return on the (USB) keyboard. I've tried other keys, but only the return key sometimes works. It's not a coincidence: I can let it sit for 10 minutes or an hour, and when I then hold return it continues to boot within 3-5 seconds.

Aside from booting, the server is rock-solid. I had it running overnight with a concurrent 'make -j 20 buildkernel && make -j 20 buildworld', 'portupgrade -afr' and '/usr/libexec/locate.updatedb' in a continuous loop while running apachebench with 1500 connections against it from another machine on the LAN. Aside from an ACPI warning at boot (see below for dmesg) there are no errors whatsoever, and also no dumps or possibility to get into the debugger.
When ACPI is disabled the server never boots: it hangs at the same point after the 10s countdown, but never continues and does not respond to return.

I'd be happy to get more info, but with no debugger or dump I don't know where to start :) Please advise.

Panic:
---
Rebooting...
cpu_reset: Stopping other CPUs
spin lock 0xc05a8c00 (sched lock 0) held by 0xd3087000 (tid 100011) too long
panic: spin lock held too long
cpuid = 0
---
(In about half the panics, the 5 lines of text above the panic message are garbled with what looks like random printable ASCII data.)

dmesg:
(Server is an IBM xSeries 445 with 4x 3GHz Xeon and 12GB memory.)
---
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009
    terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2993.89-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4400<CNXT-ID,xTPR>
  Logical CPUs per core: 2
real memory = 13421772800 (12800 MB)
avail memory = 12619841536 (12035 MB)
ACPI APIC Table: <IBM SERVIGIL>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP/HT): APIC ID: 1
 cpu2 (AP): APIC ID: 18
 cpu3 (AP/HT): APIC ID: 19
 cpu4 (AP): APIC ID: 32
 cpu5 (AP/HT): APIC ID: 33
 cpu6 (AP): APIC ID: 50
 cpu7 (AP/HT): APIC ID: 51
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length: 0 0/4 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic1 <Version 1.1> irqs 44-87 on motherboard
ioapic0 <Version 1.1> irqs 0-43 on motherboard
kbd1 at kbdmux0
acpi0: <IBM SERVIGIL> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 400, 100 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x1800-0x18ff mem 0xe0000000-0xe7ffffff,0xf0a20000-0xf0a2ffff irq 16 at device 4.0 on pci0
isab0: <PCI-ISA bridge> at device 5.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686B UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x700-0x70f at device 5.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <VIA 83C572 USB controller> port 0x1900-0x191f irq 18 at device 5.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0x1920-0x193f irq 18 at device 5.3 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: <serial bus, SMBus> at device 5.4 (no driver attached)
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2002> mem 0xf0b00000-0xf0b0ffff,0xf0b10000-0xf0b1ffff irq 42 at device 4.0 on pci1
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:09:6b:e6:39:7f
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2002> mem 0xf0b20000-0xf0b2ffff,0xf0b30000-0xf0b3ffff irq 11 at device 4.1 on pci1
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:09:6b:16:39:7f
bge1: [ITHREAD]
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci7: <ACPI PCI bus> on pcib4
ips0: <IBM ServeRAID Adapter> mem 0xf9c00000-0xf9c01fff irq 60 at device 3.0 on pci7
ips0: [ITHREAD]
pcib5: <ACPI Host-PCI bridge> on acpi0
pci9: <ACPI PCI bus> on pcib5
fdc0: <floppy drive controller> port 0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
cpu4: <ACPI CPU> on acpi0
p4tcc4: <CPU Frequency Thermal Control> on cpu4
cpu5: <ACPI CPU> on acpi0
p4tcc5: <CPU Frequency Thermal Control> on cpu5
cpu6: <ACPI CPU> on acpi0
p4tcc6: <CPU Frequency Thermal Control> on cpu6
cpu7: <ACPI CPU> on acpi0
p4tcc7: <CPU Frequency Thermal Control> on cpu7
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xce7ff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: <Microsoft Natural\M-. Ergonomic Keyboard 4000, class 0/0, rev 2.00/1.73, addr 2> on uhub1
kbd2 at ukbd0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to accept, logging limited to 2000 packets/entry by default
acd0: DVDROM <MATSHITADVD-ROM SR-8177/NB21> at ata0-master UDMA33
ips0: adapter type: ServeRAID 4Lx (neo lite)
ips0: logical drives: 1
ips0: Logical Drive 0: RAID1 sectors: 286748672, state OK
ipsd0: <Logical Drive> on ips0
ipsd0: Logical Drive (140014MB)
SMP: AP CPU #7 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/ipsd0s1a
bge0: link state changed to UP
---
>How-To-Repeat:
reboot
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: