From owner-freebsd-i386@FreeBSD.ORG Sun Aug 23 23:50:02 2009 Return-Path: Delivered-To: freebsd-i386@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E39C8106568C for ; Sun, 23 Aug 2009 23:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BF6318FC1F for ; Sun, 23 Aug 2009 23:50:01 +0000 (UTC) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n7NNo1OW024803 for ; Sun, 23 Aug 2009 23:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n7NNo13E024802; Sun, 23 Aug 2009 23:50:01 GMT (envelope-from gnats) Resent-Date: Sun, 23 Aug 2009 23:50:01 GMT Resent-Message-Id: <200908232350.n7NNo13E024802@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-i386@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Terrence Koeman Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F2D8106568C for ; Sun, 23 Aug 2009 23:48:29 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id 4CCA58FC14 for ; Sun, 23 Aug 2009 23:48:29 +0000 (UTC) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n7NNmStQ033700 for ; Sun, 23 Aug 2009 23:48:28 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id n7NNmSc1033699; Sun, 23 Aug 2009 23:48:28 GMT (envelope-from nobody) Message-Id: <200908232348.n7NNmSc1033699@www.freebsd.org> Date: Sun, 23 Aug 2009 23:48:28 GMT From: Terrence Koeman To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Cc: Subject: i386/138117: spin lock held too long X-BeenThere: freebsd-i386@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: I386-specific issues for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Aug 2009 23:50:02 -0000 >Number: 138117 >Category: i386 >Synopsis: spin lock held too long >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-i386 >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sun Aug 23 23:50:01 UTC 2009 >Closed-Date: >Last-Modified: >Originator: Terrence Koeman >Release: 7.2-STABLE >Organization: >Environment: FreeBSD persephone 7.2-STABLE FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009 terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE i386 >Description: When rebooting the kernel panics with 'spin lock held too long' (full text below). This happens every reboot, not incidentally as I've seen it reported by others. I tried the following: -Removed PAE (revert to GENERIC), this decreased the panics from every reboot to about 50% of reboots. -Changed SCHED_ULE to SCHED_4BSD, did not change anything. -Added KDB_UNATTENDED, but the server hangs on the panic. Also the server hangs on boot, after the 10s countdown it hangs for about 2-3 minutes before either booting normally or (1 out of 10 boots) hangs forever (apparently, I've let it sit for hours). I did notice that when the server is in the hung state on booting, I can sometimes make it continue to boot by pressing and holding return on the (usb) keyboard. I've tried other keys, but only the return key sometimes works. It's not a coincidence, I can let it sit 10m or an hour and when then holding return it will continue to boot within 3-5 seconds. Aside from booting the server is rock-solid, I had it running overnight with a concurrent 'make -j 20 buildkernel && make -j 20 buildworld', 'portupgrade -afr' and '/usr/libexec/locate.updatedb' in a continous loop while running apachebench with 1500 connections on it from another machine on the LAN. Aside from an ACPI warning at boot (see below for dmesg) there are no errors whatsoever and also no dumps or possibility to get into the debugger. When ACPI is disabled the server never boots, it hangs at the same point after the 10s countdown but never hangs and does not respond to return. I'd be happy to get more info, but with no debugger or dump I don't know where to start :) Please advise. Panic: --- Rebooting... cpu_reset: Stopping other CPUs spin lock 0xc05a8c00 (sched lock 0) held by 0xd3087000 (tid 100011) too long panic: spin lock held too long cpuid = 0 --- (with about half the panics the 5 lines text above the panic message are garbled with what looks like random ASCII printable data) dmesg: (Server is an IBM xseries 445 with 4x 3Ghz Xeon and 12Gb memory. --- Copyright (c) 1992-2009 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009 terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2993.89-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0xf29 Stepping = 9 Features=0xbfebfbff Features2=0x4400 Logical CPUs per core: 2 real memory = 13421772800 (12800 MB) avail memory = 12619841536 (12035 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP/HT): APIC ID: 1 cpu2 (AP): APIC ID: 18 cpu3 (AP/HT): APIC ID: 19 cpu4 (AP): APIC ID: 32 cpu5 (AP/HT): APIC ID: 33 cpu6 (AP): APIC ID: 50 cpu7 (AP/HT): APIC ID: 51 ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length: 0 0/4 [20070320] MADT: Forcing active-low polarity and level trigger for SCI ioapic1 irqs 44-87 on motherboard ioapic0 irqs 0-43 on motherboard kbd1 at kbdmux0 acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 400, 100 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0 pcib0: on acpi0 pci0: on pcib0 vgapci0: port 0x1800-0x18ff mem 0xe0000000-0xe7ffffff,0xf0a20000-0xf0a2ffff irq 16 at device 4.0 on pci0 isab0: at device 5.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x700-0x70f at device 5.1 on pci0 ata0: on atapci0 ata0: [ITHREAD] ata1: on atapci0 ata1: [ITHREAD] uhci0: port 0x1900-0x191f irq 18 at device 5.2 on pci0 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0x1920-0x193f irq 18 at device 5.3 on pci0 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered pci0: at device 5.4 (no driver attached) pcib1: on acpi0 pci1: on pcib1 bge0: mem 0xf0b00000-0xf0b0ffff,0xf0b10000-0xf0b1ffff irq 42 at device 4.0 on pci1 miibus0: on bge0 brgphy0: PHY 1 on miibus0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge0: Ethernet address: 00:09:6b:e6:39:7f bge0: [ITHREAD] bge1: mem 0xf0b20000-0xf0b2ffff,0xf0b30000-0xf0b3ffff irq 11 at device 4.1 on pci1 miibus1: on bge1 brgphy1: PHY 1 on miibus1 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge1: Ethernet address: 00:09:6b:16:39:7f bge1: [ITHREAD] pcib2: on acpi0 pci2: on pcib2 pcib3: on acpi0 pci5: on pcib3 pcib4: on acpi0 pci7: on pcib4 ips0: mem 0xf9c00000-0xf9c01fff irq 60 at device 3.0 on pci7 ips0: [ITHREAD] pcib5: on acpi0 pci9: on pcib5 fdc0: port 0x3f0-0x3f5 irq 6 drq 2 on acpi0 fdc0: [FILTER] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 cpu0: on acpi0 p4tcc0: on cpu0 cpu1: on acpi0 p4tcc1: on cpu1 cpu2: on acpi0 p4tcc2: on cpu2 cpu3: on acpi0 p4tcc3: on cpu3 cpu4: on acpi0 p4tcc4: on cpu4 cpu5: on acpi0 p4tcc5: on cpu5 cpu6: on acpi0 p4tcc6: on cpu6 cpu7: on acpi0 p4tcc7: on cpu7 pmtimer0 on isa0 orm0: at iomem 0xc0000-0xcafff,0xcb000-0xce7ff pnpid ORM0000 on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 ukbd0: on uhub1 kbd2 at ukbd0 Timecounters tick every 1.000 msec ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to accept, logging limited to 2000 packets/entry by default acd0: DVDROM at ata0-master UDMA33 ips0: adapter type: ServeRAID 4Lx (neo lite) ips0: logical drives: 1 ips0: Logical Drive 0: RAID1 sectors: 286748672, state OK ipsd0: on ips0 ipsd0: Logical Drive (140014MB) SMP: AP CPU #7 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #5 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #1 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #2 Launched! Trying to mount root from ufs:/dev/ipsd0s1a bge0: link state changed to UP --- >How-To-Repeat: reboot >Fix: >Release-Note: >Audit-Trail: >Unformatted: