From owner-freebsd-i386@FreeBSD.ORG Wed Feb 27 16:50:01 2008
Return-Path:
Delivered-To: freebsd-i386@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 736FA1065671 for ; Wed, 27 Feb 2008 16:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6E1098FC13 for ; Wed, 27 Feb 2008 16:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m1RGo1mq033725 for ; Wed, 27 Feb 2008 16:50:01 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m1RGo1aD033724; Wed, 27 Feb 2008 16:50:01 GMT (envelope-from gnats)
Resent-Date: Wed, 27 Feb 2008 16:50:01 GMT
Resent-Message-Id: <200802271650.m1RGo1aD033724@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-i386@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Jim Pingle
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7F16C106566C for ; Wed, 27 Feb 2008 16:48:50 +0000 (UTC) (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id 8B6D48FC14 for ; Wed, 27 Feb 2008 16:48:50 +0000 (UTC) (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m1RGk2uc004380 for ; Wed, 27 Feb 2008 16:46:02 GMT (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost) by www.freebsd.org (8.14.2/8.14.1/Submit) id m1RGk2o3004379; Wed, 27 Feb 2008 16:46:02 GMT (envelope-from nobody)
Message-Id: <200802271646.m1RGk2o3004379@www.freebsd.org>
Date: Wed, 27 Feb 2008 16:46:02 GMT
From: Jim Pingle
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: i386/121148: Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
X-BeenThere: freebsd-i386@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: I386-specific issues for FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 27 Feb 2008 16:50:01 -0000

>Number: 121148
>Category: i386
>Synopsis: Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-i386
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Feb 27 16:50:00 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Jim Pingle
>Release: 7.0-PRERELEASE (RELENG_7)
>Organization: HPC Internet Services
>Environment:
FreeBSD test1.hpcisp.com 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #1: Thu Feb 14 14:08:02 EST 2008 root@test1.hpcisp.com:/usr/obj/usr/src/sys/TEST i386
>Description:
SuperMicro SuperServer 6022L-6 will not fully boot RELENG_7 unless I boot with ACPI disabled. RELENG_7_0 does not crash on the same hardware with the same config. Crash is as follows:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x2043455c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0742c86
stack pointer = 0x28:0xe8cada0c
frame pointer = 0x28:0xe8cada38
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68 (sysctl)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

The crash happens just after the "Entropy harvesting..." line, before swap is started. As you can see in the crash output, the offending process is sysctl.
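
For reference, the fault virtual address in the trap output can be cross-checked against the decimal `eva` argument that kgdb prints for trap_fatal() and trap_pfault() in the backtrace further down. The following is a minimal Python sketch that uses only those two values from this report. The helper name `describe` is made up for illustration, and the byte view it prints is an observation aid only, not a diagnosis of the crash.

```python
import struct

# Values copied from this report.
FAULT_VA = 0x2043455C        # "fault virtual address" in the trap output
EVA_DECIMAL = 541279580      # "eva=" argument shown by kgdb in the backtrace


def describe(addr: int) -> dict:
    """Return a few views of a 32-bit address (hypothetical helper)."""
    raw = struct.pack("<I", addr)  # little-endian byte order, as on i386
    return {
        "hex": f"0x{addr:08x}",
        "bytes_le": " ".join(f"{b:02x}" for b in raw),
        # Printable ASCII is shown as-is, anything else as '.'
        "ascii": "".join(chr(b) if 32 <= b < 127 else "." for b in raw),
    }


if __name__ == "__main__":
    print("eva matches fault address:", EVA_DECIMAL == FAULT_VA)
    for key, value in describe(FAULT_VA).items():
        print(f"{key:9s} {value!r}")
```

Both numbers refer to the same address, so the trap message and the kgdb frames describe the same fault. Whether the byte pattern of that address means anything is left open here.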
I can boot to single user mode, but if I issue sysctl -a while there, it also crashes. When sysctl -a is run in single user mode, the last three lines before the crash are (transcribed by hand, no serial console available):

dev.pcib.3.%location: handle=\_SB_.PCI3
dev.pcib.3.%pnpinfo: _HID=PNP0A03 UID=3
dev.pcib.3.%parent: acpi0

With a working RELENG_7_0 the lines immediately following this are:

dev.pcib.4.%desc: ACPI Host-PCI bridge
dev.pcib.4.%driver: pcib
dev.pcib.4.%location: handle=\_SB_.PCI4
dev.pcib.4.%pnpinfo: _HID=PNP0A03 _UID=4
dev.pcib.4.%parent: acpi0

I tried a binary search of the source tree to narrow down the crash. I found that one possible vector for the crash was introduced between 2007/12/19 20:00:00 (booted OK) and 2007/12/19 23:59:00 (crashed), which left me with only a handful of files to test. By process of elimination, I found that if I backed some changes out in src/sys/i386/i386/machdep.c, the crash stopped.

src/sys/i386/i386/machdep.c v1.658 2007/08/09 njl - Boots OK
src/sys/i386/i386/machdep.c v1.658.2.1 2007/12/19 rpaulo - Crashes

The confusing part (to me) is that my next step was to update all the way to RELENG_7 as of yesterday, then back out those same changes, but the crash still happened. So either I misidentified the cause of the crash -- which is quite possible -- or it was reintroduced in some other change (or both!).

kgdb output from vmcore.0:

Unread portion of the kernel message buffer:
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-PRERELEASE #0: Mon Feb 25 15:22:54 EST 2008
root@test1.hpcisp.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.00GHz (1999.94-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
Features=0x3febfbff
Logical CPUs per core: 2
real memory = 2147418112 (2047 MB)
avail memory = 2091872256 (1994 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length: 0 0/8 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-15 on motherboard
ioapic1 irqs 16-31 on motherboard
ioapic2 irqs 32-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 25 2008 15:20:56)
acpi0: on motherboard
ACPI Warning (dswload-0794): Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IO__] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ICNT] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ACPI] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IORG] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [BIOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [CMOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [KBC_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [OEM_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 7ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
cpu0: on acpi0
p4tcc0: on cpu0
cpu1: on acpi0
p4tcc1: on cpu1
cpu2: on acpi0
p4tcc2: on cpu2
cpu3: on acpi0
p4tcc3: on cpu3
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
vgapci0: port 0xa800-0xa8ff mem 0xfd000000-0xfdffffff,0xfe5ff000-0xfe5fffff irq 18 at device 2.0 on pci0
fxp0: port 0xae80-0xaebf mem 0xfe5fc000-0xfe5fcfff,0xfe580000-0xfe59ffff irq 17 at device 4.0 on pci0
miibus0: on fxp0
inphy0: PHY 1 on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:30:48:20:a3:9e
fxp0: [ITHREAD]
fxp1: port 0xaf00-0xaf3f mem 0xfe5fd000-0xfe5fdfff,0xfe5a0000-0xfe5bffff irq 19 at device 5.0 on pci0
miibus1: on fxp1
inphy1: PHY 1 on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:30:48:20:a3:9f
fxp1: [ITHREAD]
isab0: at device 15.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
ohci0: mem 0xfe5fe000-0xfe5fefff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: <(0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
pcib1: on acpi0
pci1: on pcib1
pcib2: on acpi0
pci2: on pcib2
pcib3: on acpi0
pci3: on pcib3
pcib4: on acpi0
pci4: on pcib4
asr0: mem 0xfeb00000-0xfebfffff,0xfb000000-0xfbffffff,0xf8000000-0xf9ffffff irq 29 at device 3.0 on pci4
asr0: [GIANT-LOCKED]
asr0: [ITHREAD]
asr0: ADAPTEC 2005S FW Rev. 380E, 2 channel, 2000 CCBs, Protocol I2O
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model NetMouse/NetScroll Optical, device ID 0
fdc0: port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcdfff,0xce000-0xcefff,0xcf000-0xcffff pnpid ORM0000 on isa0
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: on ppc0
ppbus0: [ITHREAD]
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: CDROM at ata1-master UDMA33
da0 at asr0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-2 device
ses0 at asr0 bus 0 target 6 lun 0
ses0: Fixed Processor SCSI-2 device
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/da0s1a
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting: <118> interrupts <118> ethernet <118> point_to_point

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x2043455c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0742c86
stack pointer = 0x28:0xe8cada0c
frame pointer = 0x28:0xe8cada38
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68 (sysctl)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc073a688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc073a941 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc0a19dc0 in trap_fatal (frame=0xe8cad9cc, eva=541279580) at /usr/src/sys/i386/i386/trap.c:899
#4 0xc0a1a030 in trap_pfault (frame=0xe8cad9cc, usermode=0, eva=541279580) at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0a1a9ad in trap (frame=0xe8cad9cc) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc0a01cab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc0742c86 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#8 0xc0742d46 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:618
#9 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#10 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#11 0xc0742de6 in sysctl_sysctl_next (oidp=0xc0b4c940, arg1=0xe8cadc1c, arg2=4, req=0xe8cadba4) at /usr/src/sys/kern/kern_sysctl.c:651
#12 0xc07436f2 in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1306
#13 0xc074382e in userland_sysctl (td=0xc5574210, name=0xe8cadc14, namelen=6, old=0xbfbfe4e8, oldlenp=0xbfbfe598, inkernel=0, new=0x0, newlen=0, retval=0xe8cadc10, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1401
#14 0xc0744462 in __sysctl (td=0xc5574210, uap=0xe8cadcfc) at /usr/src/sys/kern/kern_sysctl.c:1336
#15 0xc0a1a378 in syscall (frame=0xe8cadd38) at /usr/src/sys/i386/i386/trap.c:1035
#16 0xc0a01d10 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
#17 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is a testing machine that is only being used to evaluate 7.0 for use on similar hardware. I can take whatever debugging steps are needed; just let me know what information is necessary to help resolve the issue. I tried posting this information to the -STABLE list, but received no replies.

System is running with the most current BIOS available from the OEM. RAM tested OK with memtest86+ left running for a day or so.
>How-To-Repeat:
Attempt to boot with a RELENG_7 world/kernel on a SuperMicro SuperServer 6022L-6 with ACPI enabled. Alternatively, boot to single user mode and issue "sysctl -a". Crashes every time in the exact same place.
>Fix:
Workaround is to run with ACPI disabled, but that is not desired. One part of the crash was possibly introduced with rev v1.658.2.1 of src/sys/i386/i386/machdep.c, but I am unable to repeat that fix on recent RELENG_7 sources.
>Release-Note:
>Audit-Trail:
>Unformatted:
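
The date-based binary search mentioned in the Description can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used for this report: `bisect_by_date` and the `boots_ok` callback are hypothetical names, and in practice `boots_ok` would stand for checking out the sources as of a given timestamp, building a kernel, and seeing whether the machine boots. The stand-in predicate and its cut-over time below are invented purely so the example runs.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple


def bisect_by_date(
    good: datetime,
    bad: datetime,
    boots_ok: Callable[[datetime], bool],
    resolution: timedelta = timedelta(minutes=1),
) -> Tuple[datetime, datetime]:
    """Shrink the (good, bad) window until it is no wider than `resolution`.

    `good` is a timestamp known to boot, `bad` one known to crash.
    """
    if good >= bad:
        raise ValueError("good must be earlier than bad")
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if boots_ok(mid):
            good = mid   # the breaking change landed after `mid`
        else:
            bad = mid    # the breaking change landed at or before `mid`
    return good, bad


if __name__ == "__main__":
    # Invented cut-over time, only to make the sketch runnable.
    pretend_breakage = datetime(2007, 12, 19, 21, 30)

    def fake_boots_ok(ts: datetime) -> bool:
        return ts < pretend_breakage

    last_good, first_bad = bisect_by_date(
        datetime(2007, 12, 19, 20, 0),    # booted OK
        datetime(2007, 12, 19, 23, 59),   # crashed
        fake_boots_ok,
    )
    print("last good:", last_good)
    print("first bad:", first_bad)
```

A search like this only isolates a single change if the crash has a single cause. If a later commit reintroduces the problem, as suspected above, the window it finds can be misleading.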