Date: Sun, 14 Nov 2010 20:11:18 GMT
From: Loic Pefferkorn <loic-freebsd@loicp.eu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/152250: [patch] Kernel panic when hw.ciss.expose_hidden_physical is set
Message-ID: <201011142011.oAEKBIAH018826@www.freebsd.org>
Resent-Message-ID: <201011142020.oAEKK83C027504@freefall.freebsd.org>

>Number:         152250
>Category:       kern
>Synopsis:       [patch] Kernel panic when hw.ciss.expose_hidden_physical is set
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Nov 14 20:20:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Loic Pefferkorn
>Release:        7.2-RELEASE
>Organization:
>Environment:
FreeBSD squeak.estat 7.2-STABLE FreeBSD 7.2-STABLE #5: Sun Nov 14 20:35:21 CET 2010 root@squeak.estat:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
HP ProLiant DL360 G6 server with an HP StorageWorks MSL4048 Tape Library

# grep ciss /boot/loader.conf
hw.ciss.expose_hidden_physical=1

When the tunable hw.ciss.expose_hidden_physical is set at boot time, I have a kernel panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff80201686
stack pointer           = 0x10:0xffffff807c6ab930
frame pointer           = 0x10:0x400
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 77 (sysctl)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 6s
Physical memory: 4073 MB
Dumping 1230 MB:

Backtrace from the core dump:

(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0x0000000000000004 in ?? ()
#2  0xffffffff8054cff9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff8054d402 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
#4  0xffffffff80812563 in trap_fatal (frame=0xffffff0003eb4390, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:756
#5  0xffffffff80812935 in trap_pfault (frame=0xffffff807c6ab880, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:672
#6  0xffffffff80813274 in trap (frame=0xffffff807c6ab880) at /usr/src/sys/amd64/amd64/trap.c:443
#7  0xffffffff807fd2ce in calltrap () at /usr/src/sys/amd64/amd64/exception.S:218
#8  0xffffffff80201686 in acpi_child_pnpinfo_str_method (cbdev=Variable "cbdev" is not available.
) at /usr/src/sys/dev/acpica/acpi.c:850
#9  0xffffffff805753c9 in device_sysctl_handler (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/subr_bus.c:260
#10 0xffffffff8055654f in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1419
#11 0xffffffff805578c5 in userland_sysctl (td=0x0, name=0xffffff807c6abac0, namelen=4, old=0x0, oldlenp=Variable "oldlenp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1522
#12 0xffffffff80557ad2 in __sysctl (td=0xffffff0003eb4390, uap=0xffffff807c6abbf0) at /usr/src/sys/kern/kern_sysctl.c:1449
#13 0xffffffff80812bb7 in syscall (frame=0xffffff807c6abc80) at /usr/src/sys/amd64/amd64/trap.c:899
#14 0xffffffff807fd4db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:339
#15 0x0000000800719cac in ?? ()
Previous frame inner to this frame (corrupt stack?)

Faulty instruction:

(kgdb) x/i 0xffffffff80201686
0xffffffff80201686 <acpi_child_pnpinfo_str_method+70>:  mov    0x8(%rbx),%edx
>How-To-Repeat:
With the same hardware, put hw.ciss.expose_hidden_physical=1 in loader.conf and reboot.
>Fix:
The last function called is acpi_child_pnpinfo_str_method in sys/dev/acpica/acpi.c:

    static int
    acpi_child_pnpinfo_str_method(device_t cbdev, device_t child, char *buf,
        size_t buflen)
    {
        ACPI_BUFFER adbuf = {ACPI_ALLOCATE_BUFFER, NULL};
        ACPI_DEVICE_INFO *adinfo;
        struct acpi_device *dinfo = device_get_ivars(child);
        char *end;
        int error;

        error = AcpiGetObjectInfo(dinfo->ad_handle, &adbuf);
        adinfo = (ACPI_DEVICE_INFO *) adbuf.Pointer;
        if (error)
            snprintf(buf, buflen, "unknown");
        else
            snprintf(buf, buflen, "_HID=%s _UID=%lu",
                (adinfo->Valid & ACPI_VALID_HID) ?
                adinfo->HardwareId.Value : "none",
                (adinfo->Valid & ACPI_VALID_UID) ?
                strtoul(adinfo->UniqueId.Value, &end, 10) : 0);
        if (adinfo)
            AcpiOsFree(adinfo);

        return (0);
    }

buf is filled in according to the value of "error". I found adbuf.Pointer set to 0x0 while "error" was zero, so the references to the adinfo struct in snprintf use 0x0 as their base address.

The "error" value is therefore not set correctly. The reason is in AcpiGetObjectInfo, in sys/contrib/dev/acpica/nsxfname.c:

    Node = AcpiNsMapHandleToNode (Handle);
    if (!Node)
    {
        (void) AcpiUtReleaseMutex (ACPI_MTX_NAMESPACE);
        goto Cleanup;
    }

    (...)

    Cleanup:
        ACPI_FREE (Info);
        if (CidList)
        {
            ACPI_FREE (CidList);
        }
        return (Status);

If AcpiNsMapHandleToNode fails, we release the mutex and go to Cleanup:, which does not update Status before returning. Status therefore still holds the value returned by the earlier AcpiUtAcquireMutex call, which is wrong: the caller sees that value instead of an error for the failed lookup.

Setting Status to AE_BAD_PARAMETER before going to Cleanup fixes the issue (I found that AE_BAD_PARAMETER is used elsewhere in the kernel in similar flows when AcpiNsMapHandleToNode is called).

7.0 to 7.3 are affected; a patch is attached.
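
To make the failure path described above easier to follow outside the kernel, here is a minimal userland sketch of the same pattern. Everything in it is hypothetical and only stands in for the real code: enum status stands in for ACPI_STATUS, struct info for ACPI_DEVICE_INFO, map_handle for AcpiNsMapHandleToNode, and format_info for the pnpinfo method. It is not the ACPICA implementation, just an illustration of how a stale status reaches a caller that trusts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes; stand-ins for ACPI_STATUS values. */
enum status { ST_OK = 0, ST_BAD_PARAMETER = 1 };

/* Hypothetical stand-in for the info record handed back to the caller. */
struct info {
    unsigned valid;
    const char *hid;
};

static struct info known = { 1u, "HYP0001" };   /* placeholder ID */

/* Hypothetical handle lookup: only handle 1 maps to a node. */
static struct info *
map_handle(int handle)
{
    return (handle == 1 ? &known : NULL);
}

/* Buggy shape: a failed lookup jumps to cleanup with a stale status. */
static enum status
get_info_buggy(int handle, struct info **out)
{
    enum status st = ST_OK;     /* result of an earlier, successful step */
    struct info *node;

    *out = NULL;
    node = map_handle(handle);
    if (node == NULL)
        goto cleanup;           /* st still says ST_OK */
    *out = node;
cleanup:
    return (st);
}

/* Fixed shape: the failure path sets the status before leaving. */
static enum status
get_info_fixed(int handle, struct info **out)
{
    enum status st = ST_OK;
    struct info *node;

    *out = NULL;
    node = map_handle(handle);
    if (node == NULL) {
        st = ST_BAD_PARAMETER;
        goto cleanup;
    }
    *out = node;
cleanup:
    return (st);
}

/*
 * Caller that trusts the status alone. Returns -1 at the point where
 * code without a NULL check would dereference a NULL pointer.
 */
static int
format_info(enum status st, const struct info *in, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    if (st != ST_OK) {
        snprintf(buf, buflen, "unknown");
        return (0);
    }
    if (in == NULL)
        return (-1);
    snprintf(buf, buflen, "_HID=%s", (in->valid & 1u) ? in->hid : "none");
    return (0);
}
```

With an unknown handle, get_info_buggy reports ST_OK together with a NULL result, which is the combination that leads to the page fault; get_info_fixed reports ST_BAD_PARAMETER and the caller takes the "unknown" branch instead.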

Hope I'm right :)

Patch attached with submission follows:

--- src/sys/contrib/dev/acpica/nsxfname.c.orig	2010-11-14 20:51:57.000000000 +0100
+++ src/sys/contrib/dev/acpica/nsxfname.c	2010-11-14 20:50:46.000000000 +0100
@@ -361,6 +361,7 @@
     if (!Node)
     {
         (void) AcpiUtReleaseMutex (ACPI_MTX_NAMESPACE);
+        Status = AE_BAD_PARAMETER;
         goto Cleanup;
     }

>Release-Note:
>Audit-Trail:
>Unformatted: