Date:      Sat, 9 Jan 2010 02:43:45 GMT
From:      Joshua Wise <jwise@andrew.cmu.edu>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/142510: FreeBSD 8.0-RELEASE panic'ed after removing SATA (ahci) drive
Message-ID:  <201001090243.o092hjlZ000410@www.freebsd.org>
Resent-Message-ID: <201001090250.o092o1Fa045279@freefall.freebsd.org>


>Number:         142510
>Category:       kern
>Synopsis:       FreeBSD 8.0-RELEASE panic'ed after removing SATA (ahci) drive
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 09 02:50:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Joshua Wise
>Release:        8.0-RELEASE
>Organization:
self
>Environment:
[root@moroso ~]# uname -a
FreeBSD moroso.emarhavil.com 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

>Description:
After I pulled a drive, the system panic()'ed.  The drive was in use by ZFS at the time, but pulling a drive that ZFS is using does not always make the machine panic.

Unfortunately, I don't have a dumpfile, but I do have a line number for at least one of the two traps that showed up, which may indicate how the kernel reached an inconsistent state.

Here's a dump from the console (typed by hand):

[root@moroso /dev]# camcontrol eject ada2
Error received from stop unit command
[root@moroso /dev]# camcontrol rescan a(ada2:ahcich2:0:0:0) lost device
(ada2:ahcich2:0:0:0): Invalidating pack
(ada2:ahcich2:0:0:0): Synchronize cache failed
(ada2:ahcich2:0:0:0): removing device entry


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x48
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80571105
stack pointer = 0x28:0xffffff8000077b60
frame pointer = 0x28:0xffffff8000077b70
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2 (g_event)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 34m43s
(ada0:ahcich0:0:0:0): Synchronize cache failed

Dump failed. Partition too small.
Automatic reboot in 15 seconds - press a key on the console to abort


Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer = 0x20:0xffffffff80194cdc
stack pointer = 0x28:0xffffff80e7d6aa60
frame pointer = 0x28:0xffffff80e7d6aa90
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq258: ahci0)
trap number = 9
--> Press a key on the console to reboot,
--> or switch off the system now.

And here's a kgdb session looking up the symbols for the two instruction pointers:
[joshua@moroso /boot/kernel]$ kgdb kernel.symbols
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(kgdb) list *0x20:0xffffffff80571105
A syntax error in expression, near `:0xffffffff80571105'.
(kgdb) list *0xffffffff80571105
0xffffffff80571105 is in _mtx_lock_flags (atomic.h:158).
153     atomic.h: No such file or directory.
        in atomic.h
(kgdb) list *0xffffffff80194cdc
0xffffffff80194cdc is in xpt_done (/usr/src/sys/cam/cam_xpt.c:4197).
4192                    /*
4193                     * Queue up the request for handling by our SWI handler
4194                     * any of the "non-immediate" type of ccbs.
4195                     */
4196                    sim = done_ccb->ccb_h.path->bus->sim;
4197                    switch (done_ccb->ccb_h.path->periph->type) {
4198                    case CAM_PERIPH_BIO:
4199                            TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h,
4200                                              sim_links.tqe);
4201                            done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
(kgdb)
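
As the session above shows, kgdb rejected the selector:offset form printed by the trap handler (0x20:0xffffffff80571105) and accepted only the bare offset. A minimal C sketch of pulling the offset out of such a string follows; the function name trap_addr_offset is made up for illustration and is not part of kgdb or the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given a trap-message address such as
 * "0x20:0xffffffff80571105", store the part after the colon in *offset.
 * A string without a colon is parsed as a bare offset.
 * Returns 0 on success, -1 if the text is missing or malformed.
 */
int
trap_addr_offset(const char *text, uint64_t *offset)
{
	const char *colon, *start;
	char *end;
	unsigned long long value;

	assert(offset != NULL);		/* caller error, not bad input */
	if (text == NULL)
		return (-1);

	/* Skip the segment selector ("0x20:") when one is present. */
	colon = strchr(text, ':');
	start = (colon != NULL) ? colon + 1 : text;
	if (*start == '\0')
		return (-1);

	/* Base 16 accepts an optional leading "0x". */
	errno = 0;
	value = strtoull(start, &end, 16);
	if (errno != 0 || end == start || *end != '\0')
		return (-1);

	*offset = (uint64_t)value;
	return (0);
}
```

Called with "0x20:0xffffffff80571105", this stores 0xffffffff80571105, which is the form that `list *ADDRESS` accepted above.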

Sorry for the lack of detail; hopefully that line number is of some use, though.
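
One possible reading of the first trap, not confirmed by anything above: a fault virtual address as small as 0x48 on a write is what you would see if a structure member were reached through a NULL pointer. The C sketch below only illustrates that arithmetic. struct example_obj and its layout are invented for the example and are not a real FreeBSD structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT a real kernel structure: 0x48 bytes of
 * unrelated members followed by a lock word.
 */
struct example_obj {
	unsigned char		other_members[0x48];
	volatile uintptr_t	lock_word;
};

/*
 * Address that a write to obj->lock_word would touch if the object
 * pointer had the value 'base'.  Pure arithmetic; nothing is
 * dereferenced, so passing 0 is safe here.
 */
uintptr_t
lock_word_address(uintptr_t base)
{
	return (base + offsetof(struct example_obj, lock_word));
}
```

With base 0 (a NULL object pointer), the function returns 0x48 for this invented layout. Which kernel structure, if any, was actually involved cannot be determined from the trap message alone.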
>How-To-Repeat:
Not entirely known.  See description.
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


