Date: Sat, 9 Jan 2010 02:43:45 GMT
From: Joshua Wise <jwise@andrew.cmu.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/142510: FreeBSD 8.0-RELEASE panic'ed after removing SATA (ahci) drive
Message-ID: <201001090243.o092hjlZ000410@www.freebsd.org>
Resent-Message-ID: <201001090250.o092o1Fa045279@freefall.freebsd.org>

>Number:         142510
>Category:       kern
>Synopsis:       FreeBSD 8.0-RELEASE panic'ed after removing SATA (ahci) drive
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 09 02:50:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Joshua Wise
>Release:        8.0-RELEASE
>Organization:
self
>Environment:
[root@moroso ~]# uname -a
FreeBSD moroso.emarhavil.com 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
After I pulled a drive, the system subsequently panic()'ed.  It was in use by
ZFS at the time, but not all cases of pulling a drive while it's in use by ZFS
make the machine panic.  Unfortunately, I don't have a dumpfile, but I do have
a line number from at least one of the two traps that showed up that may
indicate how the kernel came to the inconsistent state.

Here's a dump from the console (typed by hand):

[root@moroso /dev]# camcontrol eject ada2
Error received from stop unit command
[root@moroso /dev]# camcontrol rescan a(ada2:ahcich2:0:0:0) lost device
(ada2:ahcich2:0:0:0): Invalidating pack
(ada2:ahchch2:0:0:0): Synchronize cache failed
(ada2:ahcich2:0:0:0): removing device entry


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80571105
stack pointer           = 0x28:0xffffff8000077b60
stack pointer           = 0x28:0xffffff8000077b70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 34m43s
(ada0:ahcich0:0:0:0): Synchronize cache failed
Dump failed. Partition too small.
Automatic reboot in 15 seconds - press a key on the console to abort


Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer     = 0x20:0xffffffff80194cdc
stack pointer           = 0x28:0xffffff80e7d6aa60
frame pointer           = 0x28:0xffffff80e7d6aa90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = dpl 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq258: ahci0)
trap number             = 9
--> Press a key on the console to reboot,
--> or switch off the system now.

And, here's a kgdb session looking up the appropriate symbols:

[joshua@moroso /boot/kernel]$ kgdb kernel.symbols
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(kgdb) list *0x20:0xffffffff80571105
A syntax error in expression, near `:0xffffffff80571105'.
(kgdb) list *0xffffffff80571105
0xffffffff80571105 is in _mtx_lock_flags (atomic.h:158).
153     atomic.h: No such file or directory.
        in atomic.h
(kgdb) list *0xffffffff80194cdc
0xffffffff80194cdc is in xpt_done (/usr/src/sys/cam/cam_xpt.c:4197).
4192                    /*
4193                     * Queue up the request for handling by our SWI handler
4194                     * any of the "non-immediate" type of ccbs.
4195                     */
4196                    sim = done_ccb->ccb_h.path->bus->sim;
4197                    switch (done_ccb->ccb_h.path->periph->type) {
4198                    case CAM_PERIPH_BIO:
4199                            TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h,
4200                                              sim_links.tqe);
4201                            done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
(kgdb)

Sorry for the lack of detail; hopefully that line number should be of some use,
though.
>How-To-Repeat:
Not entirely known.  See description.
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
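
A note on the kgdb lookup above: the trap messages print the instruction pointer as selector:offset (0x20:0xffffffff80571105), and kgdb rejected that form with a syntax error; only the offset after the colon was accepted by "list *". The following is a minimal sketch of that conversion, not part of the original report. The helper name kgdb_address is hypothetical, and it assumes the value is either a bare hex address or a selector:offset pair as printed in the trap output.

```python
def kgdb_address(trap_pointer: str) -> str:
    """Return the offset part of a 'selector:offset' value from a trap message.

    Hypothetical helper for illustration; a bare address is returned unchanged.
    """
    # Keep only the text after the last colon (the whole string if there is none).
    offset = trap_pointer.strip().rpartition(":")[2]
    # Raise ValueError early if what remains is not a hexadecimal number.
    int(offset, 16)
    return offset


if __name__ == "__main__":
    # Build the kgdb command for the first trap's instruction pointer.
    print(f"list *{kgdb_address('0x20:0xffffffff80571105')}")
```

The printed command can then be pasted at the (kgdb) prompt, as was done by hand in the session above.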