Date: Thu, 16 Nov 2017 06:04:25 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 223699] ZFS drive loss during write operation causes kernel panic
Message-ID: <bug-223699-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223699

            Bug ID: 223699
           Summary: ZFS drive loss during write operation causes kernel panic
           Product: Base System
           Version: 11.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: abrahamd@cat.pdx.edu

Environment:
OS: FreeBSD 11.1-RELEASE-p4
Board: Supermicro X10DRH-i
Manufacturer: Silicon Mechanics
RAID Controller: LSI 9341-8i HBA

Description:
The server has an attached storage zpool of 8 disks, separate from the root pool, which is on a different controller. The storage pool is configured with RAIDZ2 fault tolerance. zpool status reports the pool as healthy before the crash.

During routine ZFS setup testing (pulling a disk to verify pool integrity), the kernel panics if a write to the storage pool is in progress when the disk is pulled. The panic is consistently reproducible.

How-to-repeat:
1. Boot the server with the storage zpool attached.
2. Begin a write operation to the storage zpool (we use 'yes > file').
3. Pull a disk from the storage zpool to simulate drive loss.
4. A kernel panic follows.

(We reproduced this several times in succession while diagnosing the issue.)

Trace follows:

flows01# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
mfi0: I/O error, cmd=0xfffffe000148d760, status=0xc, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd0: hard error cmd=write 927680-927765


Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff809b9f74
stack pointer           = 0x28:0xfffffe0f84318930
frame pointer           = 0x28:0xfffffe0f84318970
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq264: mfi0)
trap number             = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80edf832 at trap_fatal+0x322
#4 0xffffffff80edf889 at trap_pfault+0x49
#5 0xffffffff80edf0c6 at trap+0x286
#6 0xffffffff80ec36d1 at calltrap+0x8
#7 0xffffffff80620f2c at mfi_tbolt_complete_cmd+0x13c
#8 0xffffffff80620d94 at mfi_intr_tbolt+0x54
#9 0xffffffff80a321ec at intr_event_execute_handlers+0xec
#10 0xffffffff80a324d6 at ithread_loop+0xd6
#11 0xffffffff80a2f845 at fork_exit+0x85
#12 0xffffffff80ec3c0e at fork_trampoline+0xe
Uptime: 1m1s
Dumping 2498 out of 65230 MB:mfi0: cmd_tbolt 0xfffff8000fa0f880 has invalid sync_cmd_idx=128 - skipping
..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
#0  0xffffffff80a6b98a in doadump (textdump=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:311
311		dumping--;
(kgdb) list *0xffffffff809b9f74
0xffffffff809b9f74 is in g_disk_done (/usr/src/sys/geom/geom_disk.c:252).
247		default:
248			break;
249		}
250		bp2->bio_inbed++;
251		if (bp2->bio_children == bp2->bio_inbed) {
252			mtx_unlock(&sc->done_mtx);
253			bp2->bio_resid = bp2->bio_bcount - bp2->bio_completed;
254			g_io_deliver(bp2, bp2->bio_error);
255		} else
256			mtx_unlock(&sc->done_mtx);
Current language:  auto; currently minimal

-- 
You are receiving this mail because:
You are the assignee for the bug.
