From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 223699] ZFS drive loss during write operation causes kernel panic
Date: Thu, 16 Nov 2017 06:04:25 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223699

            Bug ID: 223699
           Summary: ZFS drive loss during write operation causes kernel panic
           Product: Base System
           Version: 11.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: abrahamd@cat.pdx.edu

Environment:
OS: FreeBSD 11.1-RELEASE-p4
Board: Supermicro X10DRH-i
Manufacturer: Silicon Mechanics
RAID Controller: LSI 9341-8i HBA

Description:
The server has an attached storage zpool consisting of 8 disks. (It is
separate from the root pool, which is on a different controller.) The storage
pool is configured with RAIDZ2 fault tolerance, and zpool status reports it as
healthy before the crash. During routine ZFS setup testing (pulling a disk to
verify pool integrity) while a write operation to the storage pool is in
progress, the kernel panics. This behavior is consistently repeatable.

How-to-repeat:
Boot the server with the storage zpool attached.
Begin a write operation to the storage zpool (we use 'yes > file').
Pull a disk from the storage zpool to simulate drive loss.
A kernel panic follows. (We reproduced this multiple times in succession while
diagnosing the issue.)

Trace follows:

flows01# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
mfi0: I/O error, cmd=0xfffffe000148d760, status=0xc, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd0: hard error cmd=write 927680-927765

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff809b9f74
stack pointer           = 0x28:0xfffffe0f84318930
frame pointer           = 0x28:0xfffffe0f84318970
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq264: mfi0)
trap number             = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80edf832 at trap_fatal+0x322
#4 0xffffffff80edf889 at trap_pfault+0x49
#5 0xffffffff80edf0c6 at trap+0x286
#6 0xffffffff80ec36d1 at calltrap+0x8
#7 0xffffffff80620f2c at mfi_tbolt_complete_cmd+0x13c
#8 0xffffffff80620d94 at mfi_intr_tbolt+0x54
#9 0xffffffff80a321ec at intr_event_execute_handlers+0xec
#10 0xffffffff80a324d6 at ithread_loop+0xd6
#11 0xffffffff80a2f845 at fork_exit+0x85
#12 0xffffffff80ec3c0e at fork_trampoline+0xe
Uptime: 1m1s
Dumping 2498 out of 65230 MB:mfi0: cmd_tbolt 0xfffff8000fa0f880 has invalid sync_cmd_idx=128 - skipping
..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
#0  0xffffffff80a6b98a in doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:311
311             dumping--;
(kgdb) list *0xffffffff809b9f74
0xffffffff809b9f74 is in g_disk_done (/usr/src/sys/geom/geom_disk.c:252).
247             default:
248                     break;
249             }
250             bp2->bio_inbed++;
251             if (bp2->bio_children == bp2->bio_inbed) {
252                     mtx_unlock(&sc->done_mtx);
253                     bp2->bio_resid = bp2->bio_bcount - bp2->bio_completed;
254                     g_io_deliver(bp2, bp2->bio_error);
255             } else
256                     mtx_unlock(&sc->done_mtx);
Current language:  auto; currently minimal

-- 
You are receiving this mail because:
You are the assignee for the bug.