From owner-freebsd-stable@freebsd.org Sat Sep 30 17:25:17 2017
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
        by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1DD93E2B60F
        for ; Sat, 30 Sep 2017 17:25:17 +0000 (UTC)
        (envelope-from freebsd@omnilan.de)
Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130])
        (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
        (Client did not present a certificate)
        by mx1.freebsd.org (Postfix) with ESMTPS id A76FF758D6
        for ; Sat, 30 Sep 2017 17:25:16 +0000 (UTC)
        (envelope-from freebsd@omnilan.de)
Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [IPv6:2a00:e10:2800::a135])
        by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v8UHPEhR053652
        for ; Sat, 30 Sep 2017 19:25:14 +0200 (CEST)
        (envelope-from freebsd@omnilan.de)
Received: from titan.inop.mo1.omnilan.net (s1.omnilan.de [217.91.127.234])
        (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
        (No client certificate requested)
        by mh0.gentlemail.de (Postfix) with ESMTPSA id 74A7FC0B;
        Sat, 30 Sep 2017 19:25:14 +0200 (CEST)
Message-ID: <59CFD37A.8080009@omnilan.de>
Date: Sat, 30 Sep 2017 19:25:14 +0200
From: Harry Schmalzbauer
Organization: OmniLAN
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8)
        Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Subject: Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
References: <59CFC6A6.6030600@omnilan.de>
In-Reply-To: <59CFC6A6.6030600@omnilan.de>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
        (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]);
        Sat, 30 Sep 2017 19:25:14 +0200 (CEST)
X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: ;
        Sender-helo: mh0.gentlemail.de; )
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Sat, 30 Sep 2017 17:25:17 -0000

Regarding Harry Schmalzbauer's message of 30.09.2017 18:30 (localtime):
> Bad surprise.
> Most likely I forgot to stop a PCIe-Passthrough NIC before shutting down
> that (bhyve(8)) guest - jhb@ helped me identify this as the root
> cause of the severe memory corruption I regularly had (on stable-11).
>
> Now this time, corruption affected ZFS's RAM area, obviously.
>
> What I hadn't expected is the panic.
> The machine has a memory disk as root, so luckily I can still boot (from
> ZFS, -> mdpreload rootfs) into single user mode, but the early rc stage
> (most likely mounting ZFS datasets) leads to the following panic:
>
> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
> panic: Solaris(panic): blkptr at 0xfffffe0005b6b000 has invalid CHECKSUM 1
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff805e3837 at kdb_backtrace+0x67
> #1 0xffffffff805a2286 at vpanic+0x186
> #2 0xffffffff805a20f3 at panic+0x43
> #3 0xffffffff81570192 at vcmn_err+0xc2
> #4 0xffffffff812d7dda at zfs_panic_recover+0x5a
> #5 0xffffffff812ff49b at zfs_blkptr_verify+0x8b
> #6 0xffffffff812ff72c at zio_read+0x2c
> #7 0xffffffff812761de at arc_read+0x6de
> #8 0xffffffff81298b4d at traverse_prefetch_metadata+0xbd
> #9 0xffffffff812980ed at traverse_visitbp+0x39d
> #10 0xffffffff81298c27 at traverse_dnode+0xc7
> #11 0xffffffff812984a3 at traverse_visitbp+0x753
> #12 0xffffffff8129788b at traverse_impl+0x22b
> #13 0xffffffff81297afc at traverse_pool+0x5c
> #14 0xffffffff812cce06 at spa_load+0x1c06
> #15 0xffffffff812cc302 at spa_load+0x1102
> #16 0xffffffff812cac6e at spa_load_best+0x6e
> #17 0xffffffff812c73a1 at spa_open_common+0x101
> Uptime: 37s
> Dumping 1082 out of 15733 MB:..2%..…
> Dump complete
> mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
> mps0: Incrementing SSU count
> …
>
> Haven't done any scrub attempts yet - expectation is to get all datasets
> of the striped mirror pool back...
>
> Any hints highly appreciated.

Now it seems I'm in really big trouble.
Regular import doesn't work (not even when booted from cd9660).
I get all pools listed, but trying to import (unmounted) leads to the
same panic as initially reported, because rc is just doing the same.

I booted into single user mode (which works since the bootpool isn't
affected and root is a memory disk from the bootpool) and set
vfs.zfs.recover=1.
But this time I don't even get the list of pools to import;
'zpool import' instantaneously leads to this panic:

Solaris: WARNING: blkptr at 0xfffffe0005a8e000 has invalid CHECKSUM 1
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 has invalid COMPRESS 0
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 0 has invalid VDEV 2337865727
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 1 has invalid VDEV 289407040
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 2 has invalid VDEV 3959586324

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x50
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff812de904
stack pointer = 0x28:0xfffffe043f6bcbc0
frame pointer = 0x28:0xfffffe043f6bcbc0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 44 (zpool)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805e3837 at kdb_backtrace+0x67
#1 0xffffffff805a2286 at vpanic+0x186
#2 0xffffffff805a20f3 at panic+0x43
#3 0xffffffff808a4922 at trap_fatal+0x322
#4 0xffffffff808a4979 at trap_pfault+0x49
#5 0xffffffff808a41f8 at trap+0x298
#6 0xffffffff80889fb1 at calltrap+0x8
#7 0xffffffff812e58a3 at vdev_mirror_child_select+0x53
#8 0xffffffff812e535e at vdev_mirror_io_start+0x2ee
#9 0xffffffff81303aa1 at zio_vdev_io_start+0x161
#10 0xffffffff8130054c at zio_execute+0xac
#11 0xffffffff812ffe7b at zio_nowait+0xcb
#12 0xffffffff812761f3 at arc_read+0x6f3
#13 0xffffffff81298b4d at traverse_prefetch_metadata+0xbd
#14 0xffffffff812980ed at traverse_visitbp+0x39d
#15 0xffffffff81298c27 at traverse_dnode+0xc7
#16 0xffffffff812984a3 at traverse_visitbp+0x753
#17 0xffffffff8129788b at traverse_impl+0x22b

Now I hope some ZFS guru can help me out.
Needless to mention that the bits on this mirrored pool are important to
me - no production data, but lots of intermediate...

Thanks,

-harry