Date: Sat, 30 Sep 2017 19:25:14 +0200
From: Harry Schmalzbauer <freebsd@omnilan.de>
To: freebsd-stable@freebsd.org
Subject: Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
Message-ID: <59CFD37A.8080009@omnilan.de>
In-Reply-To: <59CFC6A6.6030600@omnilan.de>
References: <59CFC6A6.6030600@omnilan.de>
Regarding Harry Schmalzbauer's message from 30.09.2017 18:30 (localtime):
> Bad surprise.
> Most likely I forgot to stop a PCIe passthrough NIC before shutting down
> that (bhyve(8)) guest – jhb@ helped me identify this as the root
> cause for severe memory corruptions I regularly had (on stable-11).
>
> Now this time, the corruption obviously affected ZFS's RAM area.
>
> What I hadn't expected was the panic.
> The machine has a memory disk as root, so luckily I can still boot (from
> ZFS –> mdpreload rootfs) into single user mode, but the early rc stage
> (most likely mounting ZFS datasets) leads to the following panic:
>
> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
> panic: Solaris(panic): blkptr at 0xfffffe0005b6b000 has invalid CHECKSUM 1
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff805e3837 at kdb_backtrace+0x67
> #1 0xffffffff805a2286 at vpanic+0x186
> #2 0xffffffff805a20f3 at panic+0x43
> #3 0xffffffff81570192 at vcmn_err+0xc2
> #4 0xffffffff812d7dda at zfs_panic_recover+0x5a
> #5 0xffffffff812ff49b at zfs_blkptr_verify+0x8b
> #6 0xffffffff812ff72c at zio_read+0x2c
> #7 0xffffffff812761de at arc_read+0x6de
> #8 0xffffffff81298b4d at traverse_prefetch_metadata+0xbd
> #9 0xffffffff812980ed at traverse_visitbp+0x39d
> #10 0xffffffff81298c27 at traverse_dnode+0xc7
> #11 0xffffffff812984a3 at traverse_visitbp+0x753
> #12 0xffffffff8129788b at traverse_impl+0x22b
> #13 0xffffffff81297afc at traverse_pool+0x5c
> #14 0xffffffff812cce06 at spa_load+0x1c06
> #15 0xffffffff812cc302 at spa_load+0x1102
> #16 0xffffffff812cac6e at spa_load_best+0x6e
> #17 0xffffffff812c73a1 at spa_open_common+0x101
> Uptime: 37s
> Dumping 1082 out of 15733 MB:..2%..…
> Dump complete
> mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
> mps0: Incrementing SSU count
> …
>
> Haven't done any scrub attempts yet – my expectation is to get all datasets
> of the striped mirror pool back...
>
> Any hints highly appreciated.

Now it seems I'm in really big trouble.
A regular import doesn't work (also not when booted from cd9660). I get all
pools listed, but trying to import (unmounted) leads to the same panic as
initially reported – because rc is just doing the same thing.

I booted into single user mode (which works since the bootpool isn't
affected and root is a memory disk from the bootpool) and set
vfs.zfs.recover=1.
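For completeness, roughly what I did there (vfs.zfs.recover can also be set
as a loader tunable at the OK prompt or in /boot/loader.conf; a bare
'zpool import' only scans and lists importable pools, nothing pool-specific
is involved):

  sysctl vfs.zfs.recover=1
  zpool import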
But this time I don't even get the list of pools to import; 'zpool import'
instantaneously leads to this panic:

Solaris: WARNING: blkptr at 0xfffffe0005a8e000 has invalid CHECKSUM 1
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 has invalid COMPRESS 0
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 0 has invalid VDEV 2337865727
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 1 has invalid VDEV 289407040
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 2 has invalid VDEV 3959586324

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x50
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff812de904
stack pointer           = 0x28:0xfffffe043f6bcbc0
frame pointer           = 0x28:0xfffffe043f6bcbc0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 44 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805e3837 at kdb_backtrace+0x67
#1 0xffffffff805a2286 at vpanic+0x186
#2 0xffffffff805a20f3 at panic+0x43
#3 0xffffffff808a4922 at trap_fatal+0x322
#4 0xffffffff808a4979 at trap_pfault+0x49
#5 0xffffffff808a41f8 at trap+0x298
#6 0xffffffff80889fb1 at calltrap+0x8
#7 0xffffffff812e58a3 at vdev_mirror_child_select+0x53
#8 0xffffffff812e535e at vdev_mirror_io_start+0x2ee
#9 0xffffffff81303aa1 at zio_vdev_io_start+0x161
#10 0xffffffff8130054c at zio_execute+0xac
#11 0xffffffff812ffe7b at zio_nowait+0xcb
#12 0xffffffff812761f3 at arc_read+0x6f3
#13 0xffffffff81298b4d at traverse_prefetch_metadata+0xbd
#14 0xffffffff812980ed at traverse_visitbp+0x39d
#15 0xffffffff81298c27 at traverse_dnode+0xc7
#16 0xffffffff812984a3 at traverse_visitbp+0x753
#17 0xffffffff8129788b at traverse_impl+0x22b

Now I hope a ZFS guru can help me out.
Needless to say, the bits on this mirrored pool are important to me – no
production data, but lots of intermediate work...

Thanks,

-harry
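P.S.: Would one of the following be a sane next step? The pool and device
names below are only placeholders, and I'm unsure whether a rewind import
(-F) is advisable on a pool in this state:

  zpool import -N -o readonly=on -f mypool    (read-only import, don't mount datasets)
  zpool import -f -F -n mypool                (dry run: report whether discarding the
                                               last few transactions would make it importable)
  zdb -e -u mypool                            (dump the uberblocks without importing)
  zdb -l /dev/da0p3                           (dump the vdev labels of one mirror member)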