Date: Thu, 29 Oct 2020 09:18:40 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: Christian Kratzer <ck@cksoft.de>, freebsd-fs@freebsd.org
Subject: Re: 12.1-RELEASE-p7 panic in zio_free_issue_4_6
Message-ID: <474d086c-5a36-0db5-974f-ccfa0acbd871@FreeBSD.org>
In-Reply-To: <a6a55583-f7b8-ee59-e3c7-4d1fcc5b1de8@cksoft.de>
References: <a6a55583-f7b8-ee59-e3c7-4d1fcc5b1de8@cksoft.de>

On 28/10/2020 15:41, Christian Kratzer wrote:
> Hi,
>
> one of my servers with 12.1-RELEASE-p7 started crashing with the following
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 19; apic id = 31
> fault virtual address = 0x30
> fault code = supervisor write data, page not present
> instruction pointer = 0x20:0xffffffff826877f4
> stack pointer = 0x28:0xfffffe011cefeaa0
> frame pointer = 0x28:0xfffffe011cefeaa0
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (zio_free_issue_2_3)
> trap number = 12
> panic: page fault
> cpuid = 19
> time = 1603797129
> KDB: stack backtrace:
> #0 0xffffffff80c1d2f7 at kdb_backtrace+0x67
> #1 0xffffffff80bd062d at vpanic+0x19d
> #2 0xffffffff80bd0483 at panic+0x43
> #3 0xffffffff810a8dcc at trap_fatal+0x39c
> #4 0xffffffff810a8e19 at trap_pfault+0x49
> #5 0xffffffff810a840f at trap+0x29f
> #6 0xffffffff81081c9c at calltrap+0x8
> #7 0xffffffff8272a903 at zio_ddt_free+0x53
> #8 0xffffffff82727b7c at zio_execute+0xac
> #9 0xffffffff80c2fad4 at taskqueue_run_locked+0x154
> #10 0xffffffff80c30e08 at taskqueue_thread_loop+0x98
> #11 0xffffffff80b90c43 at fork_exit+0x83
> #12 0xffffffff81082cde at fork_trampoline+0xe
> Uptime: 1m12s
> Automatic reboot in 15 seconds - press a key on the console to abort
>
>
> I traced things down to importing one of the zpools.

I suspect that you have a silent corruption on that pool (perhaps because of
non-ECC RAM?).
What you see can happen if a block pointer has a deduplication bit set, but the
block is not actually deduplicated or deduplication has never been enabled at all.

It would help -- with analysis -- to get a vmcore (kernel crash dump) and to
install the corresponding kernel debug symbols (if not already installed).

As to recovery, I think that the best solution is to import the pool read-only
and to copy important data elsewhere. Then re-create the pool.

--
Andriy
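
Editorial illustration of the failure mode described in the reply: a block pointer whose dedup flag is set while the dedup table holds no matching entry. This is a toy model in Python, not ZFS code. The names `has_dedup_flag`, `free_block`, and `dedup_table` are hypothetical, and the flag position (bit 62 of the 64-bit property word) is an assumption based on the OpenZFS headers. The sketch only shows why the free path trusts the flag, and what goes wrong when the flag and the table disagree.

```python
# Toy model only: none of these names are real ZFS APIs.
DEDUP_BIT = 62  # assumed position of the dedup flag in the 64-bit property word


def has_dedup_flag(blk_prop: int) -> bool:
    """Return True if the dedup flag is set in the property word."""
    return (blk_prop >> DEDUP_BIT) & 1 == 1


def free_block(blk_prop: int, checksum: str, dedup_table: dict) -> str:
    """Free a block, consulting the dedup table only when the flag is set."""
    if not has_dedup_flag(blk_prop):
        return "plain free"
    # The flag claims the block is deduplicated, so look up its table entry.
    entry = dedup_table.get(checksum)  # None if the block was never deduplicated
    if entry is None:
        # Assumption: the kernel path in the trace has no equivalent check
        # and faults on a near-NULL pointer instead of reporting the mismatch.
        return "inconsistent: flag set but no dedup table entry"
    entry["refcnt"] -= 1  # drop one reference to the shared block
    return "dedup free"
```

In this model a single flipped bit in the property word is enough to send an ordinary block down the dedup free path, which would be consistent with the suspicion of silent corruption on a pool that never had deduplication enabled.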