Date:      Thu, 29 Oct 2020 09:18:40 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Christian Kratzer <ck@cksoft.de>, freebsd-fs@freebsd.org
Subject:   Re: 12.1-RELEASE-p7 panic in zio_free_issue_4_6
Message-ID:  <474d086c-5a36-0db5-974f-ccfa0acbd871@FreeBSD.org>
In-Reply-To: <a6a55583-f7b8-ee59-e3c7-4d1fcc5b1de8@cksoft.de>
References:  <a6a55583-f7b8-ee59-e3c7-4d1fcc5b1de8@cksoft.de>

On 28/10/2020 15:41, Christian Kratzer wrote:
> Hi,
> 
> one of my servers with 12.1-RELEASE-p7 started crashing with the following:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 19; apic id = 31
> fault virtual address   = 0x30
> fault code              = supervisor write data, page not present
> instruction pointer     = 0x20:0xffffffff826877f4
> stack pointer           = 0x28:0xfffffe011cefeaa0
> frame pointer           = 0x28:0xfffffe011cefeaa0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (zio_free_issue_2_3)
> trap number             = 12
> panic: page fault
> cpuid = 19
> time = 1603797129
> KDB: stack backtrace:
> #0 0xffffffff80c1d2f7 at kdb_backtrace+0x67
> #1 0xffffffff80bd062d at vpanic+0x19d
> #2 0xffffffff80bd0483 at panic+0x43
> #3 0xffffffff810a8dcc at trap_fatal+0x39c
> #4 0xffffffff810a8e19 at trap_pfault+0x49
> #5 0xffffffff810a840f at trap+0x29f
> #6 0xffffffff81081c9c at calltrap+0x8
> #7 0xffffffff8272a903 at zio_ddt_free+0x53
> #8 0xffffffff82727b7c at zio_execute+0xac
> #9 0xffffffff80c2fad4 at taskqueue_run_locked+0x154
> #10 0xffffffff80c30e08 at taskqueue_thread_loop+0x98
> #11 0xffffffff80b90c43 at fork_exit+0x83
> #12 0xffffffff81082cde at fork_trampoline+0xe
> Uptime: 1m12s
> Automatic reboot in 15 seconds - press a key on the console to abort
> 
> 
> I traced things down to importing one of the zpools.

I suspect that you have a silent corruption on that pool (perhaps because of
non-ECC RAM?).
What you see can happen if a block pointer has a deduplication bit set, but the
block is not actually deduplicated or deduplication has never been enabled at all.
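
To illustrate the point about the deduplication bit, here is a minimal
sketch of how that flag is read from a block pointer. It assumes the field
layout used by the BP_GET_DEDUP and BP_GET_BYTEORDER macros in sys/spa.h
(bits 62 and 63 of the 64-bit blk_prop word). The sample values and the
Python helper names are hypothetical and are not taken from the affected
pool.

```python
# Illustrative sketch only: reads the dedup flag from a blk_prop word.
# Assumption: bit 62 is the dedup bit and bit 63 the byte-order bit,
# as in the BP_GET_DEDUP / BP_GET_BYTEORDER macros in sys/spa.h.

DEDUP_BIT = 62
BYTEORDER_BIT = 63


def bf64_get(word, low, length):
    """Extract `length` bits starting at bit `low` from a 64-bit word."""
    return (word >> low) & ((1 << length) - 1)


def bp_get_dedup(blk_prop):
    """Return True if the dedup bit is set in blk_prop."""
    return bf64_get(blk_prop, DEDUP_BIT, 1) == 1


# Hypothetical blk_prop of a block that was never deduplicated:
# only the byte-order bit is set, all other fields are left at zero.
clean_prop = 1 << BYTEORDER_BIT

# The same word after a single flipped bit, for example in faulty RAM.
flipped_prop = clean_prop ^ (1 << DEDUP_BIT)

print(bp_get_dedup(clean_prop))
print(bp_get_dedup(flipped_prop))
```

With the bit set, the free path treats the block as deduplicated (the
zio_ddt_free frame in the backtrace) even if the pool has no matching
dedup table entry, which is consistent with the fault at a small offset
from a NULL pointer.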

To help with the analysis, it would be useful to get a vmcore (kernel crash
dump) and to install the corresponding kernel debug symbols, if they are not
already installed.

As to recovery, I think that the best solution is to import the pool read-only
and to copy important data elsewhere.  Then re-create the pool.

-- 
Andriy
