Date:      Fri, 19 Oct 2018 13:22:21 -0400
From:      Mark Johnston <markj@freebsd.org>
To:        Sebastian Wojtczak <sebastian.wojtczak@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: FreeBSD 11.2 kernel crash when dd
Message-ID:  <20181019172221.GA21156@raichu>
In-Reply-To: <CAEfQnDmWE8mq8=XqvPu3zn0S0kOka274T7O7_GXXT=Xg3ObcgA@mail.gmail.com>
References:  <CAEfQnDmWE8mq8=XqvPu3zn0S0kOka274T7O7_GXXT=Xg3ObcgA@mail.gmail.com>

On Fri, Oct 19, 2018 at 01:10:15PM +0200, Sebastian Wojtczak wrote:
> Hi,
> 
> I would like to report a kernel crash while dd on ssd drive.
> 
> Just found that my PC crashed several times during below command:
> dd if=/dev/ada2 of=file_name bs=10m.
> 
> I was trying to make an image from my ssd drive. Once dump file hit size
> 41G or 52G kernel crashes and reboot the system.
> 
> Oct 18 12:30:11 username syslogd: kernel boot file is /boot/kernel/kernel
> Oct 18 12:30:11 username kernel:
> Oct 18 12:30:11 username kernel:
> Oct 18 12:30:11 username kernel: Fatal trap 12: page fault while in kernel
> mode
> Oct 18 12:30:11 username kernel: cpuid = 1; apic id = 01
> Oct 18 12:30:11 username kernel: fault virtual address  = 0x5a
> Oct 18 12:30:11 username kernel: fault code             = supervisor read
> data, page not present
> Oct 18 12:30:11 username kernel: instruction pointer    =
> 0x20:0xffffffff80e67f6d
> Oct 18 12:30:11 username kernel: stack pointer          =
> 0x28:0xfffffe084b408f40
> Oct 18 12:30:11 username kernel: frame pointer          =
> 0x28:0xfffffe084b408f80
> Oct 18 12:30:11 username kernel: code segment           = base 0x0, limit
> 0xfffff, type 0x1b
> Oct 18 12:30:11 username kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> Oct 18 12:30:11 username kernel: processor eflags       = interrupt
> enabled, resume, IOPL = 0
> Oct 18 12:30:11 username kernel: current process                = 0
> (zio_write_issue_8)
> Oct 18 12:30:11 username kernel: trap number            = 12
> Oct 18 12:30:11 username kernel: panic: page fault
> Oct 18 12:30:11 username kernel: cpuid = 1
> Oct 18 12:30:11 username kernel: KDB: stack backtrace:
> Oct 18 12:30:11 username kernel: #0 0xffffffff80b50087 at kdb_backtrace+0x67
> Oct 18 12:30:11 username kernel: #1 0xffffffff80b099f7 at vpanic+0x177
> Oct 18 12:30:11 username kernel: #2 0xffffffff80b09873 at panic+0x43
> Oct 18 12:30:11 username kernel: #3 0xffffffff80fe105f at trap_fatal+0x35f
> Oct 18 12:30:11 username kernel: #4 0xffffffff80fe10b9 at trap_pfault+0x49
> Oct 18 12:30:11 username kernel: #5 0xffffffff80fe0887 at trap+0x2c7
> Oct 18 12:30:11 username kernel: #6 0xffffffff80fc04cc at calltrap+0x8
> Oct 18 12:30:11 username kernel: #7 0xffffffff80e56df2 at kmem_back+0xf2
> Oct 18 12:30:11 username kernel: #8 0xffffffff80e56cd0 at kmem_malloc+0x60
> Oct 18 12:30:11 username kernel: #9 0xffffffff80e4e752 at
> keg_alloc_slab+0xe2
> Oct 18 12:30:11 username kernel: #10 0xffffffff80e5118e at
> keg_fetch_slab+0x14e
> Oct 18 12:30:11 username kernel: #11 0xffffffff80e509a4 at
> zone_fetch_slab+0x64
> Oct 18 12:30:11 username kernel: #12 0xffffffff80e50a7f at zone_import+0x3f
> Oct 18 12:30:11 username kernel: #13 0xffffffff80e4d199 at
> uma_zalloc_arg+0x3d9
> Oct 18 12:30:11 username kernel: #14 0xffffffff832d2ab2 at
> zio_write_compress+0x1e2
> Oct 18 12:30:11 username kernel: #15 0xffffffff832d174c at zio_execute+0xac
> Oct 18 12:30:11 username kernel: #16 0xffffffff80b617e4 at
> taskqueue_run_locked+0x154
> Oct 18 12:30:11 username kernel: #17 0xffffffff80b62918 at
> taskqueue_thread_loop+0x98
> Oct 18 12:30:11 username kernel: Uptime: 5m50s
> 
> One virtual machine is started with bhyve at startup but even if I shutdown
> it, same crash happen. Disabling vmm does not help but only extend time to
> crash during ssd dump.
> 
> Current zfs setup is zraid on 3 (500GB) hdd drives with compress=on. Drive
> ada0 is not part of zraid and is not attached/mount what ever.
> 
> Any help how to investigate it is appreciated.

The stack suggests a bug in the kmem_* KPI, but I'm having trouble
seeing the problem.  In particular, the fault address (0x5a) suggests
that we crashed in kmem_back() while testing (m->flags & PG_ZERO) == 0
with m == NULL, but it shouldn't be possible for m to be NULL there.
My attempts to reproduce this on 12-CURRENT haven't yielded anything
yet.  Would you (or anyone else seeing the problem) be willing to share
a kernel dump?  I'd need the vmcore and the contents of /boot/kernel
and /usr/lib/debug/boot/kernel.
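
To illustrate the reasoning about the fault address: when code reads a
field through a NULL struct pointer, the faulting virtual address is
simply the field's offset within the struct.  The sketch below uses a
hypothetical stand-in type (fake_page), not the real struct vm_page;
the padding size is an assumption chosen so that "flags" lands at
offset 0x5a, matching the reported fault address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a page structure.  Only the position of
 * 'flags' matters here; the padding is an assumption made so that the
 * field sits at offset 0x5a, as in the reported fault.
 */
struct fake_page {
	uint8_t  pad[0x5a];	/* placeholder for the preceding fields */
	uint16_t flags;
};

/*
 * Address that a read of m->flags would touch, given the numeric value
 * of the base pointer m.  Nothing is dereferenced; this is arithmetic
 * only.
 */
static uintptr_t
flags_access_address(uintptr_t base)
{
	return (base + offsetof(struct fake_page, flags));
}
```

With a NULL base, flags_access_address(0) evaluates to 0x5a, which is
why a small fault address like this points at a NULL struct pointer
rather than at a wild one.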
