FreeBSD Mail Archives

Date:      Wed, 26 Apr 2023 23:24:29 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 267028] kernel panics when booting with both (zfs,ko or vboxnetflt,ko or acpi_wmi.ko) and amdgpu.ko
Message-ID:  <bug-267028-3630-O90aROuen1@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-267028-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-267028-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #172 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
One of the things that makes this hard to analyze is
that the first failure quickly leads to other failures,
and most of the evidence is for the later failures.
For example, in the following, note that the original
trap number is 12 but the backtrace is for/after
a later trap, of type-number 22 instead. There
is very little information directly about the
original trap type-number 12:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              =3D supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bf3727
stack pointer           = 0x28:0xfffffe000e1a7ba0
frame pointer           = 0x28:0xfffffe000e1a7bd0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         =3D 1 (init)
trap number             =3D 12
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at
/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_4/drivers/gpu/drm/drm_atomic_helper.c:619
. . .
WARNING !drm_modeset_is_locked(&plane->mutex) failed at
/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_4/drivers/gpu/drm/drm_atomic_helper.c:894
kernel trap 22 with interrupts disabled
                            kernel trap 22 with interrupts disabled
 panic: page fault
cpuid = 0
time = 1682435560
KDB: stack backtrace:
#0 0xffffffff80c66ee5 at kdb_backtrace+0x65
#1 0xffffffff80c1bbef at vpanic+0x17f
#2 0xffffffff80c1ba63 at panic+0x43
#3 0xffffffff810addf5 at trap_fatal+0x385
#4 0xffffffff810ade4f at trap_pfault+0x4f
#5 0xffffffff81084fd8 at calltrap+0x8
#6 0xffffffff8261d251 at spl_nvlist_free+0x61
#7 0xffffffff826dd740 at fm_nvlist_destroy+0x20
#8 0xffffffff827b6e95 at zfs_zevent_post_cb+0x15
#9 0xffffffff826dcd02 at zfs_zevent_drain+0x62
#10 0xffffffff826dcbf8 at zfs_zevent_drain_all+0x58
#11 0xffffffff826dede9 at fm_fini+0x19
#12 0xffffffff82713b94 at spa_fini+0x54
#13 0xffffffff827be303 at zfs_kmod_fini+0x33
#14 0xffffffff8262fb3b at zfs_shutdown+0x2b
#15 0xffffffff80c1b76c at kern_reboot+0x3dc
#16 0xffffffff80c1b381 at sys_reboot+0x411
#17 0xffffffff810ae6ec at amd64_syscall+0x10c
. . .
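
A minimal sketch of separating the first trap from the later ones in console
output like the above. The abridged LOG text is copied from this message; the
regular expressions and the summarize() helper are hypothetical names used
only for illustration, and the patterns assume the line shapes seen here
rather than every format the kernel can print.

```python
import re

# Abridged excerpt of the console output quoted in this message.
LOG = """\
Fatal trap 12: page fault while in kernel mode
instruction pointer     = 0x20:0xffffffff80bf3727
trap number             = 12
kernel trap 22 with interrupts disabled
panic: page fault
#6 0xffffffff8261d251 at spl_nvlist_free+0x61
#7 0xffffffff826dd740 at fm_nvlist_destroy+0x20
"""

# Assumed line shapes, based only on the log above.
TRAP_RE = re.compile(r"^[ \t]*(?:Fatal|kernel) trap (\d+)", re.MULTILINE)
IP_RE = re.compile(
    r"^instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)", re.MULTILINE
)
FRAME_RE = re.compile(
    r"^#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)", re.MULTILINE
)


def summarize(log):
    """Return the first trap, any later traps, the first IP, and the frames."""
    traps = [int(n) for n in TRAP_RE.findall(log)]
    ip = IP_RE.search(log)
    frames = [
        (int(idx), int(addr, 16), sym, int(off, 16))
        for idx, addr, sym, off in FRAME_RE.findall(log)
    ]
    return {
        "first_trap": traps[0] if traps else None,
        "later_traps": traps[1:],
        "first_ip": int(ip.group(1), 16) if ip else None,
        "frames": frames,
    }


if __name__ == "__main__":
    info = summarize(LOG)
    print("first trap:", info["first_trap"], "later traps:", info["later_traps"])
    print("first instruction pointer:", hex(info["first_ip"]))
```

The point of keeping "first_trap" and "later_traps" apart is that the stack
backtrace printed last belongs to the later trap, not to the original one.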

The primary hint about what code execution context led
to the original instance of trap type 12 above is
basically:

instruction pointer     = 0x20:0xffffffff80bf3727
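
A sketch of how such an address is turned into a symbol+offset form, assuming
a sorted table of function start addresses is available. The three entries
below are derived from the backtrace in this message (frame address minus the
printed offset); a real table would come from the kernel's symbol information,
which is not part of this message, so the original instruction pointer is left
unresolved here. SYMBOLS and resolve() are hypothetical names.

```python
import bisect

# (start address, name) pairs sorted by address. Start addresses are computed
# from backtrace frames #0..#2 above as "frame address - offset".
SYMBOLS = sorted([
    (0xffffffff80c66ee5 - 0x65, "kdb_backtrace"),
    (0xffffffff80c1bbef - 0x17f, "vpanic"),
    (0xffffffff80c1ba63 - 0x43, "panic"),
])


def resolve(addr, symbols=SYMBOLS):
    """Map addr to 'name+0xoffset' using the nearest preceding symbol start.

    Returns None when addr lies below every known symbol. Without symbol
    sizes, an address past the end of a function is still attributed to
    that function, so treat results as hints.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


if __name__ == "__main__":
    print(resolve(0xffffffff80c1ba63))
    # The trap-12 instruction pointer is below every entry in this tiny table.
    print(resolve(0xffffffff80bf3727))
```

With only these three entries, the lookup reproduces the offsets shown in the
backtrace but cannot name the function containing 0xffffffff80bf3727; that
needs the symbol table of the exact kernel that crashed.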

amdgpu does not leave in place a clean context for
debugging kernel crashes. Trying to keep the video
context operational for a kernel that has crashed,
while not messing up the analysis context for the
original problem, is problematic.

My guess would be that normal analysis of such a problem
tries to have it occur in a virtual-machine sort
of context, where another (outer) context is available
that is independent and can look at the details from
outside the failing context. But even that would
require the failing context in the VM to stop before
amdgpu or the like messed up the evidence in the VM.
(Not that I've ever done that type of evidence
gathering.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-267028-3630-O90aROuen1>
