Date:      Fri, 27 Dec 2024 03:34:45 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 267028] kernel panics when booting with both (zfs.ko or vboxnetflt.ko or acpi_wmi.ko) and amdgpu.ko
Message-ID:  <bug-267028-227-tuk7H1CsKT@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-267028-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-267028-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #307 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
(In reply to Mark Millard from comment #306)

Going backwards through part of the list node allocations (recorded before
each node is filled in, but showing the container and modname addresses
that are to be assigned in each case) . . .

(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos]
$7 = {modAddr = 0xfffff8000471eac0, containerAddr = 0xfffff800038caa80,
modnameAddr = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-1]
$8 = {modAddr = 0xfffff8000471e900, containerAddr = 0xfffff800038cac00,
modnameAddr = 0xffffffff82e62026 "amdgpu_raven_mec2_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-2]
$9 = {modAddr = 0xfffff800046581c0, containerAddr = 0xfffff8000464a600,
modnameAddr = 0xffffffff82e1e010 "amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-3]
$10 = {modAddr = 0xfffff80004574040, containerAddr = 0xfffff800038c9000,
modnameAddr = 0xffffffff82e12009 "amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-4]
$11 = {modAddr = 0xfffff80004574100, containerAddr = 0xfffff800038c9300,
modnameAddr = 0xffffffff829f6010 "amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-5]
$12 = {modAddr = 0xfffff800036f00c0, containerAddr = 0xfffff80004ad6c00,
modnameAddr = 0xffffffff829ef000 "amdgpu_raven_me_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-6]
$13 = {modAddr = 0xfffff8000471e980, containerAddr = 0xfffff800038c9480,
modnameAddr = 0xffffffff829e7025 "amdgpu_raven_pfp_bin_fw", version = 1}

Going backwards through that part of the list later, after the failure:

(kgdb) print *(modlist_t)0xfffff8000471eac0
$24 = {link = {tqe_next = 0x0, tqe_prev = 0xfffff8000471e900}, container =
0xfffff800038caa80, name = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw",
version = 1}
(kgdb) print *(modlist_t)0xfffff8000471e900
$25 = {link = {tqe_next = 0xfffff8000471eac0, tqe_prev = 0xfffff800046581c0},
container = 0xfffff800038cac00, name = 0xffffffff82e62026
"amdgpu_raven_mec2_bin_fw", version = 1}
. . .
(kgdb) print *(modlist_t)0xfffff800046581c0
$27 = {link = {tqe_next = 0xfffff8000471e900, tqe_prev = 0xfffff80004574040},
container = 0xfffff8000464a600, name = 0xffffffff82e1e010
"amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff80004574040
$28 = {link = {tqe_next = 0xfffff800046581c0, tqe_prev = 0xfffff80004574100},
container = 0xfffff800038c9000, name = 0xffffffff82e12009
"amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff80004574100
$29 = {link = {tqe_next = 0xfffff80004574040, tqe_prev = 0xfffff800036f00c0},
container = 0xfffff800038c9300, name = 0xffffffff829f6010
"amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff800036f00c0
$30 = {link = {tqe_next = 0xfffff80000000007, tqe_prev = 0xfffff8000471e980},
container = 0xfffff80004ad6c00, name = 0xffffffff829ef000
"amdgpu_raven_me_bin_fw", version = 1}

NOTE THE BAD tqe_next == 0xfffff80000000007 ABOVE.

(kgdb) print *(modlist_t)0xfffff8000471e980
$31 = {link = {tqe_next = 0xfffff800036f00c0, tqe_prev = 0xfffff800036f0100},
container = 0xfffff800038c9480, name = 0xffffffff829e7025
"amdgpu_raven_pfp_bin_fw", version = 1}

So: all the nodes are there, but just one ends up with the odd
tqe_next == 0xfffff80000000007 corruption.

There was no allocation that returned 0xfffff80000000007 (none was
recorded, and I had set things up to panic just after any allocation
returning such a value).

Something replaced the intended:
*(modlist_t)0xfffff800036f00c0.link.tqe_next == 0xfffff80004574100
with:
*(modlist_t)0xfffff800036f00c0.link.tqe_next == 0xfffff80000000007

The scans of the list were okay as of setting up each of
(listed in execution order, not backwards list order):

"amdgpu_raven_ce_bin_fw"
"amdgpu_raven_rlc_bin_fw"
"amdgpu_raven_mec_bin_fw"
"amdgpu_raven_mec2_bin_fw"
"amdgpu_raven_vcn_bin_fw"

But as of (the first after "amdgpu_raven_vcn_bin_fw"):
"acpi_wmi"

The list had the corrupted link.tqe_next associated with
"amdgpu_raven_me_bin_fw".

This suggests the corruption occurred at/after the generation of:

drmn0: successfully loaded firmware image 'amdgpu/raven_vcn.bin'

during the generation of the sequence:

<6>[drm] Found VCN firmware Version ENC: 1.13 DEC: 2 VEP: 0 Revision: 4
drmn0: Will use PSP to load VCN firmware
<6>[drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
drmn0: RAS: optional ras ta ucode is not available
drmn0: RAP: optional rap ta ucode is not available
<6>[drm] kiq ring mec 2 pipe 1 q 0
<6>[drm] DM_PPLIB: values for F clock
<6>[drm] DM_PPLIB:       400000 in kHz, 3649 in mV
<6>[drm] DM_PPLIB:       933000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB:       1200000 in kHz, 4399 in mV
<6>[drm] DM_PPLIB:       1333000 in kHz, 4399 in mV
<6>[drm] DM_PPLIB: values for DCF clock
<6>[drm] DM_PPLIB:       300000 in kHz, 3649 in mV
<6>[drm] DM_PPLIB:       600000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB:       626000 in kHz, 4250 in mV
<6>[drm] DM_PPLIB:       654000 in kHz, 4399 in mV
<6>[drm] Display Core initialized with v3.2.104!
lkpi_iic0: <LinuxKPI I2C> on drmn0
iicbus0: <Philips I2C bus> on lkpi_iic0
iic0: <I2C generic I/O> on iicbus0
lkpi_iic1: <LinuxKPI I2C> on drmn0
iicbus1: <Philips I2C bus> on lkpi_iic1
iic1: <I2C generic I/O> on iicbus1
<6>[drm] VCN decode and encode initialized successfully(under SPG Mode).
drmn0: SE 1, SH per SE 1, CU per SH 11, active_cu_number 8
<6>[drm] fb mappable at 0x60BCA000
<6>[drm] vram apper at 0x60000000
<6>[drm] size 8294400
<6>[drm] fb depth is 24
<6>[drm]    pitch is 7680
VT: Replacing driver "vga" with new "fb".
start FB_INFO:
type=11 height=1080 width=1920 depth=32
pbase=0x60bca000 vbase=0xfffff80060bca000
name=drmn0 flags=0x0 stride=7680 bpp=32
end FB_INFO
drmn0: ring gfx uses VM inv eng 0 on hub 0
drmn0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
drmn0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
drmn0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
drmn0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
drmn0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
drmn0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
drmn0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
drmn0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
drmn0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
drmn0: ring sdma0 uses VM inv eng 0 on hub 1
drmn0: ring vcn_dec uses VM inv eng 1 on hub 1
drmn0: ring vcn_enc0 uses VM inv eng 4 on hub 1
drmn0: ring vcn_enc1 uses VM inv eng 5 on hub 1
drmn0: ring jpeg_dec uses VM inv eng 6 on hub 1
vgapci0: child drmn0 requested pci_get_powerstate
sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
<6>[drm] Initialized amdgpu 3.40.0 20150101 for drmn0 on minor 0

Or the very early stages of setting up: acpi_wmi.ko

The mismatch was detected during the first modlist_lookup on the
found_modules list during the setup of acpi_wmi.ko.

The "during" activity seems to come from the likes of:

/wrkdirs/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_7/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c

(given that the raven firmware is in use as well?).

-- 
You are receiving this mail because:
You are the assignee for the bug.


