Date: Fri, 27 Dec 2024 03:34:45 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 267028] kernel panics when booting with both (zfs.ko or vboxnetflt.ko or acpi_wmi.ko) and amdgpu.ko
Message-ID: <bug-267028-227-tuk7H1CsKT@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-267028-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-267028-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #307 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
(In reply to Mark Millard from comment #306)

Going backwards through part of the list node allocations (before each node is filled in, but showing the container and modname addresses that are to be assigned in each case) . . .

(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos]
$7 = {modAddr = 0xfffff8000471eac0, containerAddr = 0xfffff800038caa80, modnameAddr = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-1]
$8 = {modAddr = 0xfffff8000471e900, containerAddr = 0xfffff800038cac00, modnameAddr = 0xffffffff82e62026 "amdgpu_raven_mec2_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-2]
$9 = {modAddr = 0xfffff800046581c0, containerAddr = 0xfffff8000464a600, modnameAddr = 0xffffffff82e1e010 "amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-3]
$10 = {modAddr = 0xfffff80004574040, containerAddr = 0xfffff800038c9000, modnameAddr = 0xffffffff82e12009 "amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-4]
$11 = {modAddr = 0xfffff80004574100, containerAddr = 0xfffff800038c9300, modnameAddr = 0xffffffff829f6010 "amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-5]
$12 = {modAddr = 0xfffff800036f00c0, containerAddr = 0xfffff80004ad6c00, modnameAddr = 0xffffffff829ef000 "amdgpu_raven_me_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-6]
$13 = {modAddr = 0xfffff8000471e980, containerAddr = 0xfffff800038c9480, modnameAddr = 0xffffffff829e7025 "amdgpu_raven_pfp_bin_fw", version = 1}

Going backwards through that part of the list later, after the failure:

(kgdb) print *(modlist_t)0xfffff8000471eac0
$24 = {link = {tqe_next = 0x0, tqe_prev = 0xfffff8000471e900}, container = 0xfffff800038caa80, name = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff8000471e900
$25 = {link = {tqe_next = 0xfffff8000471eac0, tqe_prev = 0xfffff800046581c0}, container = 0xfffff800038cac00, name = 0xffffffff82e62026 "amdgpu_raven_mec2_bin_fw", version = 1}
. . .
(kgdb) print *(modlist_t)0xfffff800046581c0
$27 = {link = {tqe_next = 0xfffff8000471e900, tqe_prev = 0xfffff80004574040}, container = 0xfffff8000464a600, name = 0xffffffff82e1e010 "amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff80004574040
$28 = {link = {tqe_next = 0xfffff800046581c0, tqe_prev = 0xfffff80004574100}, container = 0xfffff800038c9000, name = 0xffffffff82e12009 "amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff80004574100
$29 = {link = {tqe_next = 0xfffff80004574040, tqe_prev = 0xfffff800036f00c0}, container = 0xfffff800038c9300, name = 0xffffffff829f6010 "amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff800036f00c0
$30 = {link = {tqe_next = 0xfffff80000000007, tqe_prev = 0xfffff8000471e980}, container = 0xfffff80004ad6c00, name = 0xffffffff829ef000 "amdgpu_raven_me_bin_fw", version = 1}

NOTE THE BAD tqe_next == 0xfffff80000000007 ABOVE.

(kgdb) print *(modlist_t)0xfffff8000471e980
$31 = {link = {tqe_next = 0xfffff800036f00c0, tqe_prev = 0xfffff800036f0100}, container = 0xfffff800038c9480, name = 0xffffffff829e7025 "amdgpu_raven_pfp_bin_fw", version = 1}

So: all the nodes are there, but just one ends up with the odd tqe_next == 0xfffff80000000007 corruption. No allocation returned 0xfffff80000000007: no such value was recorded in the history, and I had set things up to panic just after any allocation that returned such a value.
Something replaced the intended:

*(modlist_t)0xfffff800036f00c0.link.tqe_next == 0xfffff80004574100

with:

*(modlist_t)0xfffff800036f00c0.link.tqe_next == 0xfffff80000000007

The scans of the list were okay as of setting up each of (listed in execution order, not backwards list order):

"amdgpu_raven_ce_bin_fw"
"amdgpu_raven_rlc_bin_fw"
"amdgpu_raven_mec_bin_fw"
"amdgpu_raven_mec2_bin_fw"
"amdgpu_raven_vcn_bin_fw"

But as of (the first after "amdgpu_raven_vcn_bin_fw"):

"acpi_wmi"

the list had the corrupted link.tqe_next associated with "amdgpu_raven_me_bin_fw". This suggests the corruption happened at/after the generation of:

drmn0: successfully loaded firmware image 'amdgpu/raven_vcn.bin'

during the generation of the sequence:

<6>[drm] Found VCN firmware Version ENC: 1.13 DEC: 2 VEP: 0 Revision: 4
drmn0: Will use PSP to load VCN firmware
<6>[drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
drmn0: RAS: optional ras ta ucode is not available
drmn0: RAP: optional rap ta ucode is not available
<6>[drm] kiq ring mec 2 pipe 1 q 0
<6>[drm] DM_PPLIB: values for F clock
<6>[drm] DM_PPLIB: 400000 in kHz, 3649 in mV
<6>[drm] DM_PPLIB: 933000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB: 1200000 in kHz, 4399 in mV
<6>[drm] DM_PPLIB: 1333000 in kHz, 4399 in mV
<6>[drm] DM_PPLIB: values for DCF clock
<6>[drm] DM_PPLIB: 300000 in kHz, 3649 in mV
<6>[drm] DM_PPLIB: 600000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB: 626000 in kHz, 4250 in mV
<6>[drm] DM_PPLIB: 654000 in kHz, 4399 in mV
<6>[drm] Display Core initialized with v3.2.104!
lkpi_iic0: <LinuxKPI I2C> on drmn0
iicbus0: <Philips I2C bus> on lkpi_iic0
iic0: <I2C generic I/O> on iicbus0
lkpi_iic1: <LinuxKPI I2C> on drmn0
iicbus1: <Philips I2C bus> on lkpi_iic1
iic1: <I2C generic I/O> on iicbus1
<6>[drm] VCN decode and encode initialized successfully(under SPG Mode).
drmn0: SE 1, SH per SE 1, CU per SH 11, active_cu_number 8
<6>[drm] fb mappable at 0x60BCA000
<6>[drm] vram apper at 0x60000000
<6>[drm] size 8294400
<6>[drm] fb depth is 24
<6>[drm] pitch is 7680
VT: Replacing driver "vga" with new "fb".
start FB_INFO:
type=11 height=1080 width=1920 depth=32
pbase=0x60bca000 vbase=0xfffff80060bca000
name=drmn0 flags=0x0 stride=7680 bpp=32
end FB_INFO
drmn0: ring gfx uses VM inv eng 0 on hub 0
drmn0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
drmn0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
drmn0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
drmn0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
drmn0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
drmn0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
drmn0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
drmn0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
drmn0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
drmn0: ring sdma0 uses VM inv eng 0 on hub 1
drmn0: ring vcn_dec uses VM inv eng 1 on hub 1
drmn0: ring vcn_enc0 uses VM inv eng 4 on hub 1
drmn0: ring vcn_enc1 uses VM inv eng 5 on hub 1
drmn0: ring jpeg_dec uses VM inv eng 6 on hub 1
vgapci0: child drmn0 requested pci_get_powerstate
sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
<6>[drm] Initialized amdgpu 3.40.0 20150101 for drmn0 on minor 0

Or it happened in the very early stages of setting up:

acpi_wmi.ko

The mismatch was detected during the first modlist_lookup over the found_modules list for the setup of acpi_wmi.ko. The "during" dmesg sequence quoted above seems to be generated during activity from the likes of:

/wrkdirs/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_7/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c

(given that the raven firmware is in use as well?).