Date: Sun, 15 Dec 2024 22:01:19 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 267028] kernel panics when booting with both (zfs,ko or vboxnetflt,ko or acpi_wmi.ko) and amdgpu.ko
Message-ID: <bug-267028-227-6araGZRz9I@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-267028-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-267028-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #235 from Mark Millard <marklmi26-fbsd@yahoo.com> ---

For the 3-node sequence (last partially-good and then just-junk):

$208 = {link = {tqe_next = 0xfffff80004607a00, tqe_prev = 0xfffff8000465bc80},
  container = 0xfffff80003868c00,
  name = 0xffffffff82e1e000 <xgpu_fiji_mgcg_cgcg_init+368> "amdgpu_raven_mec_bin_fw",
  version = 1}
$209 = {link = {tqe_next = 0xfffff80000000007, tqe_prev = 0xfffff8000465bbc0},
  container = 0xfffff80004b29600,
  name = 0xffffffff82e62026 <se_mask+242> "amdgpu_raven_mec2_bin_fw",
  version = 1}
$210 = {link = {tqe_next = 0xeef3f000e2c3f0, tqe_prev = 0xff54f000eef3f0},
  container = 0x322ff0003287f0,
  name = 0xe987f000fea5f0 <error: Cannot access memory at address 0xe987f000fea5f0>,
  version = 15660016}

It looks like the:

$209 = {link = {tqe_next = 0xfffff80000000007,

is the earliest example of (evidence of) corruption. The address is outside of
(a smaller address than) the kernel start:

Local exec file:
        `/usr/home/root/failing-kernel-files/boot/kernel/kernel', file type elf64-x86-64-freebsd.
        Entry point: 0xffffffff8038e000
        0xffffffff802002a8 - 0xffffffff802002b5 is .interp

Having 0000000007 at the end also looks odd. However, the rest of that node:

tqe_prev = 0xfffff8000465bbc0}, container = 0xfffff80004b29600,
name = 0xffffffff82e62026 <se_mask+242> "amdgpu_raven_mec2_bin_fw", version = 1}

does not look to have any obvious problems with its content.
The contents of the container are shown as:

$214 = {ops = 0xfffff80003164000, refs = 1, userrefs = 0, flags = 1,
  link = {tqe_next = 0xfffff8000469ed80, tqe_prev = 0xfffff80003868c18},
  filename = 0xfffff80004b22120 "amdgpu_raven_mec2_bin.ko",
  pathname = 0xfffff80004607a40 "/boot/modules/amdgpu_raven_mec2_bin.ko",
  id = 20, address = 0xffffffff82e61000 <link_enc_regs+1520> "\203\376\001tL\270\026",
  size = 276456, ctors_addr = 0x0, ctors_size = 0, dtors_addr = 0x0, dtors_size = 0,
  ndeps = 3, deps = 0xfffff80004b220e0,
  common = {stqh_first = 0x0, stqh_last = 0xfffff80004b29680},
  modules = {tqh_first = 0xfffff80004b1ff00, tqh_last = 0xfffff80004b1ff10},
  loaded = {tqe_next = 0x0, tqe_prev = 0x0},
  loadcnt = 20, nenabled = 0, fbt_nentries = 0}

which also seems not to have any obvious problems.

The type of vmcore.* involved does not provide threads, stack content, or
backtrace information. Nor is there any indication of the exact point at
which tqe_next = 0xfffff80000000007 became the case. It is also not obvious
whether the list was longer before the 0xfffff80000000007 appeared.

There does not seem to be a way to tell whether the corrupted value is
because of "raven"-specific code vs. more general code. It would be
interesting to know whether an alternate card type has the problem or not.

As for the raven context, getting vmcore.* captures from failures at a
different stage, such as the failure that mentioned acpi_wmi but did not
produce a vmcore.*, would help indicate whether the place in the list where
the corruption happens moves around (relative to other content).

-- 
You are receiving this mail because:
You are the assignee for the bug.