From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 267028] kernel panics when booting with both (zfs,ko or vboxnetflt,ko or acpi_wmi.ko) and amdgpu.ko
Date: Wed, 26 Apr 2023 23:24:29 +0000
X-Bugzilla-Reason: CC AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 13.1-RELEASE
X-Bugzilla-Keywords: crash, needs-qa
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: marklmi26-fbsd@yahoo.com
X-Bugzilla-Status: Open
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: fs@FreeBSD.org
X-Bugzilla-Flags: maintainer-feedback? maintainer-feedback?
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Sender: owner-freebsd-fs@freebsd.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #172 from Mark Millard ---
One of the things that makes this hard to analyze is that the first
failure quickly leads to other failures, and most of the evidence is
for the later failure. For example, in the following, note that the
original trap number is 12 but the backtrace is for/after a later
trap, of type-number 22 instead. There is very little information
directly about the original trap type-number 12:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bf3727
stack pointer           = 0x28:0xfffffe000e1a7ba0
frame pointer           = 0x28:0xfffffe000e1a7bd0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1 (init)
trap number             = 12
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at
/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_4/drivers/gpu/drm/drm_atomic_helper.c:619
. . .
WARNING !drm_modeset_is_locked(&plane->mutex) failed at
/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_4/drivers/gpu/drm/drm_atomic_helper.c:894
kernel trap 22 with interrupts disabled
kernel trap 22 with interrupts disabled
panic: page fault
cpuid = 0
time = 1682435560
KDB: stack backtrace:
#0 0xffffffff80c66ee5 at kdb_backtrace+0x65
#1 0xffffffff80c1bbef at vpanic+0x17f
#2 0xffffffff80c1ba63 at panic+0x43
#3 0xffffffff810addf5 at trap_fatal+0x385
#4 0xffffffff810ade4f at trap_pfault+0x4f
#5 0xffffffff81084fd8 at calltrap+0x8
#6 0xffffffff8261d251 at spl_nvlist_free+0x61
#7 0xffffffff826dd740 at fm_nvlist_destroy+0x20
#8 0xffffffff827b6e95 at zfs_zevent_post_cb+0x15
#9 0xffffffff826dcd02 at zfs_zevent_drain+0x62
#10 0xffffffff826dcbf8 at zfs_zevent_drain_all+0x58
#11 0xffffffff826dede9 at fm_fini+0x19
#12 0xffffffff82713b94 at spa_fini+0x54
#13 0xffffffff827be303 at zfs_kmod_fini+0x33
#14 0xffffffff8262fb3b at zfs_shutdown+0x2b
#15 0xffffffff80c1b76c at kern_reboot+0x3dc
#16 0xffffffff80c1b381 at sys_reboot+0x411
#17 0xffffffff810ae6ec at amd64_syscall+0x10c
. . .

The primary hint about what code execution context led to the
original instance of trap type 12 above is basically:

instruction pointer     = 0x20:0xffffffff80bf3727

amdgpu does not leave in place a clean context for debugging
kernel crashes. Trying to keep the video context operational
for a kernel that has crashed, while not messing up the analysis
context for the original problem, is problematic.

My guess would be that normal analysis of such problems tries to
have the problem occur in a virtual-machine sort of context, where
another (outer) context is available that is independent and can
look at the details from outside the failing context. But even that
would require the failing context in the VM to stop before amdgpu
or the like messed up the evidence in the VM. (Not that I've ever
done that type of evidence gathering.)
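
Since the instruction pointer of the first trap is the main surviving
evidence, it may help to pull it out of a saved console log mechanically
and keep it separate from the later trap 22 reports. The following is
only a rough sketch, not something used in this bug: the function names
first_trap and later_traps are hypothetical, and SAMPLE is just a few
lines copied from the report above.

```python
import re

# A few lines copied from the panic report above (decoded form).
SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
instruction pointer     = 0x20:0xffffffff80bf3727
stack pointer           = 0x28:0xfffffe000e1a7ba0
trap number             = 12
kernel trap 22 with interrupts disabled
kernel trap 22 with interrupts disabled
"""


def first_trap(log):
    """Return (trap_number, instruction_pointer) for the first
    'Fatal trap' report in the log, or None if there is none.
    (Hypothetical helper, for illustration only.)"""
    head = re.search(r"Fatal trap (\d+):", log)
    if head is None:
        return None
    # Look for the instruction pointer only after the trap header;
    # the value has the form <segment selector>:<address>.
    ip = re.search(
        r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)",
        log[head.end():],
    )
    address = int(ip.group(1), 16) if ip else None
    return int(head.group(1)), address


def later_traps(log):
    """Return the trap numbers reported by the follow-on
    'kernel trap N with interrupts disabled' messages."""
    pattern = r"kernel trap (\d+) with interrupts disabled"
    return [int(n) for n in re.findall(pattern, log)]


if __name__ == "__main__":
    trap, address = first_trap(SAMPLE)
    print("first trap:", trap, "at", hex(address))
    print("later traps:", later_traps(SAMPLE))
```

The extracted address could then be handed to a symbolizer such as
addr2line (with -f and -e pointing at a debug kernel that matches the
crashed one), assuming such a file is available; no lookup result is
claimed here.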

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.