Date:      Wed, 3 Jul 2024 08:24:14 +0900
From:      Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To:        stable@freebsd.org
Subject:   Re: x11/nvidia-driver fails on 14-STABLE/amd64
Message-ID:  <20240703082414.572553dabee65d0f6dd129a1@dec.sakura.ne.jp>
In-Reply-To: <2458ffc88ffac503076c06cccafa0dc0@chen.org.nz>
References:  <2458ffc88ffac503076c06cccafa0dc0@chen.org.nz>

On Tue, 02 Jul 2024 22:11:45 +0000
jonc@chen.org.nz wrote:

> Hi,
> 
> I updated my stable/14 amd64 system to 1a0314d6e30554fc2b07caa5121b00956f416cc4 (ctladm: Fix a race....), and it appears that the latest kernel update breaks x11/nvidia-driver. The system panics when X starts up. Just to be sure, I rebuilt and reinstalled x11/nvidia-driver with the updated /usr/src present. /var/log/messages has the following errors:
> 
> Jul  3 09:50:29 stormbringer kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20221020/nsarguments-212)
> Jul  3 09:50:29 stormbringer kernel: Firmware Error (ACPI): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20221020/dsfield-352)
> Jul  3 09:50:29 stormbringer kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20221020/dswload2-639)
> Jul  3 09:50:29 stormbringer kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20221020/psparse-689)
> Jul  3 09:51:52 stormbringer syslogd: kernel boot file is /boot/kernel/kernel
> Jul  3 09:51:52 stormbringer kernel: NVRM: GPU at PCI:0000:01:00: GPU-db6a2e9b-ba08-3668-c104-d55596af9efb
> Jul  3 09:51:52 stormbringer kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
> Jul  3 09:51:52 stormbringer kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
> Jul  3 09:51:52 stormbringer kernel: 
> Jul  3 09:51:52 stormbringer syslogd: last message repeated 1 times
> Jul  3 09:51:52 stormbringer kernel: Fatal trap 12: page fault while in kernel mode
> Jul  3 09:51:52 stormbringer kernel: cpuid = 14; apic id = 38
> Jul  3 09:51:52 stormbringer kernel: fault virtual address  = 0x0
> Jul  3 09:51:52 stormbringer kernel: fault code     = supervisor read data, page not present
> Jul  3 09:51:52 stormbringer kernel: instruction pointer    = 0x20:0xffffffff85bae56c
> Jul  3 09:51:52 stormbringer kernel: stack pointer          = 0x28:0xfffffe01a894e5e0
> Jul  3 09:51:52 stormbringer kernel: frame pointer          = 0x28:0xfffffe01adc85ce0
> Jul  3 09:51:52 stormbringer kernel: code segment       = base 0x0, limit 0xfffff, type 0x1b
> Jul  3 09:51:52 stormbringer kernel:            = DPL 0, pres 1, long 1, def32 0, gran 1
> Jul  3 09:51:52 stormbringer kernel: processor eflags   = interrupt enabled, resume, IOPL = 0
> Jul  3 09:51:52 stormbringer kernel: current process        = 1954 (Xorg)
> Jul  3 09:51:52 stormbringer kernel: rdi: fffffe01a951f000 rsi: fffffe01ae26f000 rdx: 0000000000000001
> Jul  3 09:51:52 stormbringer kernel: rcx: 0000000000000000  r8: 00000000000000c0  r9: fffffe01adc858f0
> Jul  3 09:51:52 stormbringer kernel: rax: 0000000000000000 rbx: fffffe01ae26f000 rbp: fffffe01adc85ce0
> Jul  3 09:51:52 stormbringer kernel: r10: 000000005237a738 r11: 0000000066847626 r12: 0000000000000000
> Jul  3 09:51:52 stormbringer kernel: r13: fffffe01a951f000 r14: 0000000000000001 r15: fffffe01ade09008
> Jul  3 09:51:52 stormbringer kernel: trap number        = 12
> Jul  3 09:51:52 stormbringer kernel: panic: page fault
> Jul  3 09:51:52 stormbringer kernel: cpuid = 14
> Jul  3 09:51:52 stormbringer kernel: time = 1719957030
> Jul  3 09:51:52 stormbringer kernel: KDB: stack backtrace:
> Jul  3 09:51:52 stormbringer kernel: #0 0xffffffff80b8002d at kdb_backtrace+0x5d
> Jul  3 09:51:52 stormbringer kernel: #1 0xffffffff80b32c51 at vpanic+0x131
> Jul  3 09:51:52 stormbringer kernel: #2 0xffffffff80b32b13 at panic+0x43
> Jul  3 09:51:52 stormbringer kernel: #3 0xffffffff8100194b at trap_fatal+0x40b
> Jul  3 09:51:52 stormbringer kernel: #4 0xffffffff81001996 at trap_pfault+0x46
> Jul  3 09:51:52 stormbringer kernel: #5 0xffffffff80fd8458 at calltrap+0x8
> 
> When I reverted to my previous kernel, X started up without any issues.
> 
> Cheers
> --
> Jonathan Chen <jonc@chen.org.nz>

Did you try rebuilding x11/nvidia-driver from ports AFTER INSTALLING
THE NEW KERNEL AND WORLD?

If so, one of the commits AFTER commit
620a6a54bb7bb6e1c5607092b6ec49e353e0925f [1] must have broken something.
I'm running that commit with x11/nvidia-driver 555.58 (built with
DISTVERSION overridden and NO_CHECKSUM=YES set, to try the new feature
branch of the driver), and it is working fine.
In that case, if your old build predates this commit and you want the
fix for FreeBSD-SA-24:04.openssh, the above-mentioned commit is worth
trying.

Additional note:
If you are using the graphics/nvidia-drm-[515|61]-kmod ports,
you need to apply the patch attached to Bug 279539 [2] to build them.

And if you want to test the 555 series of the nvidia-drm*-kmod driver,
you also need to apply the diff in Differential Revision D45400 on
Phabricator [3].

[1] https://cgit.freebsd.org/src/commit/?h=stable/14&id=620a6a54bb7bb6e1c5607092b6ec49e353e0925f

[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279539

[3] https://reviews.freebsd.org/D45400

-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>


