Date: Thu, 4 Jul 2024 09:10:07 +0900
From: Tomoaki AOKI
To: stable@freebsd.org
Subject: Re: x11/nvidia-driver fails on 14-STABLE/amd64
Message-Id: <20240704091007.5dc5f7a41bf12f8f764a896d@dec.sakura.ne.jp>
In-Reply-To: <20240703082414.572553dabee65d0f6dd129a1@dec.sakura.ne.jp>
References: <2458ffc88ffac503076c06cccafa0dc0@chen.org.nz>
 <20240703082414.572553dabee65d0f6dd129a1@dec.sakura.ne.jp>
Organization: Junchoon corps
X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; amd64-portbld-freebsd14.0)
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On Wed, 3 Jul 2024 08:24:14 +0900
Tomoaki AOKI wrote:

> On Tue, 02 Jul 2024 22:11:45 +0000
> jonc@chen.org.nz wrote:
>
> > Hi,
> >
> > I updated my STABLE-14/amd64 to 1a0314d6e30554fc2b07caa5121b00956f416cc4 (ctladm: Fix a race....), and it appears that the latest kernel update breaks x11/nvidia-driver. The system panics when X starts up. Just to be sure I have rebuild and resinstalled x11/nvidia-driver with the updated /usr/src present.
> > /var/log/messages has the following errors:
> >
> > Jul 3 09:50:29 stormbringer kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20221020/nsarguments-212)
> > Jul 3 09:50:29 stormbringer kernel: Firmware Error (ACPI): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20221020/dsfield-352)
> > Jul 3 09:50:29 stormbringer kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20221020/dswload2-639)
> > Jul 3 09:50:29 stormbringer kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20221020/psparse-689)
> > Jul 3 09:51:52 stormbringer syslogd: kernel boot file is /boot/kernel/kernel
> > Jul 3 09:51:52 stormbringer kernel: NVRM: GPU at PCI:0000:01:00: GPU-db6a2e9b-ba08-3668-c104-d55596af9efb
> > Jul 3 09:51:52 stormbringer kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='', name=, GPU has fallen off the bus.
> > Jul 3 09:51:52 stormbringer kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
> > Jul 3 09:51:52 stormbringer kernel:
> > Jul 3 09:51:52 stormbringer syslogd: last message repeated 1 times
> > Jul 3 09:51:52 stormbringer kernel: Fatal trap 12: page fault while in kernel mode
> > Jul 3 09:51:52 stormbringer kernel: cpuid = 14; apic id = 38
> > Jul 3 09:51:52 stormbringer kernel: fault virtual address = 0x0
> > Jul 3 09:51:52 stormbringer kernel: fault code = supervisor read data, page not present
> > Jul 3 09:51:52 stormbringer kernel: instruction pointer = 0x20:0xffffffff85bae56c
> > Jul 3 09:51:52 stormbringer kernel: stack pointer = 0x28:0xfffffe01a894e5e0
> > Jul 3 09:51:52 stormbringer kernel: frame pointer = 0x28:0xfffffe01adc85ce0
> > Jul 3 09:51:52 stormbringer kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> > Jul 3 09:51:52 stormbringer kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> > Jul 3 09:51:52 stormbringer kernel: processor eflags = interrupt enabled, resume, IOPL = 0
> > Jul 3 09:51:52 stormbringer kernel: current process = 1954 (Xorg)
> > Jul 3 09:51:52 stormbringer kernel: rdi: fffffe01a951f000 rsi: fffffe01ae26f000 rdx: 0000000000000001
> > Jul 3 09:51:52 stormbringer kernel: rcx: 0000000000000000 r8: 00000000000000c0 r9: fffffe01adc858f0
> > Jul 3 09:51:52 stormbringer kernel: rax: 0000000000000000 rbx: fffffe01ae26f000 rbp: fffffe01adc85ce0
> > Jul 3 09:51:52 stormbringer kernel: r10: 000000005237a738 r11: 0000000066847626 r12: 0000000000000000
> > Jul 3 09:51:52 stormbringer kernel: r13: fffffe01a951f000 r14: 0000000000000001 r15: fffffe01ade09008
> > Jul 3 09:51:52 stormbringer kernel: trap number = 12
> > Jul 3 09:51:52 stormbringer kernel: panic: page fault
> > Jul 3 09:51:52 stormbringer kernel: cpuid = 14
> > Jul 3 09:51:52 stormbringer kernel: time = 1719957030
> > Jul 3 09:51:52 stormbringer kernel: KDB: stack backtrace:
> > Jul 3 09:51:52 stormbringer kernel: #0 0xffffffff80b8002d at kdb_backtrace+0x5d
> > Jul 3 09:51:52 stormbringer kernel: #1 0xffffffff80b32c51 at vpanic+0x131
> > Jul 3 09:51:52 stormbringer kernel: #2 0xffffffff80b32b13 at panic+0x43
> > Jul 3 09:51:52 stormbringer kernel: #3 0xffffffff8100194b at trap_fatal+0x40b
> > Jul 3 09:51:52 stormbringer kernel: #4 0xffffffff81001996 at trap_pfault+0x46
> > Jul 3 09:51:52 stormbringer kernel: #5 0xffffffff80fd8458 at calltrap+0x8
> >
> > When I reverted to my previous kernel, X started up without any issues.
> >
> > Cheers
> > --
> > Jonathan Chen
>
> Did you tried rebuilding x11/nvidia-driver from ports AFTER INSTALLING
> NEW KERNEL AND WORLD?
>
> If yes, any of commits AFTER commit
> 620a6a54bb7bb6e1c5607092b6ec49e353e0925f [1] should broke something.
> (As I'm on the commit and x11/nvidia-driver 555.58 (overrided
> DISTVERSION and setting NO_CHECKSUM=YES on build) to try this new
> feature branch of driver) isworking fine.
> This case, if your old build is older than this and if you want fix for
> FreeBSD-SA-24:04.openssh, the above-mentioned commit is worth trying.
>
> Additional note:
> If you are using graphics/nvidia-drm-[515|61]-kmod port,
> you need to apply the patch attached at Bug 279539 [2] to build.
>
> And if you want to test 555 series of nvidia-drm*-kmod driver, you need
> to apply the diff at Differential revision D45400 of Phablicator [3],
> too.
>
> [1]
> https://cgit.freebsd.org/src/commit/?h=stable/14&id=620a6a54bb7bb6e1c5607092b6ec49e353e0925f
>
> [2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279539
>
> [3] https://reviews.freebsd.org/D45400
>
> --
> Tomoaki AOKI

I updated stable/14 to commit 342053a66c161c12f6887efac913c80040959ae8,
which is the commit right after the reported one. X starts as usual.

So none of the commits between
620a6a54bb7bb6e1c5607092b6ec49e353e0925f and
342053a66c161c12f6887efac913c80040959ae8 seems to matter, at least for
x11/nvidia-driver 555.58 from the new feature branch.

Also note that I'm running on an NVIDIA discrete GPU, with the Intel
iGPU in my CPU disabled via firmware configuration.
The same version of the driver is also working on the main branch of
base at commit 59c21ed6e811c753f7806766ba45a5bfa71ae2ed. As the main
branch is just a test bed environment for me, it is not yet updated to
the commit fixing openssh. My daily driver is stable/14.

BTW, how do you start X?
And which commit was your working kernel built from?

If you auto-start X on boot via something like xdm, that could mask the
cause of the panic. That is at least one possible reason.

If you are loading nvidia[-modeset].ko via /boot/loader.conf, don't.
Remove or comment out the line in /boot/loader.conf and add the module
to the kld_list variable in /etc/rc.conf[.local] instead.

If you start X from the command line with startx and the cause is as
above, you should see the panic when the module is loaded.

If your previous working kernel was old enough that the update counts
as a "giant step", i.e. it was built from an older stable branch like
stable/12, things could be much more complicated.

-- 
Tomoaki AOKI
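
P.S. To make the /boot/loader.conf to kld_list change mentioned above
concrete, here is a minimal sketch. It assumes the module was being
loaded with a line like nvidia-modeset_load="YES" and that kld_list is
not already set to something else. I don't know what your files
actually contain, so adjust it to match them.

```sh
# /boot/loader.conf: comment out (or delete) the line that loads the
# module from the loader. The exact line shown here is an assumption.
#nvidia-modeset_load="YES"

# /etc/rc.conf (or /etc/rc.conf.local): have rc(8) load the module
# instead. If kld_list already lists other modules, add this one to the
# existing space-separated list rather than replacing the list.
kld_list="nvidia-modeset"
```

With something like this in place, the module should be loaded by rc(8)
during boot rather than by the loader.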