From: Warner Losh
Date: Fri, 1 Sep 2023 16:00:03 -0600
Subject: Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel
To: garyj@gmx.de
Cc: freebsd-current@freebsd.org
In-Reply-To: <20230901182134.23b8d5f3@ernst.home>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

I think that the problem is that amdsmn has probed, but not attached (or failed to attach for some reason), so we find the device, but it's not initialized yet. When we then call amdsmn_read, it tries to lock a mutex that's not yet initialized.
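
For anyone following along, here is a minimal userland sketch of that failure mode and of one possible guard. It is not the amdsmn(4) code: fake_smn_softc, fake_smn_attach and fake_smn_read are hypothetical names, a pthread mutex stands in for the kernel mutex, and the "attached" flag is an assumption made only for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver softc; NOT the real amdsmn(4) layout. */
struct fake_smn_softc {
	pthread_mutex_t lock;	/* only valid once attach has run */
	bool attached;		/* set at the end of a successful attach */
	uint32_t reg;		/* pretend register contents */
};

/* Models attach: initialize the lock first, then mark the device usable. */
static int
fake_smn_attach(struct fake_smn_softc *sc)
{
	if (pthread_mutex_init(&sc->lock, NULL) != 0)
		return (ENXIO);
	sc->reg = 0x2a;
	sc->attached = true;
	return (0);
}

/*
 * Models a read issued by a dependent driver.  The guard refuses to
 * touch the lock when the device was found but never attached, which
 * is the situation suspected above.
 */
static int
fake_smn_read(struct fake_smn_softc *sc, uint32_t *value)
{
	if (sc == NULL || !sc->attached)
		return (ENXIO);
	pthread_mutex_lock(&sc->lock);
	*value = sc->reg;
	pthread_mutex_unlock(&sc->lock);
	return (0);
}
```

In this sketch a read before fake_smn_attach() returns ENXIO instead of locking an uninitialized mutex, and a read afterwards succeeds. Whether the real driver should guard like this, or whether the attach ordering itself needs fixing, is not something this sketch settles.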

Not sure why this is happening, or why loading it as modules fixes it...

But since I don't have the hardware, I can't help more. Sorry.

Warner

On Fri, Sep 1, 2023 at 10:21 AM Gary Jennejohn <garyj@gmx.de> wrote:
On Fri, 01 Sep 2023 17:14:02 +0200
"Herbert J. Skuhra" <herbert@gojira.at> wrote:

> On Fri, 01 Sep 2023 16:04:41 +0200, Gary Jennejohn wrote:
> >
> > On Fri, 01 Sep 2023 14:15:20 +0200
> > "Herbert J. Skuhra" <herbert@gojira.at> wrote:
> >
> > > On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> > > >
> > > > I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
> > > >
> > > > These are respectively Zen 1 and Zen 2 CPUs.
> > > >
> > > > I built a kernel on both computers using the FreeBSD-15 source tree.
> > > >
> > > > If I include the amdtemp device in my kernel file BOTH computers end up
> > > > with a kernel panic while trying to attach the amdtemp device.
> > > >
> > > > If I remove amdtemp both computers boot without any issues.
> > > >
> > > > I suspect that this commit is the cause:
> > > >
> > > > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > > > Author: Akio Morita <akio.morita@kek.jp>
> > > > Date:   Tue Aug 1 22:32:12 2023 +0200
> > > >
> > > >     amdsmn(4), amdtemp(4): add support for Zen 4
> > > >
> > > >     Zen 4 support, tested on Ryzen 9 7900
> > > >
> > > >     Reviewed by:    imp (previous version), mhorne
> > > >     Approved by:    mhorne
> > > >     Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > > >     Differential Revision: https://reviews.freebsd.org/D41049
> > >
> > > Thanks for sharing your findings.
> > >
> > > Now I probably know why my old kernel from stable/13 no longer booted
> > > after updating to stable/14. I've created a new kernel config and
> > > forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> > > issue. After removing only "device amdtemp" from my old kernel config
> > > it boots again.
> > >
> > > Unfortunately reverting this commit (git revert -n 323a94afb623)
> > > doesn't resolve this issue. Old kernel does not boot if "device
> > > amdtemp" is enabled. Probably wrong commit or I am doing something
> > > wrong!?
> > >
> >
> > Strange.  My FreeBSD-14 kernel boots with device amdtemp (which automatically
> > results in amdsmn being included).  It's FreeBSD-15 which fails for me.
>
> 1. 'kldload amdtemp' works:
>    12    1 0xffffffff81e7c000     3160 amdtemp.ko
>    13    1 0xffffffff81e80000     2138 amdsmn.ko
>
>    amdsmn0: <AMD Family 17h System Management Network> on hostb0
>    amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
>
> 2. GENERIC boots fine. The following kernel does not:
>
>    include GENERIC
>
>    ident      TEST
>    device     amdtemp
>
> 3. Unfortunately this is a remote server without a serial console. I
> don't get a crashdump and I can't find anything in /var/log/messages.
>
> 4. I have no good revision for stable/14 and main. On main I always
> use GENERIC-NODEBUG. :-(
>

Thanks, Herbert!  kldload'ing amdsmn and amdtemp really does work!

Now I can run FBSD-15 :)

--
Gary Jennejohn
