Date: Fri, 1 Sep 2023 16:00:03 -0600
From: Warner Losh <imp@bsdimp.com>
To: garyj@gmx.de
Cc: freebsd-current@freebsd.org
Subject: Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel
Message-ID: <CANCZdfqXanzJU8Ki0vdjce6wieZrJvbiqs%2Bkqen6d2VQd=pzAQ@mail.gmail.com>
In-Reply-To: <20230901182134.23b8d5f3@ernst.home>
References: <20230901130314.460f91bf@ernst.home> <87edjixf6v.wl-herbert@gojira.at> <20230901160441.038539cd@ernst.home> <87cyz2x6x1.wl-herbert@gojira.at> <20230901182134.23b8d5f3@ernst.home>

I think that the problem is that amdsmn has probed, but not attached (or
failed to attach for some reason), so we find the device, but it's not
initialized yet, so when we call amdsmn_read, it tries to lock a mutex
that's not yet initialized.

Not sure why this is happening, or why loading it as modules fixes it...

But since I don't have the hardware, I can't help more. Sorry.

Warner

On Fri, Sep 1, 2023 at 10:21 AM Gary Jennejohn <garyj@gmx.de> wrote:

> On Fri, 01 Sep 2023 17:14:02 +0200
> "Herbert J. Skuhra" <herbert@gojira.at> wrote:
>
> > On Fri, 01 Sep 2023 16:04:41 +0200, Gary Jennejohn wrote:
> > >
> > > On Fri, 01 Sep 2023 14:15:20 +0200
> > > "Herbert J. Skuhra" <herbert@gojira.at> wrote:
> > >
> > > > On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> > > > >
> > > > > I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
> > > > >
> > > > > These are respectively Zen 1 and Zen 2 CPUs.
> > > > >
> > > > > I built a kernel on both computers using the FreeBSD-15 source tree.
> > > > >
> > > > > If I include the amdtemp device in my kernel file BOTH computers end up
> > > > > with a kernel panic while trying to attach the amdtemp device.
> > > > >
> > > > > If I remove amdtemp both computers boot without any issues.
> > > > >
> > > > > I suspect that this commit is the cause:
> > > > >
> > > > > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > > > > Author: Akio Morita <akio.morita@kek.jp>
> > > > > Date:   Tue Aug 1 22:32:12 2023 +0200
> > > > >
> > > > >     amdsmn(4), amdtemp(4): add support for Zen 4
> > > > >
> > > > >     Zen 4 support, tested on Ryzen 9 7900
> > > > >
> > > > >     Reviewed by:    imp (previous version), mhorne
> > > > >     Approved by:    mhorne
> > > > >     Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > > > >     Differential Revision:  https://reviews.freebsd.org/D41049
> > > >
> > > > Thanks for sharing your findings.
> > > >
> > > > Now I probably know why my old kernel from stable/13 no longer booted
> > > > after updating to stable/14. I've created a new kernel config and
> > > > forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> > > > issue. After removing only "device amdtemp" from my old kernel config
> > > > it boots again.
> > > >
> > > > Unfortunately reverting this commit (git revert -n 323a94afb623)
> > > > doesn't resolve this issue. Old kernel does not boot if "device
> > > > amdtemp" is enabled. Probably wrong commit or I am doing something
> > > > wrong!?
> > > >
> > >
> > > Strange.  My FreeBSD-14 kernel boots with device amdtemp (which automatically
> > > results in amdsmn being included).  It's FreeBSD-15 which fails for me.
> >
> > 1. 'kldload amdtemp' works:
> >    12    1 0xffffffff81e7c000     3160 amdtemp.ko
> >    13    1 0xffffffff81e80000     2138 amdsmn.ko
> >
> >    amdsmn0: <AMD Family 17h System Management Network> on hostb0
> >    amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
> >
> > 2. GENERIC boots fine. The following kernel does not:
> >
> >    include GENERIC
> >
> >    ident      TEST
> >    device     amdtemp
> >
> > 3. Unfortunately this is a remote server without a serial console. I
> > don't get a crashdump and I can't find anything in /var/log/messages.
> >
> > 4. I have no good revision for stable/14 and main. On main I always
> > use GENERIC-NODEBUG. :-(
> >
>
> Thanks, Herbert!  kldload'ing amdsmn and amdtemp really does work!
>
> Now I can run FBSD-15 :)
>
> --
> Gary Jennejohn
>
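
For readers following along, the failure mode suggested at the top of this message (a device that has been found but not attached, so its mutex is not yet initialized when a consumer reads from it) can be modelled in userland roughly as below. This is only a sketch under assumptions: it is not the amdsmn or amdtemp code, every name in it (model_softc, model_probe, model_attach, model_read) is hypothetical, and a pthread mutex stands in for the kernel mutex. Whether the real problem is best solved by a guard like this or by fixing the attach ordering is not settled in this thread.

```c
/* Userland model only: none of these names are FreeBSD kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's per-device state (softc). */
struct model_softc {
	pthread_mutex_t lock;     /* valid only after model_attach() */
	bool            attached; /* set once the lock is initialized */
	uint32_t        reg;      /* pretend register contents */
};

/* "Probe" only identifies the device; it does not set up the lock. */
static void
model_probe(struct model_softc *sc)
{
	sc->attached = false;
}

/* "Attach" initializes the lock; only now is the softc usable. */
static int
model_attach(struct model_softc *sc, uint32_t reg)
{
	assert(!sc->attached);
	if (pthread_mutex_init(&sc->lock, NULL) != 0)
		return (ENXIO);
	sc->reg = reg;
	sc->attached = true;
	return (0);
}

/*
 * A consumer's read path. Without the attached check, a caller that
 * found a probed-but-unattached device would lock an uninitialized mutex.
 */
static int
model_read(struct model_softc *sc, uint32_t *out)
{
	if (!sc->attached)
		return (ENXIO);
	pthread_mutex_lock(&sc->lock);
	*out = sc->reg;
	pthread_mutex_unlock(&sc->lock);
	return (0);
}
```

In this model, calling model_read() after model_probe() but before model_attach() returns ENXIO instead of touching the lock; after model_attach() succeeds, the same call returns 0 and the stored value.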