Date:      Sun, 29 Dec 2024 09:32:23 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Yuri Pankov <yuri@aetern.org>, current@freebsd.org, imp@freebsd.org, jhb@freebsd.org
Subject:   Re: hdaa: uma_zalloc_debug: zone "malloc-{32,64}" with the following non-sleepable locks held
Message-ID:  <CANCZdfqyAgn=G_DS8EuXWpuUPyu5psSNXQ4Sta%2BTDHhHm0Pm-w@mail.gmail.com>
In-Reply-To: <Z3Am46XzkMkljEp-@nuc>
References:  <6901050e-c6bc-4347-a0d4-98e1de94b005@aetern.org> <Z3Am46XzkMkljEp-@nuc>

On Sat, Dec 28, 2024 at 9:27 AM Mark Johnston <markj@freebsd.org> wrote:

> On Fri, Dec 27, 2024 at 08:30:37PM +0700, Yuri Pankov wrote:
> > Getting the following debug notifications:
> >
> > hdacc0: <ATI R6xx HDA CODEC> at cad 0 on hdac0
> > hdaa0: <ATI R6xx Audio Function Group> at nid 1 on hdacc0
> > uma_zalloc_debug: zone "malloc-32" with the following non-sleepable
> > locks held:
> > exclusive sleep mutex hdac0 (HDA driver mutex) r = 0
> > (0xfffff80107cb7aa0) locked @ /usr/src/sys/dev/sound/pci/hda/hdaa.c:1571
> > stack backtrace:
> > #0 0xffffffff80bcbbac at witness_debugger+0x6c
> > #1 0xffffffff80bccdc0 at witness_warn+0x430
> > #2 0xffffffff80f00974 at uma_zalloc_debug+0x34
> > #3 0xffffffff80f004c7 at uma_zalloc_arg+0x27
> > #4 0xffffffff80b26a7d at malloc+0x7d
> > #5 0xffffffff80b2737d at realloc+0xed
> > #6 0xffffffff80b27432 at reallocf+0x12
> > #7 0xffffffff80b9238d at devclass_add_device+0x1cd
> > #8 0xffffffff80b9093b at make_device+0x10b
> > #9 0xffffffff80b9077d at device_add_child_ordered+0x2d
> > #10 0xffffffff808b2b2c at hdaa_configure+0x485c
> > #11 0xffffffff808ac5b4 at hdaa_attach+0x544
> > #12 0xffffffff80b92b9b at device_attach+0x45b
> > #13 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #14 0xffffffff808c51c0 at hdacc_attach+0x2f0
> > #15 0xffffffff80b92b9b at device_attach+0x45b
> > #16 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #17 0xffffffff808c3e9d at hdac_attach2+0x35d
>
> I see this as well on a new system.  I think this is fallout from commit
> f3d3c63442fff.
>
> At a glance, the hdaa lock in question can't trivially be made
> sleepable, as it's also used to lock a callout handler,
> hdaa_jack_poll_callback(), and the lock itself is shared with the parent
> hdac device.
>
> Until that's fixed somehow, I suspect we should restore the M_NOWAIT
> usage.
>

I think that's right. One issue is that the driver does its own locking
in attach; since we're not yet competing for resources at that point,
that locking may be misplaced (I've not looked in detail, though). I
agree that reverting this small part of the change is warranted until
we can sort out the other issues with newbus.

While I'd like to transition newbus to a topo lock, I know all the
difficulties that CAM has had with that route. CAM does exist in a more
hostile environment for things coming and going, but I think that
jumping to some kind of epoch or SMR approach for lifetime management
may be better. I've not thought through it in detail, though. Ideally
we'd do it for newbus first, then move CAM's lifetime management onto
that same mechanism and radically simplify the code there, which is a
twisty maze of hacks to ensure things don't go away too soon when its
reference counting fails to cover some weird edge case.
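
To make the allocation problem discussed above concrete, here is a
minimal userland sketch in C. It is not kernel code and not the actual
fix: the model_* names and the two caller functions are hypothetical
stand-ins, with MODEL_WAITOK / MODEL_NOWAIT playing the role of
malloc(9)'s M_WAITOK / M_NOWAIT and a simple counter playing the role
of the check that produced the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's M_NOWAIT / M_WAITOK flags. */
enum model_flag { MODEL_NOWAIT, MODEL_WAITOK };

/* Number of non-sleepable locks the (single) model thread holds. */
static int model_locks_held;
/* Number of sleepable allocations attempted while such a lock was held. */
static int model_warnings;

static void
model_lock(void)
{
	model_locks_held++;
}

static void
model_unlock(void)
{
	assert(model_locks_held > 0);
	model_locks_held--;
}

/*
 * Allocate memory.  A MODEL_WAITOK request made with a lock held is
 * counted as a warning, loosely imitating the check in the report.
 * As with M_NOWAIT, callers of MODEL_NOWAIT must expect NULL.
 */
static void *
model_alloc(size_t size, enum model_flag flag)
{
	if (flag == MODEL_WAITOK && model_locks_held > 0)
		model_warnings++;
	return (malloc(size));
}

/* Problem case: a sleepable allocation while the lock is held. */
static void *
alloc_waitok_locked(size_t size)
{
	void *p;

	model_lock();
	p = model_alloc(size, MODEL_WAITOK);
	model_unlock();
	return (p);
}

/* Interim approach: non-sleeping allocation; caller must handle NULL. */
static void *
alloc_nowait_locked(size_t size)
{
	void *p;

	model_lock();
	p = model_alloc(size, MODEL_NOWAIT);
	model_unlock();
	return (p);
}
```

In this model only alloc_waitok_locked() bumps model_warnings. In the
real driver the allocation happens several frames below the lock, in
devclass_add_device() per the backtrace, which is why the flag used
there matters until the locking in attach is sorted out.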

Warner


> > pcm0: <ATI R6xx (HDMI)> at nid 3 on hdaa0
> > pcm1: <ATI R6xx (HDMI)> at nid 5 on hdaa0
> > pcm2: <ATI R6xx (HDMI)> at nid 7 on hdaa0
> > pcm3: <ATI R6xx (HDMI)> at nid 9 on hdaa0
> > hdacc1: <Realtek ALC888 HDA CODEC> at cad 0 on hdac1
> > hdaa1: <Realtek ALC888 Audio Function Group> at nid 1 on hdacc1
> > uma_zalloc_debug: zone "malloc-64" with the following non-sleepable
> > locks held:
> > exclusive sleep mutex hdac1 (HDA driver mutex) r = 0
> > (0xfffff80107cb7a40) locked @ /usr/src/sys/dev/sound/pci/hda/hdaa.c:1571
> > stack backtrace:
> > #0 0xffffffff80bcbbac at witness_debugger+0x6c
> > #1 0xffffffff80bccdc0 at witness_warn+0x430
> > #2 0xffffffff80f00974 at uma_zalloc_debug+0x34
> > #3 0xffffffff80f004c7 at uma_zalloc_arg+0x27
> > #4 0xffffffff80b26a7d at malloc+0x7d
> > #5 0xffffffff80b2737d at realloc+0xed
> > #6 0xffffffff80b27432 at reallocf+0x12
> > #7 0xffffffff80b9238d at devclass_add_device+0x1cd
> > #8 0xffffffff80b9093b at make_device+0x10b
> > #9 0xffffffff80b9077d at device_add_child_ordered+0x2d
> > #10 0xffffffff808b2b2c at hdaa_configure+0x485c
> > #11 0xffffffff808ac5b4 at hdaa_attach+0x544
> > #12 0xffffffff80b92b9b at device_attach+0x45b
> > #13 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #14 0xffffffff808c51c0 at hdacc_attach+0x2f0
> > #15 0xffffffff80b92b9b at device_attach+0x45b
> > #16 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #17 0xffffffff808c3e9d at hdac_attach2+0x35d
> > pcm4: <Realtek ALC888 (Rear Analog 5.1/2.0)> at nid 20,22,21 and 24,26
> > on hdaa1
> > pcm5: <Realtek ALC888 (Front Analog)> at nid 27 and 25 on hdaa1
> > pcm6: <Realtek ALC888 (Internal Digital)> at nid 17 and 31 on hdaa1
> > pcm7: <Realtek ALC888 (Rear Digital)> at nid 30 on hdaa1
> >
> > Devices in question:
> >
> > hdac0@pci0:17:0:1:      class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002
> > device=0x1640 subvendor=0x15d9 subdevice=0x1c97
> >     vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
> >     device     = 'Rembrandt Radeon High Definition Audio Controller'
> >     class      = multimedia
> >     subclass   = HDA
> > hdac1@pci0:17:0:6:      class=0x040300 rev=0x00 hdr=0x00 vendor=0x1022
> > device=0x15e3 subvendor=0x15d9 subdevice=0x1c97
> >     vendor     = 'Advanced Micro Devices, Inc. [AMD]'
> >     device     = 'Family 17h/19h HD Audio Controller'
> >     class      = multimedia
> >     subclass   = HDA
> >
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqyAgn=G_DS8EuXWpuUPyu5psSNXQ4Sta%2BTDHhHm0Pm-w>