From: Warner Losh
Date: Sun, 29 Dec 2024 09:32:23 -0700
Subject: Re: hdaa: uma_zalloc_debug: zone "malloc-{32,64}" with the following non-sleepable locks held
To: Mark Johnston
Cc: Yuri Pankov, current@freebsd.org, imp@freebsd.org, jhb@freebsd.org
References: <6901050e-c6bc-4347-a0d4-98e1de94b005@aetern.org>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Sat, Dec 28, 2024 at 9:27 AM Mark Johnston <markj@freebsd.org> wrote:

> On Fri, Dec 27, 2024 at 08:30:37PM +0700, Yuri Pankov wrote:
> > Getting the following debug notifications:
> >
> > hdacc0: <ATI R6xx HDA CODEC> at cad 0 on hdac0
> > hdaa0: <ATI R6xx Audio Function Group> at nid 1 on hdacc0
> > uma_zalloc_debug: zone "malloc-32" with the following non-sleepable
> > locks held:
> > exclusive sleep mutex hdac0 (HDA driver mutex) r = 0
> > (0xfffff80107cb7aa0) locked @ /usr/src/sys/dev/sound/pci/hda/hdaa.c:1571
> > stack backtrace:
> > #0 0xffffffff80bcbbac at witness_debugger+0x6c
> > #1 0xffffffff80bccdc0 at witness_warn+0x430
> > #2 0xffffffff80f00974 at uma_zalloc_debug+0x34
> > #3 0xffffffff80f004c7 at uma_zalloc_arg+0x27
> > #4 0xffffffff80b26a7d at malloc+0x7d
> > #5 0xffffffff80b2737d at realloc+0xed
> > #6 0xffffffff80b27432 at reallocf+0x12
> > #7 0xffffffff80b9238d at devclass_add_device+0x1cd
> > #8 0xffffffff80b9093b at make_device+0x10b
> > #9 0xffffffff80b9077d at device_add_child_ordered+0x2d
> > #10 0xffffffff808b2b2c at hdaa_configure+0x485c
> > #11 0xffffffff808ac5b4 at hdaa_attach+0x544
> > #12 0xffffffff80b92b9b at device_attach+0x45b
> > #13 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #14 0xffffffff808c51c0 at hdacc_attach+0x2f0
> > #15 0xffffffff80b92b9b at device_attach+0x45b
> > #16 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #17 0xffffffff808c3e9d at hdac_attach2+0x35d
>
> I see this as well on a new system. I think this is fallout from commit
> f3d3c63442fff.
>
> At a glance, the hdaa lock in question can't trivially be made
> sleepable, as it's also used to lock a callout handler,
> hdaa_jack_poll_callback(), and the lock itself is shared with the parent
> hdac device.
>
> Until that's fixed somehow, I suspect we should restore the M_NOWAIT
> usage.
>

I think that's right. One issue is that it's doing its own locking in
attach, but since we're not yet competing for resources at that point,
that locking may be misplaced (I've not looked in detail, though). I agree
that reverting this small part of the change would be warranted until we
can sort out the other issues with newbus.

While I'd like to transition to a topo lock for it, I know all the
difficulties that CAM has had with that route. CAM exists in a more
hostile environment for things coming and going, but I think that jumping
to some kind of epoch or SMR approach for lifetime management may be
better, though I've not thought it through in detail. Ideally we'd do it
for newbus first and then move CAM's lifetime management into that same
mechanism. That would radically simplify the code there, which is a
twisty maze of hacks to ensure things don't go away too soon when its
reference counting fails to cover some weird edge case.

Warner

> > pcm0: at nid 3 on hdaa0
> > pcm1: at nid 5 on hdaa0
> > pcm2: at nid 7 on hdaa0
> > pcm3: at nid 9 on hdaa0
> > hdacc1: at cad 0 on hdac1
> > hdaa1: at nid 1 on hdacc1
> > uma_zalloc_debug: zone "malloc-64" with the following non-sleepable
> > locks held:
> > exclusive sleep mutex hdac1 (HDA driver mutex) r = 0
> > (0xfffff80107cb7a40) locked @ /usr/src/sys/dev/sound/pci/hda/hdaa.c:1571
> > stack backtrace:
> > #0 0xffffffff80bcbbac at witness_debugger+0x6c
> > #1 0xffffffff80bccdc0 at witness_warn+0x430
> > #2 0xffffffff80f00974 at uma_zalloc_debug+0x34
> > #3 0xffffffff80f004c7 at uma_zalloc_arg+0x27
> > #4 0xffffffff80b26a7d at malloc+0x7d
> > #5 0xffffffff80b2737d at realloc+0xed
> > #6 0xffffffff80b27432 at reallocf+0x12
> > #7 0xffffffff80b9238d at devclass_add_device+0x1cd
> > #8 0xffffffff80b9093b at make_device+0x10b
> > #9 0xffffffff80b9077d at device_add_child_ordered+0x2d
> > #10 0xffffffff808b2b2c at hdaa_configure+0x485c
> > #11 0xffffffff808ac5b4 at hdaa_attach+0x544
> > #12 0xffffffff80b92b9b at device_attach+0x45b
> > #13 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #14 0xffffffff808c51c0 at hdacc_attach+0x2f0
> > #15 0xffffffff80b92b9b at device_attach+0x45b
> > #16 0xffffffff80b93f0a at bus_attach_children+0x4a
> > #17 0xffffffff808c3e9d at hdac_attach2+0x35d
> > pcm4: at nid 20,22,21 and 24,26
> > on hdaa1
> > pcm5: at nid 27 and 25 on hdaa1
> > pcm6: at nid 17 and 31 on hdaa1
> > pcm7: at nid 30 on hdaa1
> >
> > Devices in question:
> >
> > hdac0@pci0:17:0:1: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002
> > device=0x1640 subvendor=0x15d9 subdevice=0x1c97
> > vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
> > device = 'Rembrandt Radeon High Definition Audio Controller'
> > class = multimedia
> > subclass = HDA
> > hdac1@pci0:17:0:6: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1022
> > device=0x15e3 subvendor=0x15d9 subdevice=0x1c97
> > vendor = 'Advanced Micro Devices, Inc. [AMD]'
> > device = 'Family 17h/19h HD Audio Controller'
> > class = multimedia
> > subclass = HDA
> >
>


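For readers following along, the warning in the quoted log can be illustrated with a small userspace sketch. It is not kernel code and not the actual newbus or hdaa change: every `sim_*` name, the `SIM_NOWAIT`/`SIM_WAITOK` flags, and the warning counter are hypothetical stand-ins for malloc(9) flags, a non-sleepable mutex, and a WITNESS-style check. The sketch only shows why a may-sleep allocation made while such a lock is held gets flagged, while a no-wait allocation does not but must tolerate failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's malloc(9) wait flags. */
enum sim_flags { SIM_NOWAIT = 0x1, SIM_WAITOK = 0x2 };

/* Hypothetical stand-in for a non-sleepable mutex (e.g. a driver mutex). */
struct sim_mtx { bool held; };

static int sim_nonsleepable_held; /* non-sleepable locks currently held */
static int sim_warnings;          /* simulated WITNESS-style warnings */

static void
sim_mtx_lock(struct sim_mtx *m)
{
	assert(!m->held);
	m->held = true;
	sim_nonsleepable_held++;
}

static void
sim_mtx_unlock(struct sim_mtx *m)
{
	assert(m->held);
	m->held = false;
	sim_nonsleepable_held--;
}

/*
 * Allocate memory. A SIM_WAITOK request is allowed to sleep, so it is
 * flagged when a non-sleepable lock is held. A SIM_NOWAIT request never
 * sleeps, so it is never flagged, but the caller must handle NULL.
 */
static void *
sim_malloc(size_t size, int flags)
{
	if ((flags & SIM_WAITOK) != 0 && sim_nonsleepable_held > 0)
		sim_warnings++;
	return (malloc(size));
}

/* Allocate a slot while holding the mutex, as an attach path might. */
static int
sim_add_slot(struct sim_mtx *m, int flags)
{
	void *p;
	int error = 0;

	sim_mtx_lock(m);
	p = sim_malloc(32, flags);
	if (p == NULL)
		error = -1; /* failure must be tolerated with SIM_NOWAIT */
	free(p);
	sim_mtx_unlock(m);
	return (error);
}
```

Calling `sim_add_slot()` with `SIM_NOWAIT` leaves `sim_warnings` unchanged, while calling it with `SIM_WAITOK` increments it once per call. The alternative to a no-wait allocation is to avoid holding the lock across the allocation, which is the harder problem the thread describes.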