Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 22 May 2024 12:53:04 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Alexander Leidinger <Alexander@leidinger.net>
Cc:        Current <current@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject:   Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469
Message-ID:  <CANCZdfo-k_ScVQY1MtOC2wUG4nCatbea9JwS7xzJc_OduVLyhA@mail.gmail.com>
In-Reply-To: <730565997ef678bbfe87d7861075edae@Leidinger.net>
References:  <730565997ef678bbfe87d7861075edae@Leidinger.net>

next in thread | previous in thread | raw e-mail | index | archive | help
--0000000000008455b106190f7288
Content-Type: multipart/alternative; boundary="0000000000008455ae06190f7286"

--0000000000008455ae06190f7286
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

First order:

Looks like we're trying to schedule a trim, but that fails due to a malloc
issue. So then, since it's a
malloc issue, we wind up trying to automatically reschedule this I/O, which
recurses into the driver
with a bad lock held and boop.

Can you reproduce this?

If so, can you test this patch?

diff --git a/sys/cam/nvme/nvme_da.c b/sys/cam/nvme/nvme_da.c
index 3f6cf8702870..357e612200e9 100644
--- a/sys/cam/nvme/nvme_da.c
+++ b/sys/cam/nvme/nvme_da.c
@@ -1077,7 +1077,9 @@ ndastart(struct cam_periph *periph, union ccb
*start_ccb)

                        trim =3D malloc(sizeof(*trim), M_NVMEDA, M_ZERO |
M_NOWAIT);
                        if (trim =3D=3D NULL) {
+                         cam_periph_unlock(periph);
                                biofinish(bp, NULL, ENOMEM);
+                         cam_periph_lock(periph);
                                xpt_release_ccb(start_ccb);
                                ndaschedule(periph);
                                return;

(the mailer may mangle it, so I've also attached it in case people want to
comment on this).

The logic here is that we have special logic in the ENOMEM case that will
recursively
call the start routine, which calls the scheduler which expects to be able
to take out the
periph lock. But it can't, because it's already locked. It also invokes a
pacing protocol that
slows down things even more.

What I'm now not sure about is whether or not we need to just release
start_ccb or if we also need to call ndaschedule too here. Seems like we
might not want to (but it's a safe nop if not). I've cc'd mav@ to see if he
has opinions on what's going on.

Warner


On Wed, May 22, 2024 at 11:22=E2=80=AFAM Alexander Leidinger <
Alexander@leidinger.net> wrote:

> Hi,
>
> I got the panic in $Subject. Anyone an idea?
>
> Complete crashlog available at https://wiki.leidinger.net/core.txt.6
> (1.2 MB)
>
> Short version:
> ---snip---
> [11417] KDB: stack backtrace:
> [11417] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe043133f830
> [11417] vpanic() at vpanic+0x13f/frame 0xfffffe043133f960
> [11417] panic() at panic+0x43/frame 0xfffffe043133f9c0
> [11417] __mtx_lock_sleep() at __mtx_lock_sleep+0x491/frame
> 0xfffffe043133fa50
> [11417] __mtx_lock_flags() at __mtx_lock_flags+0x9c/frame
> 0xfffffe043133fa70
> [11417] ndastrategy() at ndastrategy+0x3c/frame 0xfffffe043133faa0
> [11417] g_disk_start() at g_disk_start+0x569/frame 0xfffffe043133fb00
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fb30
> [11417] g_io_deliver() at g_io_deliver+0x1cc/frame 0xfffffe043133fb80
> [11417] g_disk_done() at g_disk_done+0xee/frame 0xfffffe043133fbc0
> [11417] ndastart() at ndastart+0x4a3/frame 0xfffffe043133fc20
> [11417] xpt_run_allocq() at xpt_run_allocq+0xa5/frame 0xfffffe043133fc70
> [11417] ndastrategy() at ndastrategy+0x6d/frame 0xfffffe043133fca0
> [11417] g_disk_start() at g_disk_start+0x569/frame 0xfffffe043133fd00
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd30
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd60
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd90
> [11417] vdev_geom_io_start() at vdev_geom_io_start+0x257/frame
> 0xfffffe043133fdc0
> [11417] zio_vdev_io_start() at zio_vdev_io_start+0x321/frame
> 0xfffffe043133fe10
> [11417] zio_execute() at zio_execute+0x78/frame 0xfffffe043133fe40
> [11417] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame
> 0xfffffe043133fec0
> [11417] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame
> 0xfffffe043133fef0
> ---snip---
>
> This is with a world from 2024-05-17-084543.
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
>

--0000000000008455ae06190f7286
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><div>First order: <br></div><div><br></di=
v><div>Looks like we&#39;re trying to schedule a trim, but that fails due t=
o a malloc issue. So then, since it&#39;s a</div><div>malloc issue, we wind=
 up trying to automatically reschedule this I/O, which recurses into the dr=
iver</div><div>with a bad lock held and boop.</div><div><br></div><div>Can =
you reproduce this?</div><div><br></div><div>If so, can you test this patch=
?</div><div><br></div><div>diff --git a/sys/cam/nvme/nvme_da.c b/sys/cam/nv=
me/nvme_da.c<br>index 3f6cf8702870..357e612200e9 100644<br>--- a/sys/cam/nv=
me/nvme_da.c<br>+++ b/sys/cam/nvme/nvme_da.c<br>@@ -1077,7 +1077,9 @@ ndast=
art(struct cam_periph *periph, union ccb *start_ccb)<br>=C2=A0<br>=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 trim =3D malloc(sizeof(*trim), M_NVMEDA, M_ZERO | M_NOWAIT);<br>=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 if (trim =3D=3D NULL) {<br>+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cam_periph_unlock(periph);<br>=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 biofinish(bp, NULL, ENOMEM);<br>+ =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 cam_periph_lock(periph);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 xpt_r=
elease_ccb(start_ccb);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 ndasche=
dule(periph);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;</div><div><=
br></div><div>(the mailer may mangle it, so I&#39;ve also attached it in ca=
se people want to comment on this).</div><div><br></div><div>The logic here=
 is that we have special logic in the ENOMEM case that will recursively</di=
v><div>call the start routine, which calls the scheduler which expects to b=
e able to take out the</div><div>periph lock. But it can&#39;t, because it&=
#39;s already locked. It also invokes a pacing protocol that</div><div>slow=
s down things even more.</div><div><br></div><div>What I&#39;m now not sure=
 about is whether or not we need to just release start_ccb or if we also ne=
ed to call ndaschedule too here. Seems like we might not want to (but it&#3=
9;s a safe nop if not). I&#39;ve cc&#39;d mav@ to see if he has opinions on=
 what&#39;s going on.<br></div><div><br></div><div>Warner</div><div><br></d=
iv></div><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_att=
r">On Wed, May 22, 2024 at 11:22=E2=80=AFAM Alexander Leidinger &lt;<a href=
=3D"mailto:Alexander@leidinger.net">Alexander@leidinger.net</a>&gt; wrote:<=
br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8e=
x;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
I got the panic in $Subject. Anyone an idea?<br>
<br>
Complete crashlog available at <a href=3D"https://wiki.leidinger.net/core.t=
xt.6" rel=3D"noreferrer" target=3D"_blank">https://wiki.leidinger.net/core.=
txt.6</a> <br>
(1.2 MB)<br>
<br>
Short version:<br>
---snip---<br>
[11417] KDB: stack backtrace:<br>
[11417] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame <br>
0xfffffe043133f830<br>
[11417] vpanic() at vpanic+0x13f/frame 0xfffffe043133f960<br>
[11417] panic() at panic+0x43/frame 0xfffffe043133f9c0<br>
[11417] __mtx_lock_sleep() at __mtx_lock_sleep+0x491/frame <br>
0xfffffe043133fa50<br>
[11417] __mtx_lock_flags() at __mtx_lock_flags+0x9c/frame <br>
0xfffffe043133fa70<br>
[11417] ndastrategy() at ndastrategy+0x3c/frame 0xfffffe043133faa0<br>
[11417] g_disk_start() at g_disk_start+0x569/frame 0xfffffe043133fb00<br>
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fb30<br>
[11417] g_io_deliver() at g_io_deliver+0x1cc/frame 0xfffffe043133fb80<br>
[11417] g_disk_done() at g_disk_done+0xee/frame 0xfffffe043133fbc0<br>
[11417] ndastart() at ndastart+0x4a3/frame 0xfffffe043133fc20<br>
[11417] xpt_run_allocq() at xpt_run_allocq+0xa5/frame 0xfffffe043133fc70<br=
>
[11417] ndastrategy() at ndastrategy+0x6d/frame 0xfffffe043133fca0<br>
[11417] g_disk_start() at g_disk_start+0x569/frame 0xfffffe043133fd00<br>
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd30<br>
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd60<br>
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd90<br>
[11417] vdev_geom_io_start() at vdev_geom_io_start+0x257/frame <br>
0xfffffe043133fdc0<br>
[11417] zio_vdev_io_start() at zio_vdev_io_start+0x321/frame <br>
0xfffffe043133fe10<br>
[11417] zio_execute() at zio_execute+0x78/frame 0xfffffe043133fe40<br>
[11417] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame <br>
0xfffffe043133fec0<br>
[11417] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame <br>
0xfffffe043133fef0<br>
---snip---<br>
<br>
This is with a world from 2024-05-17-084543.<br>
<br>
Bye,<br>
Alexander.<br>
<br>
-- <br>
<a href=3D"http://www.Leidinger.net" rel=3D"noreferrer" target=3D"_blank">h=
ttp://www.Leidinger.net</a> Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF=
<br>
<a href=3D"http://www.FreeBSD.org" rel=3D"noreferrer" target=3D"_blank">htt=
p://www.FreeBSD.org</a>=C2=A0 =C2=A0 netchild@FreeBSD.org=C2=A0 : PGP 0x8F3=
1830F9F2772BF<br>
</blockquote></div></div>

--0000000000008455ae06190f7286--
--0000000000008455b106190f7288
Content-Type: text/x-patch; charset="US-ASCII"; name="nda.diff"
Content-Disposition: attachment; filename="nda.diff"
Content-Transfer-Encoding: base64
Content-ID: <f_lwi6hnts0>
X-Attachment-Id: f_lwi6hnts0

ZGlmZiAtLWdpdCBhL3N5cy9jYW0vbnZtZS9udm1lX2RhLmMgYi9zeXMvY2FtL252bWUvbnZtZV9k
YS5jCmluZGV4IDNmNmNmODcwMjg3MC4uMzU3ZTYxMjIwMGU5IDEwMDY0NAotLS0gYS9zeXMvY2Ft
L252bWUvbnZtZV9kYS5jCisrKyBiL3N5cy9jYW0vbnZtZS9udm1lX2RhLmMKQEAgLTEwNzcsNyAr
MTA3Nyw5IEBAIG5kYXN0YXJ0KHN0cnVjdCBjYW1fcGVyaXBoICpwZXJpcGgsIHVuaW9uIGNjYiAq
c3RhcnRfY2NiKQogCiAJCQl0cmltID0gbWFsbG9jKHNpemVvZigqdHJpbSksIE1fTlZNRURBLCBN
X1pFUk8gfCBNX05PV0FJVCk7CiAJCQlpZiAodHJpbSA9PSBOVUxMKSB7CisJCQkJY2FtX3Blcmlw
aF91bmxvY2socGVyaXBoKTsKIAkJCQliaW9maW5pc2goYnAsIE5VTEwsIEVOT01FTSk7CisJCQkJ
Y2FtX3BlcmlwaF9sb2NrKHBlcmlwaCk7CiAJCQkJeHB0X3JlbGVhc2VfY2NiKHN0YXJ0X2NjYik7
CiAJCQkJbmRhc2NoZWR1bGUocGVyaXBoKTsKIAkJCQlyZXR1cm47Cg==
--0000000000008455b106190f7288--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfo-k_ScVQY1MtOC2wUG4nCatbea9JwS7xzJc_OduVLyhA>