Date:      Sat, 25 May 2024 10:34:32 +0200
From:      Alexander Leidinger <Alexander@Leidinger.net>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Current <current@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject:   Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469
Message-ID:  <4e7ebc2b51104ade3ee2a86859c9fb9a@Leidinger.net>
In-Reply-To: <d7138e8c2d6888cfe9ec73b76e6ae98b@Leidinger.net>
References:  <730565997ef678bbfe87d7861075edae@Leidinger.net> <CANCZdfo-k_ScVQY1MtOC2wUG4nCatbea9JwS7xzJc_OduVLyhA@mail.gmail.com> <d7138e8c2d6888cfe9ec73b76e6ae98b@Leidinger.net>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)

--=_b3d06bc92298dfaee154a8e512810367
Content-Type: multipart/alternative;
 boundary="=_42204c80d8c879abe751c11d81fb0a5b"

--=_42204c80d8c879abe751c11d81fb0a5b
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII;
 format=flowed

On 2024-05-22 22:45, Alexander Leidinger wrote:

> On 2024-05-22 20:53, Warner Losh wrote:
> 
>> First order:
>> 
>> Looks like we're trying to schedule a trim, but that fails due to a 
>> malloc issue. So then, since it's a
>> malloc issue, we wind up trying to automatically reschedule this I/O, 
>> which recurses into the driver
>> with a bad lock held and boop.
>> 
>> Can you reproduce this?
> 
> So far I have seen it once; at least, I have only one crashdump. I had 
> one more reboot/crash, but no dump. I also have a watchdog running on 
> this system, so I am not sure what caused the (unusual) reboot. A 
> poudriere build was running both times. Since the crashdump I have not 
> run poudriere again.
> 
>> If so, can you test this patch?
> 
> I will give it a try tomorrow anyway, and I will try to stress the 
> system again with poudriere.
> 
> The nvme is a cache and also a log device for a zpool, so there is not 
> really a deterministic way to trigger access to it.

I have run a lot of poudriere builds together with other load (about 30 
jails with mysql, postgresql, redis, webmail, postfix, imap, java stuff, 
...) on this system since Thursday. So far there has been no panic in 
the nvme part.
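
For readers of the archive, the failure mode described in the quoted
analysis above (an allocation failure whose error path reschedules the
I/O while the device lock is still held) can be sketched in userland.
This is only an illustration under assumptions: NonRecursiveMutex,
start_io and the alloc_ok flag are hypothetical names, not the nda(4)
code and not the patch under test.

```python
import threading


class NonRecursiveMutex:
    """Userland stand-in for a non-recursive mutex (hypothetical).

    Instead of deadlocking, it raises when the owning thread tries
    to take it again, similar in spirit to the kernel's panic.
    """

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._owner = None

    def lock(self):
        me = threading.get_ident()
        if self._owner == me:
            raise RuntimeError(
                f"recursed on non-recursive mutex {self.name}")
        self._lock.acquire()
        self._owner = me

    def unlock(self):
        self._owner = None
        self._lock.release()


def start_io(mtx, alloc_ok, log):
    """Hypothetical driver entry point that schedules a trim."""
    mtx.lock()
    try:
        if not alloc_ok:
            # Assumed error path: reschedule right away, while the
            # lock is still held, which re-enters start_io().
            log.append("alloc failed, rescheduling")
            start_io(mtx, True, log)
        else:
            log.append("trim scheduled")
    finally:
        mtx.unlock()
```

Calling start_io() with alloc_ok=False raises on the nested lock()
call, while the normal path with alloc_ok=True completes. A
rescheduling path that runs only after the lock has been dropped would
not hit this.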

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
--=_42204c80d8c879abe751c11d81fb0a5b--


--=_b3d06bc92298dfaee154a8e512810367
Content-Type: application/pgp-signature;
 name=signature.asc
Content-Disposition: attachment;
 filename=signature.asc;
 size=833
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEER9UlYXp1PSd08nWXEg2wmwP42IYFAmZRoqoACgkQEg2wmwP4
2IZxsw/9EG+rs+xTeo+5EBG2di3Z5isg9XgVzkasK34f7kQlFZxmJlc7V1BD9gW9
TbuS9radXY2BFq/v+iEdyA1vXkj3fNSz+4jx4NkghkH5FZqDQ84arTPgp1siK/vl
ENzha3d720dOCGcTu+z428sF9ykiDwHAXeymuCFcsFuogf4ARh4wmU76An/BwL2H
yOfbf78DY4+Z5ZKxD3nNDzgN5vX5hf2WirOmZtfCD73ukiPsJr7htUaOguxYp2ur
wL1+rIfgyI3XyFjrPq9YlGiqTEQX8/u0gj2kRT27saVPmzDU6dyita8KH4UbqGfv
8r4fHAjSm06bkXZU8RPOD8OvIyZgLDqX/sZlBDdImvB77x3wy1Qskg7pOPbDFZK+
vDz2kcuW62zmavTZCgULcNxW39Ond50aae3jO9zhj9Cksw2AHFeqycWzl7LVCNPj
bmfEfUmYk6LbCkfOVlsHp9Gt56XKSoozumGOurqAWG+FnVUr0hpBBcz654nDaydd
NZLGVvubu6m80on1ICux6GuY6f/E8q2dljmKbluKflGlyVBKXt53os6PUR4oqiB4
5lRlAKFff/saY3DkXAR7V/Dw8elr2ZlWIeOnulqpO2S4OB8JoP8/KiDskLILrQo5
HIz5KPauEGdwU28d4iIGMMVXk17xqJZ9YnpDBR7mejl17VEo9pQ=
=0OL2
-----END PGP SIGNATURE-----

--=_b3d06bc92298dfaee154a8e512810367--


