Date: Mon, 12 Oct 2020 15:42:52 -0700
From: Xin Li <delphij@delphij.net>
To: Warner Losh <imp@bsdimp.com>, Alexander Motin <mav@freebsd.org>
Cc: Xin LI <d@delphij.net>, FreeBSD Current <freebsd-current@freebsd.org>, Warner Losh <imp@freebsd.org>
Subject: Re: GPF on boot with devmatch
Message-ID: <0ab2fc9a-3f60-c375-03ee-1f10c32acc2f@delphij.net>
In-Reply-To: <CANCZdfqGzeQFspsnNbAc2PUd0_JiEjSzmzeHqOZKZ=twr-Go3Q@mail.gmail.com>
References: <02fa309e-9467-f741-8092-974bfc145c9a@FreeBSD.org>
 <CANCZdfp_djyU_-UkRHy1eZEu_XLekc+YuA2=9k74=rbJFR3S0A@mail.gmail.com>
 <5e4f0439-08fa-7715-7672-05793d05cc6d@FreeBSD.org>
 <CANCZdfqGzeQFspsnNbAc2PUd0_JiEjSzmzeHqOZKZ=twr-Go3Q@mail.gmail.com>

On 10/12/20 11:13, Warner Losh wrote:
>
> On Mon, Oct 5, 2020 at 3:39 PM Alexander Motin <mav@freebsd.org> wrote:
>
>     On 05.10.2020 17:20, Warner Losh wrote:
>     > On Mon, Oct 5, 2020 at 12:36 PM Alexander Motin <mav@freebsd.org> wrote:
>     >
>     >     I can add that we've received a report about an identical panic on
>     >     FreeBSD releng/12.2 of r365436, AKA TrueNAS 12.0-RC1:
>     >     https://jira.ixsystems.com/browse/NAS-107578 .  So it looks a) pretty
>     >     rare (one report from thousands of early adopters and none in our lab),
>     >     and b) it is in stable/12 too, not only head.
>     >
>     > Thanks! I'll see if I can recreate here....  But we're accessing the
>     > sysctl tree from devmatch to get some information, which should always
>     > be OK (the fact that it isn't suggests either a bug in some driver
>     > leaving bad pointers, or some race or both)...  It would be nice to know
>     > which nodes they were, or to have a kernel panic I can look at...
>
>     All we have now in this case is a screenshot you may see in the ticket.
>     Also previously the same user on some earlier version of stable/12
>     reported other very weird panics on process lock being dropped where it
>     can't be in some other sysctls inside kern.proc, so if we guess those
>     are related, I suspect there may be some kind of memory corruption
>     happening, but have no clue where.  Unfortunately we have only textdumps
>     for those.  So if Xin is able to reproduce it locally, it may be our
>     best chance to debug it, at least this specific issue.
>
> That's totally weird.
>
> Xin Li's traceback led to code I just rewrote in current, while this
> one leads to code that's been there for a long time and hasn't been
> MFC'd. This suggests that Xin Li's backtrace isn't to be trusted, or
> there are two issues at play. Both are plausible. I've fixed a minor
> signedness bug and a possible one-byte overflow that might have happened
> in the code I just rewrote. But I suspect this is due to something else
> related to how children are handled after we've raced. Maybe there's
> something special about how USB does things, because other buses will
> create the child early and the child list is stable. If USB's discovery
> code is adding something and is racing with devd's walking of the tree,
> that might explain it...  It would be nice if there were some way to
> provoke the race on a system I could get a core from for deeper analysis....

There might be some other players; I just haven't had much time
recently to track it down. The system is somewhat critical for my
internal network, so I can't afford long downtimes. (Controllable
downtimes are fine: if you want me to deliberately panic the system
and get some debugging data, please feel free to ask, as long as I can
continue to boot at the end of the experiment :))

From what I was observing, it seems to be some kind of race condition
between the USB stack and the sysctl tree; however, the race might be
delicate, as I never successfully provoked the panic on my laptop (which
also runs -CURRENT).  If you want to add some instrumentation to the code,
please let me know and I'll get the tree patched to try to catch it.

Cheers,

> Warner
>
> --
> Alexander Motin