Date:      Mon, 12 Oct 2020 15:42:52 -0700
From:      Xin Li <delphij@delphij.net>
To:        Warner Losh <imp@bsdimp.com>, Alexander Motin <mav@freebsd.org>
Cc:        Xin LI <d@delphij.net>, FreeBSD Current <freebsd-current@freebsd.org>, Warner Losh <imp@freebsd.org>
Subject:   Re: GPF on boot with devmatch
Message-ID:  <0ab2fc9a-3f60-c375-03ee-1f10c32acc2f@delphij.net>
In-Reply-To: <CANCZdfqGzeQFspsnNbAc2PUd0_JiEjSzmzeHqOZKZ=twr-Go3Q@mail.gmail.com>
References:  <02fa309e-9467-f741-8092-974bfc145c9a@FreeBSD.org> <CANCZdfp_djyU_-UkRHy1eZEu_XLekc%2BYuA2=9k74=rbJFR3S0A@mail.gmail.com> <5e4f0439-08fa-7715-7672-05793d05cc6d@FreeBSD.org> <CANCZdfqGzeQFspsnNbAc2PUd0_JiEjSzmzeHqOZKZ=twr-Go3Q@mail.gmail.com>

On 10/12/20 11:13, Warner Losh wrote:
>
>
> On Mon, Oct 5, 2020 at 3:39 PM Alexander Motin <mav@freebsd.org> wrote:
>
>     On 05.10.2020 17:20, Warner Losh wrote:
>     > On Mon, Oct 5, 2020 at 12:36 PM Alexander Motin <mav@freebsd.org> wrote:
>     >
>     >     I can add that we've received a report about an identical panic
>     >     on FreeBSD releng/12.2 of r365436, AKA TrueNAS 12.0-RC1:
>     >     https://jira.ixsystems.com/browse/NAS-107578 .  So it looks a)
>     >     pretty rare (one report from thousands of early adopters and none
>     >     in our lab), and b) it is in stable/12 too, not only head.
>     >
>     > Thanks! I'll see if I can recreate here....  But we're accessing the
>     > sysctl tree from devmatch to get some information, which should always
>     > be OK (the fact that it isn't suggests either a bug in some driver
>     > leaving bad pointers, or some race, or both)...  It would be nice to
>     > know which nodes they were, or to have a kernel panic I can look at...
>
>     All we have now in this case is a screenshot you may see in the ticket.
>     Also, previously the same user, on some earlier version of stable/12,
>     reported other very weird panics about a process lock being dropped where
>     it can't be, in some other sysctls inside kern.proc.  So if we guess those
>     are related, I suspect there may be some kind of memory corruption
>     happening, but I have no clue where.  Unfortunately we have only
>     textdumps for those.  So if Xin is able to reproduce it locally, it may
>     be our best chance to debug it, at least this specific issue.
>
>
> That's totally weird.
>
> Xin Li's traceback led to code I just rewrote in current, while this
> code leads to code that's been there for a long time and hasn't been
> MFC'd. This suggests that Xin Li's backtrace isn't to be trusted, or
> there are two issues at play. Both are plausible. I've fixed a minor
> signedness bug and a possible one-byte overflow that might have happened
> in the code I just rewrote. But I suspect this is due to something else
> related to how children are handled after we've raced. Maybe there's
> something special about how USB does things, because other buses will
> create the child early and the child list is stable. If USB's discovery
> code is adding something and is racing with devd's walking of the tree,
> that might explain it...  It would be nice if there were some way to
> provoke the race on a system I could get a core from for deeper analysis....

There might be some other players; I just haven't had a lot of time
recently to track it down. The system is somewhat critical for my
internal network, so I can't afford long downtimes. Controlled downtimes
are fine, though: e.g. if you want me to deliberately panic the system
and collect some debugging data, please feel free to ask, as long as I
can still boot at the end of the experiment :)

From what I have observed, it seems to be some kind of race condition
between the USB stack and the sysctl tree; however, the race might be
delicate, as I never successfully provoked the panic on my laptop (which
also runs -CURRENT).  If you want to add some instrumentation to the code,
please let me know and I'll patch my tree to try to catch it.

Cheers,

> Warner
>
>
>     --
>     Alexander Motin
>

