From: Xin Li
Reply-To: d@delphij.net
To: Warner Losh, Alexander Motin
Cc: Xin LI, FreeBSD Current, Warner Losh
Subject: Re: GPF on boot with devmatch
Date: Mon, 12 Oct 2020 15:42:52 -0700

On 10/12/20 11:13, Warner Losh wrote:
>
> On Mon, Oct 5, 2020 at 3:39 PM Alexander Motin wrote:
>
>     On 05.10.2020 17:20, Warner Losh wrote:
>     > On Mon, Oct 5, 2020 at 12:36 PM Alexander Motin wrote:
>     >
>     >     I can add that we've received report about identical panic on FreeBSD
>     >     releng/12.2 of r365436, AKA TrueNAS 12.0-RC1:
>     >     https://jira.ixsystems.com/browse/NAS-107578.  So it looks a) pretty
>     >     rare (one report from thousands of early adopters and none in our lab),
>     >     and b) it is in stable/12 too, not only head.
>     >
>     > Thanks! I'll see if I can recreate here....  But we're accessing the
>     > sysctl tree from devmatch to get some information, which should always
>     > be OK (the fact that it isn't suggests either a bug in some driver
>     > leaving bad pointers, or some race or both)...  It would be nice to know
>     > which nodes they were, or to have a kernel panic I can look at...
>
>     All we have now in this case is a screenshot you may see in the ticket.
>     Also previously the same user on some earlier version of stable/12
>     reported other very weird panics on process lock being dropped where it
>     can't be in some other sysctls inside kern.proc, so if we guess those
>     are related, I suspect there may be some kind of memory corruption
>     happening, but have no clue where.  Unfortunately we have only textdumps
>     for those.  So if Xin is able to reproduce it locally, it may be our
>     best chance to debug it, at least this specific issue.
>
>
> That's totally weird.
>
> Xin Li's traceback lead to code I just rewrote in current, while this
> code leads to code that's been there for a long time and hasn't been
> MFC'd.
> This suggests that Xin Li's backtrace isn't to be trusted, or
> there's two issues at play. Both are plausible. I've fixed a minor
> signedness bug and a possible one byte overflow that might have happened
> in the code I just rewrote. But I suspect this is due to something else
> related to how children are handled after we've raced. Maybe there's
> something special about how USB does things, because other buses will
> create the child early and the child list is stable. If USB's discovery
> code is adding something and is racing with devd's walking of the tree,
> that might explain it...  It would be nice if there were some way to
> provoke the race on a system I could get a core from for deeper analysis....

There might be other factors at play; I just haven't had much time recently to track it down. The system is fairly critical for my internal network, so I can't afford long downtimes. Controlled downtime is fine, though: if you want me to deliberately panic the system to collect debugging data, feel free to ask, as long as I can still boot at the end of the experiment :)

From what I have observed, it seems to be some kind of race condition between the USB stack and the sysctl tree. However, the race may be delicate, as I have never managed to provoke the panic on my laptop, which also runs -CURRENT.

If you want to add some instrumentation to the code, please let me know and I'll patch the tree to try to catch it.

Cheers,