Date:      Tue, 8 Mar 2022 12:26:05 +0000
From:      Andrew Turner <andrew@fubar.geek.nz>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Mark Millard <marklmi@yahoo.com>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, Ronald Klop <ronald-lists@klop.ws>, bob prohaska <fbsd@www.zefox.net>, Free BSD <freebsd-arm@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: panic: data abort in critical section or under mutex  (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID:  <FB6C78DE-A043-4E99-BF17-7DC2F638E685@fubar.geek.nz>
In-Reply-To: <YiZXKcX3mfLn2iNA@nuc>
References:  <C2F96211-0180-45DA-872F-52358D9ED35B.ref@yahoo.com> <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com> <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay> <YiYhIQXl1sd4cOVS@nuc> <3374E0F8-D712-4ED0-A62B-B6924FC8A5E2@fubar.geek.nz> <YiY2jmD97leKev0F@nuc> <F25AAD14-209C-43AA-8496-8396F4C4EB76@yahoo.com> <YiZXKcX3mfLn2iNA@nuc>





> On 7 Mar 2022, at 19:04, Mark Johnston <markj@freebsd.org> wrote:
>
> On Mon, Mar 07, 2022 at 10:03:51AM -0800, Mark Millard wrote:
>>
>>
>> On 2022-Mar-7, at 08:45, Mark Johnston <markj@FreeBSD.org> wrote:
>>
>>> On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
>>>>
>>>>> On 7 Mar 2022, at 15:13, Mark Johnston <markj@freebsd.org> wrote:
>>>>> ...
>>>>> A (the?) problem is that the compiler is treating "pc" as an alias
>>>>> for x18, but the rmlock code assumes that the pcpu pointer is loaded
>>>>> once, as it dereferences "pc" outside of the critical section.  On
>>>>> arm64, if a context switch occurs between the store at _rm_rlock+144 and
>>>>> the load at +152, and the thread is migrated to another CPU, then we'll
>>>>> end up using the wrong CPU ID in the rm->rm_writecpus test.
>>>>>
>>>>> I suspect the problem is unique to arm64 as its get_pcpu()
>>>>> implementation is different from the others in that it doesn't use
>>>>> volatile-qualified inline assembly.  This has been the case since
>>>>> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
>>>>> .
>>>>>
>>>>> I haven't been able to reproduce any crashes running poudriere in an
>>>>> arm64 AWS instance, though.  Could you please try the patch below and
>>>>> confirm whether it fixes your panics?  I verified that the apparent
>>>>> problem described above is gone with the patch.
>>>>
>>>> Alternatively (or additionally) we could do something like the
>>>> following. There are only a few MI users of get_pcpu, the main one
>>>> being in rm locks.
>>>>
>>>> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
>>>> index 09f6361c651c..59b890e5c2ea 100644
>>>> --- a/sys/arm64/include/pcpu.h
>>>> +++ b/sys/arm64/include/pcpu.h
>>>> @@ -58,7 +58,14 @@ struct pcpu;
>>>>
>>>> register struct pcpu *pcpup __asm ("x18");
>>>>
>>>> -#define        get_pcpu()      pcpup
>>>> +static inline struct pcpu *
>>>> +get_pcpu(void)
>>>> +{
>>>> +       struct pcpu *pcpu;
>>>> +
>>>> +       __asm __volatile("mov   %0, x18" : "=&r"(pcpu));
>>>> +       return (pcpu);
>>>> +}
>>>>
>>>> static inline struct thread *
>>>> get_curthread(void)
>>>
>>> Indeed, I think this is probably the best solution.

I’ve pushed the above to git in ed3066342660 & will MFC in a few days.
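
For anyone reading this later, a rough user-space model of why the copy matters may help. It is only a sketch under stated assumptions, not the kernel code: struct pcpu_model, x18_model, migrate_to() and the two cpuid_after_migration_*() functions are names invented here for illustration, and a plain global stands in for the x18 register.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these are the real kernel definitions. */
struct pcpu_model {
	int pc_cpuid;
};

static struct pcpu_model cpus[2] = { { 0 }, { 1 } };

/* Plays the role of x18: always points at the pcpu of the current CPU. */
static struct pcpu_model *x18_model = &cpus[0];

/* Simulates the scheduler moving the thread to another CPU. */
static void
migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < 2);
	x18_model = &cpus[cpu];
}

/* Old behaviour: the result is just another name for the register. */
#define	get_pcpu_alias()	(x18_model)

/* New behaviour: the value is copied out once, like the volatile "mov". */
static struct pcpu_model *
get_pcpu_copy(void)
{
	return (x18_model);
}

/* No copy is kept, so the read after the migration sees the new CPU. */
int
cpuid_after_migration_alias(void)
{
	migrate_to(0);
	migrate_to(1);
	return (get_pcpu_alias()->pc_cpuid);
}

/* The pointer is taken before the migration and stays on the old CPU. */
int
cpuid_after_migration_copy(void)
{
	struct pcpu_model *pc;

	migrate_to(0);
	pc = get_pcpu_copy();
	migrate_to(1);
	return (pc->pc_cpuid);
}
```

In this model cpuid_after_migration_alias() returns 1, the CPU the thread ended up on, while cpuid_after_migration_copy() returns 0, the CPU it was on when the pointer was taken. The second is what the rmlock code assumes.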

>
> Thinking a bit more, even with that patch, code like this may not behave
> the same on arm64 as on other platforms:
>
> critical_enter();
> ptr = &PCPU_GET(foo);
> critical_exit();
> bar = *ptr;
>
> since as far as I can see the compiler may translate it to
>
> critical_enter();
> critical_exit();
> bar = PCPU_GET(foo);

If we think this will be a problem we could change the PCPU_PTR macro to
use get_pcpu again; however, I only see two places where it’s used in the
MI code, in subr_witness.c and kern_clock.c. Neither of these appears to
be problematic from a quick look, as there are no critical sections,
although I’m not familiar enough with the code to know for certain.
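
To make the concern concrete, here is a small user-space model of the two code shapes quoted above. It is a sketch only: the *_model names, try_migrate() and the two read_*() functions are hypothetical, the critical section is reduced to a nesting counter that blocks migration, and nothing here is taken from the kernel sources.

```c
#include <assert.h>

/* Hypothetical per-CPU data; pc_foo differs per CPU so a mix-up is visible. */
struct pcpu_model {
	int pc_foo;
};

static struct pcpu_model cpus[2] = { { 100 }, { 200 } };
static int curcpu_model;	/* CPU the modelled thread runs on */
static int critnest_model;	/* > 0 means migration is blocked */

static void
critical_enter_model(void)
{
	critnest_model++;
}

static void
critical_exit_model(void)
{
	assert(critnest_model > 0);
	critnest_model--;
}

/* Returns 1 if the thread moved, 0 if a critical section prevented it. */
static int
try_migrate(int cpu)
{
	if (critnest_model > 0)
		return (0);
	curcpu_model = cpu;
	return (1);
}

#define	PCPU_GET_MODEL(m)	(cpus[curcpu_model].pc_##m)

/* As written: the address is computed inside, the load happens outside. */
int
read_via_saved_pointer(void)
{
	int *ptr;

	curcpu_model = 0;
	critical_enter_model();
	ptr = &PCPU_GET_MODEL(foo);
	critical_exit_model();
	(void)try_migrate(1);	/* migration is allowed again here */
	return (*ptr);
}

/* As possibly compiled: the per-CPU base is re-read after the exit. */
int
read_after_exit(void)
{
	curcpu_model = 0;
	critical_enter_model();
	critical_exit_model();
	(void)try_migrate(1);
	return (PCPU_GET_MODEL(foo));
}
```

In the model read_via_saved_pointer() returns 100 (CPU 0's field) and read_after_exit() returns 200 (CPU 1's field), so the two shapes only agree if no migration happens between critical_exit() and the load.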

Andrew




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FB6C78DE-A043-4E99-BF17-7DC2F638E685>