Date: Tue, 8 Mar 2022 12:26:05 +0000
From: Andrew Turner <andrew@fubar.geek.nz>
To: Mark Johnston <markj@freebsd.org>
Cc: Mark Millard <marklmi@yahoo.com>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, Ronald Klop <ronald-lists@klop.ws>, bob prohaska <fbsd@www.zefox.net>, Free BSD <freebsd-arm@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID: <FB6C78DE-A043-4E99-BF17-7DC2F638E685@fubar.geek.nz>
In-Reply-To: <YiZXKcX3mfLn2iNA@nuc>
References: <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com> <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay> <YiYhIQXl1sd4cOVS@nuc> <3374E0F8-D712-4ED0-A62B-B6924FC8A5E2@fubar.geek.nz> <YiY2jmD97leKev0F@nuc> <F25AAD14-209C-43AA-8496-8396F4C4EB76@yahoo.com> <YiZXKcX3mfLn2iNA@nuc>
> On 7 Mar 2022, at 19:04, Mark Johnston <markj@freebsd.org> wrote:
>
> On Mon, Mar 07, 2022 at 10:03:51AM -0800, Mark Millard wrote:
>>
>>
>> On 2022-Mar-7, at 08:45, Mark Johnston <markj@FreeBSD.org> wrote:
>>
>>> On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
>>>>
>>>>> On 7 Mar 2022, at 15:13, Mark Johnston <markj@freebsd.org> wrote:
>>>>> ...
>>>>> A (the?) problem is that the compiler is treating "pc" as an alias
>>>>> for x18, but the rmlock code assumes that the pcpu pointer is loaded
>>>>> once, as it dereferences "pc" outside of the critical section. On
>>>>> arm64, if a context switch occurs between the store at _rm_rlock+144 and
>>>>> the load at +152, and the thread is migrated to another CPU, then we'll
>>>>> end up using the wrong CPU ID in the rm->rm_writecpus test.
>>>>>
>>>>> I suspect the problem is unique to arm64 as its get_pcpu()
>>>>> implementation is different from the others in that it doesn't use
>>>>> volatile-qualified inline assembly. This has been the case since
>>>>> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
>>>>> .
>>>>>
>>>>> I haven't been able to reproduce any crashes running poudriere in an
>>>>> arm64 AWS instance, though. Could you please try the patch below and
>>>>> confirm whether it fixes your panics? I verified that the apparent
>>>>> problem described above is gone with the patch.
>>>>
>>>> Alternatively (or additionally) we could do something like the following. There are only a few MI users of get_pcpu with the main place being in rm locks.
>>>>
>>>> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
>>>> index 09f6361c651c..59b890e5c2ea 100644
>>>> --- a/sys/arm64/include/pcpu.h
>>>> +++ b/sys/arm64/include/pcpu.h
>>>> @@ -58,7 +58,14 @@ struct pcpu;
>>>>
>>>>  register struct pcpu *pcpup __asm ("x18");
>>>>
>>>> -#define get_pcpu() pcpup
>>>> +static inline struct pcpu *
>>>> +get_pcpu(void)
>>>> +{
>>>> +	struct pcpu *pcpu;
>>>> +
>>>> +	__asm __volatile("mov %0, x18" : "=&r"(pcpu));
>>>> +	return (pcpu);
>>>> +}
>>>>
>>>>  static inline struct thread *
>>>>  get_curthread(void)
>>>
>>> Indeed, I think this is probably the best solution.

I’ve pushed the above to git in ed3066342660 and will MFC in a few days.

>
> Thinking a bit more, even with that patch, code like this may not behave
> the same on arm64 as on other platforms:
>
>     critical_enter();
>     ptr = &PCPU_GET(foo);
>     critical_exit();
>     bar = *ptr;
>
> since as far as I can see the compiler may translate it to
>
>     critical_enter();
>     critical_exit();
>     bar = PCPU_GET(foo);

If we think this will be a problem, we could change the PCPU_PTR macro to use get_pcpu again. However, I only see two places where it’s used in the MI code: subr_witness.c and kern_clock.c. From a quick look, neither of these appears to be problematic, as there are no critical sections, although I’m not familiar enough with the code to know for certain.

Andrew
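
A note for readers following the thread: the difference between the old get_pcpu() macro and the new inline function can be imitated in userland C. This is only a sketch under loud assumptions, not kernel code. sim_x18 is a hypothetical volatile global standing in for register x18, sim_migrate() stands in for the scheduler moving the thread to another CPU, and get_pcpu_alias(), get_pcpu_snapshot() and the two cpuid_via_*() helpers are names invented here. The committed fix reads x18 with volatile inline assembly, as in the diff quoted above; the sketch merely contrasts "read once" with "re-read at each use".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct pcpu. */
struct pcpu {
	int pc_cpuid;
};

static struct pcpu cpus[2] = { { .pc_cpuid = 0 }, { .pc_cpuid = 1 } };

/*
 * Assumption: a volatile global plays the role of x18.  In the kernel the
 * register changes when the thread resumes on another CPU; here
 * sim_migrate() changes it by hand.
 */
static struct pcpu *volatile sim_x18 = &cpus[0];

/* Alias-style accessor, like "#define get_pcpu() pcpup": every use re-reads. */
#define	get_pcpu_alias()	(sim_x18)

/* Snapshot-style accessor: a single read whose result is returned by value. */
static inline struct pcpu *
get_pcpu_snapshot(void)
{
	return (sim_x18);
}

/* Simulate the scheduler moving the running thread to another CPU. */
static void
sim_migrate(int cpu)
{
	sim_x18 = &cpus[cpu];
}

/* The pointer is captured on CPU 0 and dereferenced after migration. */
static int
cpuid_via_snapshot(void)
{
	struct pcpu *pc;

	sim_migrate(0);
	pc = get_pcpu_snapshot();
	sim_migrate(1);
	return (pc->pc_cpuid);	/* still CPU 0's structure */
}

/* The "pointer" is really the register, so it is evaluated after migration. */
static int
cpuid_via_alias(void)
{
	sim_migrate(0);
	sim_migrate(1);
	return (get_pcpu_alias()->pc_cpuid);	/* CPU 1's structure */
}
```

Called from a small main(), cpuid_via_snapshot() returns 0 and cpuid_via_alias() returns 1. The second result is the userland analogue of the mismatch described in the thread: the CPU ID read after the migration no longer belongs to the CPU on which the earlier work was done.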
