Date:      Mon, 7 Mar 2022 14:04:09 -0500
From:      Mark Johnston <markj@freebsd.org>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, Andrew Turner <andrew@fubar.geek.nz>, Ronald Klop <ronald-lists@klop.ws>, bob prohaska <fbsd@www.zefox.net>, Free BSD <freebsd-arm@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: panic: data abort in critical section or under mutex  (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID:  <YiZXKcX3mfLn2iNA@nuc>
In-Reply-To: <F25AAD14-209C-43AA-8496-8396F4C4EB76@yahoo.com>
References:  <C2F96211-0180-45DA-872F-52358D9ED35B.ref@yahoo.com> <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com> <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay> <YiYhIQXl1sd4cOVS@nuc> <3374E0F8-D712-4ED0-A62B-B6924FC8A5E2@fubar.geek.nz> <YiY2jmD97leKev0F@nuc> <F25AAD14-209C-43AA-8496-8396F4C4EB76@yahoo.com>

On Mon, Mar 07, 2022 at 10:03:51AM -0800, Mark Millard wrote:
> 
> 
> On 2022-Mar-7, at 08:45, Mark Johnston <markj@FreeBSD.org> wrote:
> 
> > On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
> >> 
> >>> On 7 Mar 2022, at 15:13, Mark Johnston <markj@freebsd.org> wrote:
> >>> ...
> >>> A (the?) problem is that the compiler is treating "pc" as an alias
> >>> for x18, but the rmlock code assumes that the pcpu pointer is loaded
> >>> once, as it dereferences "pc" outside of the critical section.  On
> >>> arm64, if a context switch occurs between the store at _rm_rlock+144 and
> >>> the load at +152, and the thread is migrated to another CPU, then we'll
> >>> end up using the wrong CPU ID in the rm->rm_writecpus test.
> >>> 
> >>> I suspect the problem is unique to arm64 as its get_pcpu()
> >>> implementation is different from the others in that it doesn't use
> >>> volatile-qualified inline assembly.  This has been the case since
> >>> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762 .
> >>> 
> >>> I haven't been able to reproduce any crashes running poudriere in an
> >>> arm64 AWS instance, though.  Could you please try the patch below and
> >>> confirm whether it fixes your panics?  I verified that the apparent
> >>> problem described above is gone with the patch.
> >> 
> >> Alternatively (or additionally) we could do something like the following. There are only a few MI users of get_pcpu with the main place being in rm locks.
> >> 
> >> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
> >> index 09f6361c651c..59b890e5c2ea 100644
> >> --- a/sys/arm64/include/pcpu.h
> >> +++ b/sys/arm64/include/pcpu.h
> >> @@ -58,7 +58,14 @@ struct pcpu;
> >> 
> >> register struct pcpu *pcpup __asm ("x18");
> >> 
> >> -#define        get_pcpu()      pcpup
> >> +static inline struct pcpu *
> >> +get_pcpu(void)
> >> +{
> >> +       struct pcpu *pcpu;
> >> +
> >> +       __asm __volatile("mov   %0, x18" : "=&r"(pcpu));
> >> +       return (pcpu);
> >> +}
> >> 
> >> static inline struct thread *
> >> get_curthread(void)
> > 
> > Indeed, I think this is probably the best solution.

Thinking a bit more, even with that patch, code like this may not behave
the same on arm64 as on other platforms:

critical_enter();
ptr = &PCPU_GET(foo);
critical_exit();
bar = *ptr;

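since, as far as I can see, PCPU_GET() still goes through the pcpup
register variable (x18), so the compiler is free to compute the address
at the point of the dereference and may translate it to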

critical_enter();
critical_exit();
bar = PCPU_GET(foo);
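
To make the difference concrete, here is a rough userland model.  It is
not kernel code: struct pcpu_model, cur_pcpu and migrate_to() are names
made up for illustration, with cur_pcpu standing in for x18 and
migrate_to() for a context switch that moves the thread to another CPU.
It only shows why the two sequences above can read different memory; it
does not reproduce what the compiler actually emits.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU area of one CPU. */
struct pcpu_model {
	int cpuid;
	int foo;
};

static struct pcpu_model cpus[2] = { { 0, 100 }, { 1, 200 } };

/* Models x18: always points at the pcpu area of the current CPU. */
static struct pcpu_model *volatile cur_pcpu = &cpus[0];

/* Models a context switch that migrates the thread to another CPU. */
static void
migrate_to(int cpu)
{
	cur_pcpu = &cpus[cpu];
}

/* Intended behaviour: the address is taken once, before migration. */
static int
read_snapshot(void)
{
	int *ptr;

	migrate_to(0);
	ptr = &cur_pcpu->foo;	/* "inside the critical section", on CPU 0 */
	migrate_to(1);		/* migrated after critical_exit() */
	return (*ptr);		/* still reads CPU 0's field */
}

/* Behaviour if the per-CPU base is only read at the dereference. */
static int
read_reloaded(void)
{
	migrate_to(0);
	/* nothing was captured while on CPU 0 */
	migrate_to(1);
	return (cur_pcpu->foo);	/* reads CPU 1's field instead */
}

/* Small self-check of the model. */
static void
check_model(void)
{
	assert(read_snapshot() == 100);
	assert(read_reloaded() == 200);
}
```

In this model read_snapshot() returns CPU 0's value and read_reloaded()
returns CPU 1's, which is the same kind of mismatch as using the wrong
CPU ID in the rm->rm_writecpus test.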

> Is this just partially reverting:
> 
> https://cgit.freebsd.org/src/commit/?id=63c858a04d56
> 
> If so, there might need to be comments about why the updated
> code is as it will be.
> 
> Looks like stable/13 picked up sensitivity to the get_pcpu
> details in rmlock in:
> 
> https://cgit.freebsd.org/src/commit/?h=stable/13&id=543157870da5
> 
> (a 2022-03-04 commit) and stable/13 also has the get_pcpu
> misdefinition in:
> 
> https://cgit.freebsd.org/src/commit/sys/arm64/include/pcpu.h?h=stable/13&id=63c858a04d56
> 
> . So an MFC would be appropriate in order for aarch64
> to be reliable for any variations in get_pcpu in stable/13
> (and for 13.1 to be so as well).

I reverted the rmlock commit in stable/13 already.  Either get_pcpu()
will be fixed shortly or 13.1 will ship without the rmlock commit.
