Date:      Mon, 7 Mar 2022 21:54:26 +0100 (CET)
From:      Ronald Klop <ronald-lists@klop.ws>
To:        Mark Johnston <markj@freebsd.org>
Cc:        bob prohaska <fbsd@www.zefox.net>, Mark Millard <marklmi@yahoo.com>, freebsd-arm@freebsd.org, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: panic: data abort in critical section or under mutex  (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID:  <1302689164.173.1646686466515@mailrelay>
In-Reply-To: <YiYhIQXl1sd4cOVS@nuc>
References:  <C2F96211-0180-45DA-872F-52358D9ED35B.ref@yahoo.com> <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com> <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay> <YiYhIQXl1sd4cOVS@nuc>


 
From: Mark Johnston <markj@freebsd.org>
Date: Monday, 7 March 2022 16:13
To: Ronald Klop <ronald-lists@klop.ws>
CC: bob prohaska <fbsd@www.zefox.net>, Mark Millard <marklmi@yahoo.com>, freebsd-arm@freebsd.org, freebsd-current <freebsd-current@freebsd.org>
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
> 
> On Mon, Mar 07, 2022 at 02:46:09PM +0100, Ronald Klop wrote:
> > Dear Mark Johnston,
> >
> > I did some binary search in the kernels and came to the conclusion that https://cgit.freebsd.org/src/commit/?id=1517b8d5a7f58897200497811de1b18809c07d3e still works and https://cgit.freebsd.org/src/commit/?id=407c34e735b5d17e2be574808a09e6d729b0a45a panics.
> >
> > I suspect your commit in https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a.
> >
> > Last panic:
> >
> > panic: vm_fault failed: ffff00000046e708 error 1
> > cpuid = 1
> > time = 1646660058
> > KDB: stack backtrace:
> > db_trace_self() at db_trace_self
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> > vpanic() at vpanic+0x174
> > panic() at panic+0x44
> > data_abort() at data_abort+0x2e8
> > handle_el1h_sync() at handle_el1h_sync+0x10
> > --- exception, esr 0x96000004
> > _rm_rlock_debug() at _rm_rlock_debug+0x8c
> > osd_get() at osd_get+0x5c
> > zio_execute() at zio_execute+0xf8
> > taskqueue_run_locked() at taskqueue_run_locked+0x178
> > taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
> > fork_exit() at fork_exit+0x74
> > fork_trampoline() at fork_trampoline+0x14
> > KDB: enter: panic
> > [ thread pid 0 tid 100129 ]
> > Stopped at      kdb_enter+0x44: undefined       f902011f
> > db>
> >
> > A more recent kernel (912df91) still panics. See below.
> >
> > Do you have time to look into this? What information can I provide to help?
> 
> Hmm.  So after my rmlock commits, we have the following disassembly for
> _rm_rlock() (with a few annotations added by me).  Note that the pcpu
> pointer is stored in register x18 by convention.
> 
>    0xffff00000046e304 <+0>:     stp     x29, x30, [sp, #-16]!
>    0xffff00000046e308 <+4>:     mov     x29, sp
>    0xffff00000046e30c <+8>:     ldr     x8, [x18]
>    0xffff00000046e310 <+12>:    ldr     x9, [x18]
>    0xffff00000046e314 <+16>:    ldr     x10, [x18]
>    0xffff00000046e318 <+20>:    cmp     x9, x10
>    0xffff00000046e31c <+24>:    b.ne    0xffff00000046e3cc <_rm_rlock+200>  // b.any
>    0xffff00000046e320 <+28>:    ldr     x9, [x18]
>    0xffff00000046e324 <+32>:    ldrh    w9, [x9, #314]
>    0xffff00000046e328 <+36>:    cbnz    w9, 0xffff00000046e3c0 <_rm_rlock+188>
>    0xffff00000046e32c <+40>:    str     wzr, [x1, #32]
>    0xffff00000046e330 <+44>:    stp     x0, x8, [x1, #16]
>    0xffff00000046e334 <+48>:    ldrb    w9, [x0, #10]
>    0xffff00000046e338 <+52>:    tbz     w9, #4, 0xffff00000046e358 <_rm_rlock+84>
>    0xffff00000046e33c <+56>:    ldr     x9, [x18]
>    0xffff00000046e340 <+60>:    ldr     w10, [x9, #888]
>    0xffff00000046e344 <+64>:    add     w10, w10, #0x1
>    0xffff00000046e348 <+68>:    str     w10, [x9, #888]
>    0xffff00000046e34c <+72>:    ldr     x9, [x18]
>    0xffff00000046e350 <+76>:    ldr     w9, [x9, #888]
>    0xffff00000046e354 <+80>:    cbz     w9, 0xffff00000046e3f4 <_rm_rlock+240>
>    0xffff00000046e358 <+84>:    ldr     w9, [x8, #1212]
>    0xffff00000046e35c <+88>:    add     x10, x18, #0x90
>    0xffff00000046e360 <+92>:    add     w9, w9, #0x1
>    0xffff00000046e364 <+96>:    str     w9, [x8, #1212]  <------- critical_enter
>    0xffff00000046e368 <+100>:   str     x10, [x1, #8]
>    0xffff00000046e36c <+104>:   ldr     x9, [x18, #144]
>    0xffff00000046e370 <+108>:   str     x9, [x1]
>    0xffff00000046e374 <+112>:   str     x1, [x9, #8]
>    0xffff00000046e378 <+116>:   str     x1, [x18, #144]
>    0xffff00000046e37c <+120>:   ldr     x9, [x18]
>    0xffff00000046e380 <+124>:   ldr     w10, [x9, #356]
>    0xffff00000046e384 <+128>:   add     w10, w10, #0x1
>    0xffff00000046e388 <+132>:   str     w10, [x9, #356]
>    0xffff00000046e38c <+136>:   ldr     w9, [x8, #1212]
>    0xffff00000046e390 <+140>:   sub     w9, w9, #0x1
>    0xffff00000046e394 <+144>:   str     w9, [x8, #1212]  <------- critical_exit
>    0xffff00000046e398 <+148>:   ldrb    w8, [x8, #304]
>    0xffff00000046e39c <+152>:   ldr     w9, [x18, #60]   <------- loading &pc->pc_cpuid
>    ...
> 
> A (the?) problem is that the compiler is treating "pc" as an alias
> for x18, but the rmlock code assumes that the pcpu pointer is loaded
> once, as it dereferences "pc" outside of the critical section.  On
> arm64, if a context switch occurs between the store at _rm_rlock+144 and
> the load at +152, and the thread is migrated to another CPU, then we'll
> end up using the wrong CPU ID in the rm->rm_writecpus test.
> 
> I suspect the problem is unique to arm64 as its get_pcpu()
> implementation is different from the others in that it doesn't use
> volatile-qualified inline assembly.  This has been the case since
> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
> .
> 
> I haven't been able to reproduce any crashes running poudriere in an
> arm64 AWS instance, though.  Could you please try the patch below and
> confirm whether it fixes your panics?  I verified that the apparent
> problem described above is gone with the patch.
> 
> diff --git a/sys/kern/kern_rmlock.c b/sys/kern/kern_rmlock.c
> index 0cdcfb8fec62..e51c25136ae0 100644
> --- a/sys/kern/kern_rmlock.c
> +++ b/sys/kern/kern_rmlock.c
> @@ -437,6 +437,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
>  {
>     struct thread *td = curthread;
>     struct pcpu *pc;
> +   int cpuid;
>  
>     if (SCHEDULER_STOPPED())
>         return (1);
> @@ -452,6 +453,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
>     atomic_interrupt_fence();
>  
>     pc = get_pcpu();
> +   cpuid = pc->pc_cpuid;
>     rm_tracker_add(pc, tracker);
>     sched_pin();
>  
> @@ -463,7 +465,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
>      * conditional jump.
>      */
>     if (__predict_true(0 == (td->td_owepreempt |
> -       CPU_ISSET(pc->pc_cpuid, &rm->rm_writecpus))))
> +       CPU_ISSET(cpuid, &rm->rm_writecpus))))
>         return (1);
>  
>     /* We do not have a read token and need to acquire one. */
> 
> 
> 
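A simplified user-space model of the race hypothesized in the quoted analysis: if the compiler re-reads the pcpu pointer (x18) after critical_exit() and the thread has migrated in between, the CPU_ISSET() test checks the wrong CPU's bit; copying pc_cpuid into a local first, as the patch does, ties the value to the CPU where the tracker was added. Everything here is an assumption for illustration: model_cpus, cur_cpu, model_migrate() and the other model_* names are hypothetical, a 32-bit mask stands in for rm->rm_writecpus, and migration is forced deterministically instead of by a real scheduler. It only illustrates the hypothesis; as the reply below reports, the patch did not stop the panics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code. "CPUs" are entries in model_cpus[];
 * cur_cpu plays the role of the pcpu pointer that arm64 keeps in x18.
 */
struct model_pcpu {
	int pc_cpuid;
};

static struct model_pcpu model_cpus[2] = { { 0 }, { 1 } };
static int cur_cpu;

static struct model_pcpu *
model_get_pcpu(void)
{
	return (&model_cpus[cur_cpu]);
}

/* A context switch that resumes the thread on the other CPU. */
static void
model_migrate(void)
{
	cur_cpu ^= 1;
}

/*
 * Stand-in for rm->rm_writecpus: a set bit means that CPU has no read
 * token, so a reader there must take the slow path. Here a writer has
 * invalidated CPU 0's token but not CPU 1's.
 */
static const uint32_t model_writecpus = 1u << 0;

static bool
model_cpu_isset(int cpu, uint32_t set)
{
	assert(cpu >= 0 && cpu < 32);
	return (((set >> cpu) & 1u) != 0);
}

/* Before the patch: "pc" is effectively re-read from x18 after critical_exit(). */
static bool
fast_path_unpatched(void)
{
	cur_cpu = 0;		/* tracker is added on CPU 0 */
	model_migrate();	/* preempted after critical_exit(), before the pc_cpuid load */
	return (!model_cpu_isset(model_get_pcpu()->pc_cpuid, model_writecpus));
}

/* After the patch: the CPU ID is copied to a local before any migration. */
static bool
fast_path_patched(void)
{
	int cpuid;

	cur_cpu = 0;
	cpuid = model_get_pcpu()->pc_cpuid;	/* snapshot taken on CPU 0 */
	model_migrate();
	return (!model_cpu_isset(cpuid, model_writecpus));
}
```

In this model, fast_path_unpatched() returns true (the reader skips acquiring a read token even though CPU 0's token was invalidated), while fast_path_patched() returns false and would fall through to the slow path.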

Hi,

With this patch it panicked again:
  x0: ffffa00005a31500
  x1: ffffa00005a0e000
  x2:                2
  x3: ffffa00076c4e9a0
  x4:                0
  x5:    e672743c8f9e5
  x6:    dc89f70500ab1
  x7:               14
  x8: ffffa00005a31518
  x9:                1
 x10: ffffa00005a0e000
 x11:                0
 x12:                0
 x13:                a
 x14: 1013e6b85a8ecbe4
 x15:     1dce740d11a5
 x16: ffff3ea86e2434bf
 x17: fffffffffffffff2
 x18: ffff0000fe661800 (g_ctx + fcf9fa54)
 x19: ffffa00076c4e9a0
 x20: ffff0000fec39000 (g_ctx + fd577254)
 x21:                2
 x22: ffff0000419b6090 (g_ctx + 402f42e4)
 x23: ffff000000c0b137 (lockstat_enabled + 0)
 x24:              100
 x25: ffff000000c0b000 (version + a0)
 x26: ffff000000c0b000 (version + a0)
 x27: ffff000000c0b000 (version + a0)
 x28:                0
 x29: ffff0000fe661800 (g_ctx + fcf9fa54)
  sp: ffff0000fe661800
  lr: ffff00000154ea50 (zio_dva_throttle + 154)
 elr: ffff00000154ea80 (zio_dva_throttle + 184)
spsr:         60000045
 far:     2b753286b0b8
panic: Unknown kernel exception 0 esr_el1 2000000
cpuid = 1
time = 1646685857
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x174
panic() at panic+0x44
do_el1h_sync() at do_el1h_sync+0x184
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0x2000000
zio_dva_throttle() at zio_dva_throttle+0x184
zio_execute() at zio_execute+0x58
KDB: enter: panic
[ thread pid 0 tid 100129 ]
Stopped at      kdb_enter+0x44: undefined       f901c11f
db>  



I will try Andrew's patch next. Compilation might take a while, so it may be tomorrow.

Regards,
Ronald.
 



Archive permalink: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1302689164.173.1646686466515>