From owner-freebsd-arm@freebsd.org  Fri Sep 15 04:14:59 2017
Return-Path: <owner-freebsd-arm@freebsd.org>
Delivered-To: freebsd-arm@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B6F53E23522
 for <freebsd-arm@mailman.ysv.freebsd.org>;
 Fri, 15 Sep 2017 04:14:59 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net
 [208.70.210.66])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 74F3076C70
 for <freebsd-arm@freebsd.org>; Fri, 15 Sep 2017 04:14:58 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: (qmail 17477 invoked from network); 15 Sep 2017 04:14:52 -0000
Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2)
 by 0 (rfx-qmail) with SMTP; 15 Sep 2017 04:14:52 -0000
Received: by mail-cs-02.app.dca.reflexion.local
 (Reflexion email security v8.40.2) with SMTP;
 Fri, 15 Sep 2017 00:14:52 -0400 (EDT)
Received: (qmail 19639 invoked from network); 15 Sep 2017 04:14:52 -0000
Received: from unknown (HELO iron2.pdx.net) (69.64.224.71)
 by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 15 Sep 2017 04:14:52 -0000
Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net
 [67.170.167.181])
 by iron2.pdx.net (Postfix) with ESMTPSA id 90F37EC91CC;
 Thu, 14 Sep 2017 21:14:51 -0700 (PDT)
From: Mark Millard <markmi@dsl-only.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel:
 nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot
Date: Thu, 14 Sep 2017 21:14:50 -0700
References: <1C18FF04-6772-4E9C-88C5-B8D5478C5809@dsl-only.net>
 <E450D0E4-0086-468B-9D0C-ED3DBBB13945@dsl-only.net>
 <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net>
To: Emmanuel Vadot <manu@bidouilliste.com>,
 freebsd-arm <freebsd-arm@freebsd.org>,
 freebsd-hackers <freebsd-hackers@freebsd.org>
In-Reply-To: <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net>
Message-Id: <8E15A747-3413-4537-9ECA-5EDAD1285351@dsl-only.net>
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Sep 2017 04:14:59 -0000

[I've traced the failure back to the bad pointer
value that is in place when it was put to use. I
omit the prior details of earlier explorations
that got me into this area.]

Summary of the following investigations:

When the witness or invariant failure during:

taskqgroup_adjust_softirq(0)...

happens it traces back to the condition:

        pcpu_find(cpu)->pc_curthread =3D=3D NULL

in the code:

        ctd =3D pcpu_find(cpu)->pc_curthread;

for cpu =3D=3D 1 in tdq_notify (but inlined). It then
attempts to evaluate:

ctd->td_priority

which gets a data_abort but it is during when
blocked_lock is the container lock for the
thread.

In the witness included case this leads to:

panic: acquiring blockable sleep lock with spinlock or critical section =
held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid =3D 0
time =3D 13

but that is a later consequence of the earlier
problem.

I'm not sure why pcpu_find(cpu)->pc_curthread
is still NULL but since the behavior is intermittent
for debug kernel builds it suggests a race for an
update that was initiated but not always finished
in time.

(I've had occasions of hours of reboots to try
to get a failure but mostly only needing a few.
I've not run into a long sequence of failures to
boot for a debug kernel but have had some
back-to-back ones.)


Supporting detail that lead to the above:

int
pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far)

The far (fault address register) argument to pmap_fault is the=20
rd one (x2 below):

ffff000000606954 <pmap_fault> stp       x22, x21, [sp, #-48]!
ffff000000606958 <pmap_fault+0x4> stp   x20, x19, [sp, #16]
ffff00000060695c <pmap_fault+0x8> stp   x29, x30, [sp, #32]
ffff000000606960 <pmap_fault+0xc> add   x29, sp, #0x20
ffff000000606964 <pmap_fault+0x10> mov  x20, x2
ffff000000606968 <pmap_fault+0x14> mov  x22, x1
ffff00000060696c <pmap_fault+0x18> mov  x21, x0

For the failing call sequence far ends up stored on the stack
via the x20 save-to-stack in:

ffff0000002f8c0c <__mtx_lock_flags> stp x24, x23, [sp, #-64]!
ffff0000002f8c10 <__mtx_lock_flags+0x4> stp     x22, x21, [sp, #16]
ffff0000002f8c14 <__mtx_lock_flags+0x8> stp     x20, x19, [sp, #32]
ffff0000002f8c18 <__mtx_lock_flags+0xc> stp     x29, x30, [sp, #48]
ffff0000002f8c1c <__mtx_lock_flags+0x10> add    x29, sp, #0x30

Stack segment with a little context:

0xffff000069850470:     ffff0000698504b0                ffff0000002f8b80
0xffff000069850480:     ffff000000c6a528                0
0xffff000069850490:     96000004                        ffff000000c6a658
0xffff0000698504a0:     37e                             ffff000000c6a670
0xffff0000698504b0:     ffff0000698504e0                ffff000000606998

So it appears:

pmap_fault`esr =3D=3D 0x96000004
pmap_fault`pmap =3D=3D 0xffff000000c6a658 (vmspace0+0x130)
pmap_fault`far =3D=3D 0x37e

I'll note that 0x37e =3D 894 so it matches up with
x8 =3D=3D 0x0 for the likes of:

elr         0xffff00000033f0dc  sched_switch+0x2bc
. . .
ssched_switch+0x2b8:     ldrb    w9, [x8, #894]

matching:

db> show reg
. . .
x8                           0
. . .

So apparently sched_switch tried to access 0x37e

db> x/gx 0xffff000000606998  =20
pmap_fault+0x44:        f100111f927e0ec8

Part of the back trace is (for the example
debug kernel):

kassert_panic() at witness_checkorder+0x160
         pc =3D 0xffff0000003174e4  lr =3D 0xffff000000374990
         sp =3D 0xffff0000698503f0  fp =3D 0xffff000069850470

witness_checkorder() at __mtx_lock_flags+0xa8
         pc =3D 0xffff000000374990  lr =3D 0xffff0000002f8b7c
         sp =3D 0xffff000069850480  fp =3D 0xffff0000698504b0

__mtx_lock_flags() at pmap_fault+0x40
         pc =3D 0xffff0000002f8b7c  lr =3D 0xffff000000606994
         sp =3D 0xffff0000698504c0  fp =3D 0xffff0000698504e0
       =20
pmap_fault() at data_abort+0xb8
         pc =3D 0xffff000000606994  lr =3D 0xffff000000608a9c
         sp =3D 0xffff0000698504f0  fp =3D 0xffff0000698505a0

data_abort() at do_el1h_sync+0xfc
         pc =3D 0xffff000000608a9c  lr =3D 0xffff0000006088f0
         sp =3D 0xffff0000698505b0  fp =3D 0xffff0000698505e0

do_el1h_sync() at handle_el1h_sync+0x74
         pc =3D 0xffff0000006088f0  lr =3D 0xffff0000005f1874
         sp =3D 0xffff0000698505f0  fp =3D 0xffff000069850700

handle_el1h_sync() at sched_switch+0x2a8
         pc =3D 0xffff0000005f1874  lr =3D 0xffff00000033f0c8
         sp =3D 0xffff000069850710  fp =3D 0xffff0000698507f0

sched_switch() at mi_switch+0x1b8
         pc =3D 0xffff00000033f0c8  lr =3D 0xffff00000032161c
         sp =3D 0xffff000069850800  fp =3D 0xffff000069850820

mi_switch() at taskqgroup_binder+0x7c
         pc =3D 0xffff00000032161c  lr =3D 0xffff00000035510c
         sp =3D 0xffff000069850830  fp =3D 0xffff000069850860

taskqgroup_binder() at gtaskqueue_run_locked+0x104
         pc =3D 0xffff00000035510c  lr =3D 0xffff000000354f74
         sp =3D 0xffff000069850870  fp =3D 0xffff0000698508e0
       =20
gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
         pc =3D 0xffff000000354f74  lr =3D 0xffff000000354d10
         sp =3D 0xffff0000698508f0  fp =3D 0xffff000069850910

gtaskqueue_thread_loop() at fork_exit+0x7c
         pc =3D 0xffff000000354d10  lr =3D 0xffff0000002dbd3c
         sp =3D 0xffff000069850920  fp =3D 0xffff000069850950

fork_exit() at fork_trampoline+0x10
         pc =3D 0xffff0000002dbd3c  lr =3D 0xffff000000608664
         sp =3D 0xffff000069850960  fp =3D 0x0000000000000000

Note:

ffff00000033f0bc <sched_switch+0x29c> ldr       w0, [x19, #1292]
ffff00000033f0c0 <sched_switch+0x2a0> ldrb      w27, [x19, #894]
ffff00000033f0c4 <sched_switch+0x2a4> str       w0, [sp, #12]
ffff00000033f0c8 <sched_switch+0x2a8> bl        ffff000000359708 =
<pcpu_find>
ffff00000033f0cc <sched_switch+0x2ac> ldr       x8, [x0]
ffff00000033f0d0 <sched_switch+0x2b0> mov       w11, w27
ffff00000033f0d4 <sched_switch+0x2b4> adrp      x27, ffff000000c85000 =
<dpcpu+0x158>
ffff00000033f0d8 <sched_switch+0x2b8> ldrb      w9, [x8, #894]
ffff00000033f0dc <sched_switch+0x2bc> cmp       w11, w9
ffff00000033f0e0 <sched_switch+0x2c0> b.cs      ffff00000033f150 =
<sched_switch+0x330>  // b.hs, b.nlast

This is code for the later part of what is
shown below:

static void
tdq_notify(struct tdq *tdq, struct thread *td)
{
        struct thread *ctd;
        int pri;
        int cpu;

        if (tdq->tdq_ipipending)
                return;
        cpu =3D td_get_sched(td)->ts_cpu;
        pri =3D td->td_priority;
        ctd =3D pcpu_find(cpu)->pc_curthread;
        if (!sched_shouldpreempt(pri, ctd->td_priority, 1))
                return;
. . .
}

(Where: sched_shouldpreempt has been inlined and
some of it is interlaced.)

The failing [x8, #894] is the attempt to access: ctd->td_priority

In other words: ctd =3D=3D NULL resulted from the pcpu_find
(i.e., x8 =3D=3D 0 ). As for how it got to be zero:

db> show reg
spsr        0x9600000440000085
x0          0xffff000000ac1000  __pcpu+0x200
. . .

db> x/gx cpuid_to_pcpu,32
cpuid_to_pcpu:  ffff000000ac0e00                ffff000000ac1000
cpuid_to_pcpu+0x10:     ffff000000ac1200                ffff000000ac1400
cpuid_to_pcpu+0x20:     0                               0
. . .

(So cpu =3D=3D 1 .)

db> x/gx 0xffff000000ac1000,8
__pcpu+0x200:   0                               fffffd00005dda80
__pcpu+0x210:   0                               0
__pcpu+0x220:   0                               0
__pcpu+0x230:   100000000                       ffff000000ac1200

Thus it seems that at the time for cpu=3D=3D1 :

        pcpu_find(cpu)->pc_curthread =3D=3D NULL

(at __pcpu+0x200).

=3D=3D=3D
Mark Millard
markmi at dsl-only.net