Date: Mon, 11 Sep 2017 02:21:35 -0700 From: Mark Millard <markmi@dsl-only.net> To: FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>, freebsd-hackers <freebsd-hackers@freebsd.org> Subject: Re: head -r323246 Pine64+ 2GB boot time context: acquiring blockable sleep lock with spinlock or critical section held for data_abort calling pmap_fault calling __mtx_lock_flags Message-ID: <E1C7882D-03A9-45F5-97BE-8A181076559D@dsl-only.net> In-Reply-To: <9DB26517-E4E0-4B2A-9855-9F7381AD4C66@dsl-only.net> References: <8419C238-702D-4BF7-89DB-EC649CD405A5@dsl-only.net> <9DB26517-E4E0-4B2A-9855-9F7381AD4C66@dsl-only.net>
next in thread | previous in thread | raw e-mail | index | archive | help
[I got another blockable sleep lock panic during
the Pine64+ 2GB boot, this time with ddb> input
working. I show both the older example and the
new one.]
On 2017-Sep-10, at 3:25 PM, Mark Millard <markmi@dsl-only.net> wrote:
> [I got a boot-time panic with a debug kernel that
> reported a "acquiring blockable sleep lock with
> spinlock or critical section held (sleep mutex)".
> This was for data_abort calling pmap_fault calling
> __mtx_lock_flags . I first include prior non-debug
> kernel reports in case they are related.]
>=20
> . . .
>=20
> . . .
> Release APs
> APs not started
> CPU 0: ARM Cortex-A53 r0p4 affinity: 0
> Instruction Set Attributes 0 =3D <AES+PMULL,SHA1,SHA2,CRC32>
> Instruction Set Attributes 1 =3D <0>
> Processor Features 0 =3D <AdvSIMD,Float,EL3 32,EL2 32,EL1 =
32,EL0 32>
> Processor Features 1 =3D <0>
> Memory Model Features 0 =3D <4k Granule,64k =
Granule,MixedEndian,S/NS Mem,16bit ASID,1TB PA>
> Memory Model Features 1 =3D <>
> Debug Features 0 =3D <2 CTX Breakpoints,4 Watchpoints,6 =
Breakpoints,PMUv3,Debug v8>
> Debug Features 1 =3D <0>
> Auxiliary Features 0 =3D <0>
> Auxiliary Features 1 =3D <0>
> CPU 1: (null) (null) r0p0 affinity: 0
> CPU 2: (null) (null) r0p0 affinity: 0
> CPU 3: (null) (null) r0p0 affinity: 0
> panic: acquiring blockable sleep lock with spinlock or critical =
section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
> cpuid =3D 0
> time =3D 13
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
> pc =3D 0xffff0000005efc78 lr =3D 0xffff000000088094
> sp =3D 0xffff000069850080 fp =3D 0xffff000069850290
>=20
> db_trace_self_wrapper() at vpanic+0x164
> pc =3D 0xffff000000088094 lr =3D 0xffff00000031764c
> sp =3D 0xffff0000698502a0 fp =3D 0xffff000069850310
>=20
> vpanic() at kassert_panic+0x15c
> pc =3D 0xffff00000031764c lr =3D 0xffff0000003174e4
> sp =3D 0xffff000069850320 fp =3D 0xffff0000698503e0
>=20
> kassert_panic() at witness_checkorder+0x160
> pc =3D 0xffff0000003174e4 lr =3D 0xffff000000374990
> sp =3D 0xffff0000698503f0 fp =3D 0xffff000069850470
>=20
> witness_checkorder() at __mtx_lock_flags+0xa8
> pc =3D 0xffff000000374990 lr =3D 0xffff0000002f8b7c
> sp =3D 0xffff000069850480 fp =3D 0xffff0000698504b0
>=20
> __mtx_lock_flags() at pmap_fault+0x40
> pc =3D 0xffff0000002f8b7c lr =3D 0xffff000000606994
> sp =3D 0xffff0000698504c0 fp =3D 0xffff0000698504e0
>=20
> pmap_fault() at data_abort+0xb8
> pc =3D 0xffff000000606994 lr =3D 0xffff000000608a9c
> sp =3D 0xffff0000698504f0 fp =3D 0xffff0000698505a0
>=20
> data_abort() at do_el1h_sync+0xfc
> pc =3D 0xffff000000608a9c lr =3D 0xffff0000006088f0
> sp =3D 0xffff0000698505b0 fp =3D 0xffff0000698505e0
>=20
> do_el1h_sync() at handle_el1h_sync+0x74
> pc =3D 0xffff0000006088f0 lr =3D 0xffff0000005f1874
> sp =3D 0xffff0000698505f0 fp =3D 0xffff000069850700
>=20
> handle_el1h_sync() at sched_switch+0x2a8
> pc =3D 0xffff0000005f1874 lr =3D 0xffff00000033f0c8
> sp =3D 0xffff000069850710 fp =3D 0xffff0000698507f0
>=20
> sched_switch() at mi_switch+0x1b8
> pc =3D 0xffff00000033f0c8 lr =3D 0xffff00000032161c
> sp =3D 0xffff000069850800 fp =3D 0xffff000069850820
>=20
> mi_switch() at taskqgroup_binder+0x7c
> pc =3D 0xffff00000032161c lr =3D 0xffff00000035510c
> sp =3D 0xffff000069850830 fp =3D 0xffff000069850860
>=20
> taskqgroup_binder() at gtaskqueue_run_locked+0x104
> pc =3D 0xffff00000035510c lr =3D 0xffff000000354f74
> sp =3D 0xffff000069850870 fp =3D 0xffff0000698508e0
>=20
> gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
> pc =3D 0xffff000000354f74 lr =3D 0xffff000000354d10
> sp =3D 0xffff0000698508f0 fp =3D 0xffff000069850910
>=20
> gtaskqueue_thread_loop() at fork_exit+0x7c
> pc =3D 0xffff000000354d10 lr =3D 0xffff0000002dbd3c
> sp =3D 0xffff000069850920 fp =3D 0xffff000069850950
>=20
> fork_exit() at fork_trampoline+0x10
> pc =3D 0xffff0000002dbd3c lr =3D 0xffff000000608664
> sp =3D 0xffff000069850960 fp =3D 0x0000000000000000
>=20
> KDB: enter: panic
> [ thread pid 0 tid 100058 ]
> Stopped at sched_switch+0x2b8: ldrb w9, [x8, #894]
> db>=20
>=20
> Unfortunately it was not taking console input so that is
> all I got.
=46rom the new example:
CPU 1: (null) (null) r0p0 affinity: 0
CPU 2: (null) (null) r0p0 affinity: 0
CPU 3: (null) (null) r0p0 affinity: 0
panic: acquiring blockable sleep lock with spinlock or critical section =
held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid =3D 0
time =3D 13
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
pc =3D 0xffff0000005efc78 lr =3D 0xffff000000088094
sp =3D 0xffff000069850080 fp =3D 0xffff000069850290
db_trace_self_wrapper() at vpanic+0x164
pc =3D 0xffff000000088094 lr =3D 0xffff00000031764c
sp =3D 0xffff0000698502a0 fp =3D 0xffff000069850310
vpanic() at kassert_panic+0x15c
pc =3D 0xffff00000031764c lr =3D 0xffff0000003174e4
sp =3D 0xffff000069850320 fp =3D 0xffff0000698503e0
kassert_panic() at witness_checkorder+0x160
pc =3D 0xffff0000003174e4 lr =3D 0xffff000000374990
sp =3D 0xffff0000698503f0 fp =3D 0xffff000069850470
witness_checkorder() at __mtx_lock_flags+0xa8
pc =3D 0xffff000000374990 lr =3D 0xffff0000002f8b7c
sp =3D 0xffff000069850480 fp =3D 0xffff0000698504b0
__mtx_lock_flags() at pmap_fault+0x40
pc =3D 0xffff0000002f8b7c lr =3D 0xffff000000606994
sp =3D 0xffff0000698504c0 fp =3D 0xffff0000698504e0
pmap_fault() at data_abort+0xb8
pc =3D 0xffff000000606994 lr =3D 0xffff000000608a9c
sp =3D 0xffff0000698504f0 fp =3D 0xffff0000698505a0
data_abort() at do_el1h_sync+0xfc
pc =3D 0xffff000000608a9c lr =3D 0xffff0000006088f0
sp =3D 0xffff0000698505b0 fp =3D 0xffff0000698505e0
do_el1h_sync() at handle_el1h_sync+0x74
pc =3D 0xffff0000006088f0 lr =3D 0xffff0000005f1874
sp =3D 0xffff0000698505f0 fp =3D 0xffff000069850700
handle_el1h_sync() at sched_switch+0x2a8
pc =3D 0xffff0000005f1874 lr =3D 0xffff00000033f0c8
sp =3D 0xffff000069850710 fp =3D 0xffff0000698507f0
sched_switch() at mi_switch+0x1b8
pc =3D 0xffff00000033f0c8 lr =3D 0xffff00000032161c
sp =3D 0xffff000069850800 fp =3D 0xffff000069850820
mi_switch() at taskqgroup_binder+0x7c
pc =3D 0xffff00000032161c lr =3D 0xffff00000035510c
sp =3D 0xffff000069850830 fp =3D 0xffff000069850860
taskqgroup_binder() at gtaskqueue_run_locked+0x104
pc =3D 0xffff00000035510c lr =3D 0xffff000000354f74
sp =3D 0xffff000069850870 fp =3D 0xffff0000698508e0
gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
pc =3D 0xffff000000354f74 lr =3D 0xffff000000354d10
sp =3D 0xffff0000698508f0 fp =3D 0xffff000069850910
gtaskqueue_thread_loop() at fork_exit+0x7c
pc =3D 0xffff000000354d10 lr =3D 0xffff0000002dbd3c
sp =3D 0xffff000069850920 fp =3D 0xffff000069850950
fork_exit() at fork_trampoline+0x10
pc =3D 0xffff0000002dbd3c lr =3D 0xffff000000608664
sp =3D 0xffff000069850960 fp =3D 0x0000000000000000
KDB: enter: panic
[ thread pid 0 tid 100058 ]
Stopped at sched_switch+0x2b8: ldrb w9, [x8, #894]
It turns our that x8 is reported as holding the value zero:
db> show regs
No such command; use "help" to list available commands
db> show reg
spsr 0x9600000440000085
x0 0xffff000000ac1000 __pcpu+0x200
x1 0x4
x2 0xffff00000068a5cb $d.4+0x15c
x3 0x218 $d.9+0x1d8
x4 0
x5 0x11
x6 0xffff000000a45f20
x7 0x40 $d.14
x8 0
x9 0x5
x10 0xffff0000009a7d88 tdq_cpu+0xe08
x11 0x18
x12 0x1ddc88
x13 0x7ff707d0
x14 0
x15 0x7ff6e010
x16 0x2af8 $d.1+0x122e
x17 0x27c0 $d.1+0xef6
x18 0xffff000069850790
x19 0xfffffd0001415a80
x20 0xffff0000009a7c80 tdq_cpu+0xd00
x21 0xffff0000009a6f80 tdq_cpu
x22 0xffff0000009a7d1d tdq_cpu+0xd9d
x23 0x1
x24 0
x25 0xffff0000009a6f80 tdq_cpu
x26 0xffff000000c85000 dpcpu+0x158
x27 0xffff000000c85000 dpcpu+0x158
x28 0
x29 0xffff0000698507f0
lr 0xffff00000033f0cc sched_switch+0x2ac
elr 0xffff00000033f0dc sched_switch+0x2bc
sp 0xffff000069850790
sched_switch+0x2b8: ldrb w9, [x8, #894]
db> show lockchain
thread 100058 (pid 0, softirq_1) is on a run queue
db> show locks
db> show lock
db> show locktree
db> show sleepqueue
db> show sleepq
ddb> show sleepchain
thread 100058 (pid 0, softirq_1) is on a run queue
db> show alllocks
Process 0 (kernel) thread 0xffff000000acd500 (100000)
exclusive sleep mutex Giant (Giant) r =3D 0 (0xffff000000c5d860) locked =
@ /usr/src/sys/kern/kern_module.c:116
db> show allchains
chain 1:
thread 100049 (pid 18, vmdaemon) sleeping on 0xffff000000aa811c =
"psleep"
chain 2:
thread 100054 (pid 17, laundry: dom0) sleeping on 0xffff000000aa80c4 =
"launds"
chain 3:
thread 100055 (pid 17, uma) sleeping on 0xffff000000aa7b68 "umarcl"
chain 4:
thread 100047 (pid 16, mmcsd0: mmc/sd card) sleeping on =
0xfffffd0000638800 "mmcsd disk jobqueue"
chain 5:
thread 100046 (pid 15, soaiod4) sleeping on 0xffff000000a9dbe4 "-"
chain 6:
thread 100045 (pid 9, soaiod3) sleeping on 0xffff000000a9dbe4 "-"
chain 7:
thread 100044 (pid 8, soaiod2) sleeping on 0xffff000000a9dbe4 "-"
chain 8:
thread 100043 (pid 7, soaiod1) sleeping on 0xffff000000a9dbe4 "-"
chain 9:
thread 100036 (pid 5, sctp_iterator) sleeping on 0xffff000000c7bf20 =
"waiting_for_work"
chain 10:
thread 100028 (pid 14, usbus0) sleeping on 0xffff000040925358 "-"
chain 11:
thread 100029 (pid 14, usbus0) sleeping on 0xffff0000409253b0 "-"
chain 12:
thread 100030 (pid 14, usbus0) sleeping on 0xffff000040925408 "-"
chain 13:
thread 100031 (pid 14, usbus0) sleeping on 0xffff000040925460 "-"
chain 14:
thread 100032 (pid 14, usbus0) sleeping on 0xffff0000409254b8 "-"
chain 15:
thread 100025 (pid 4, doneq0) sleeping on 0xffff000000878280 "-"
chain 16:
thread 100042 (pid 4, scanner) sleeping on 0xffff0000008780c8 "-"
chain 17:
thread 100024 (pid 3, crypto returns) sleeping on 0xffff000000aa6008 =
"crypto_ret_wait"
chain 18:
thread 100023 (pid 2, crypto) sleeping on 0xffff000000aa5ec0 =
"crypto_wait"
chain 19:
thread 100019 (pid 13, g_event) sleeping on 0xffff000000c6a450 "-"
chain 20:
thread 100020 (pid 13, g_up) sleeping on 0xffff000000c6a460 "-"
chain 21:
thread 100021 (pid 13, g_down) sleeping on 0xffff000000c6a458 "-"
chain 22:
thread 100014 (pid 12, swi4: clock (0)) blocked on lock =
0xffff000000c5d860 (sleep mutex) "Giant"
thread 100000 (pid 0, swapper) is on a run queue
chain 23:
thread 100002 (pid 1, kernel) blocked on lock 0xffff000000c5d860 (sleep =
mutex) "Giant"
thread 100000 (pid 0, swapper) is on a run queue
chain 24:
thread 100001 (pid 10, audit) sleeping on 0xffff000000c808e0 =
"audit_worker_cv"
chain 25:
thread 100009 (pid 0, thread taskq) sleeping on 0xfffffd00005f2b00 "-"
chain 26:
thread 100010 (pid 0, aiod_kick taskq) sleeping on 0xfffffd00005f2a00 =
"-"
chain 27:
thread 100012 (pid 0, kqueue_ctx taskq) sleeping on 0xfffffd00005f2700 =
"-"
chain 28:
thread 100022 (pid 0, firmware taskq) sleeping on 0xfffffd00005f2000 =
"-"
chain 29:
thread 100037 (pid 0, acpi_task_0) sleeping on 0xfffffd00005f1400 "-"
chain 30:
thread 100038 (pid 0, acpi_task_1) sleeping on 0xfffffd00005f1400 "-"
chain 31:
thread 100039 (pid 0, acpi_task_2) sleeping on 0xfffffd00005f1400 "-"
chain 32:
thread 100041 (pid 0, CAM taskq) sleeping on 0xfffffd00005f1e00 "-"
chain 33:
thread 100056 (pid 0, if_config_tqg_0) sleeping on 0xfffffd00005f1300 =
"-"
chain 34:
thread 100057 (pid 0, softirq_0) sleeping on 0xfffffd00005f1200 "-"
chain 35:
thread 100059 (pid 0, softirq_2) sleeping on 0xfffffd00005f1000 "-"
chain 36:
thread 100060 (pid 0, softirq_3) sleeping on 0xfffffd00005f0e00 "-"
The code for:
panic: acquiring blockable sleep lock with spinlock or critical section =
held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
is the PMAP_LOCK in:
int
pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far)
{
#ifdef SMP
uint64_t par;
#endif
=20
switch (ESR_ELx_EXCEPTION(esr)) {
case EXCP_DATA_ABORT_L:
case EXCP_DATA_ABORT:
break;
default:
return (KERN_FAILURE);
}
=20
#ifdef SMP
PMAP_LOCK(pmap);
switch (esr & ISS_DATA_DFSC_MASK) {
case ISS_DATA_DFSC_TF_L0:
case ISS_DATA_DFSC_TF_L1:
case ISS_DATA_DFSC_TF_L2:
case ISS_DATA_DFSC_TF_L3:
/* Ask the MMU to check the address */
if (pmap =3D=3D kernel_pmap)
par =3D arm64_address_translate_s1e1r(far);
else
par =3D arm64_address_translate_s1e0r(far);
=20
/*
* If the translation was successful the address was =
invalid
* due to a break-before-make sequence. We can unlock =
and
* return success to the trap handler.
*/
if (PAR_SUCCESS(par)) {
PMAP_UNLOCK(pmap);
return (KERN_SUCCESS);
}
break;
default:
break;
}
PMAP_UNLOCK(pmap);
#endif
=20
return (KERN_FAILURE);
}
=3D=3D=3D
Mark Millard
markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E1C7882D-03A9-45F5-97BE-8A181076559D>
