Date:      Mon, 12 Sep 2022 15:08:03 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Dmitry Salychev <dsl@FreeBSD.org>
Cc:        bob prohaska <fbsd@www.zefox.net>, Mark Johnston <markj@freebsd.org>, Andrew Turner <andrew@fubar.geek.nz>, Ronald Klop <ronald-lists@klop.ws>, freebsd-arm <freebsd-arm@freebsd.org>, freebsd-current@freebsd.org
Subject:   Re: panic: data abort in critical section or under mutex  (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID:  <9E1552DB-4A65-4DFF-BC79-CFE045ECF972@yahoo.com>
In-Reply-To: <86czc0eotc.fsf@peasant.tower.home>
References:  <C2F96211-0180-45DA-872F-52358D9ED35B.ref@yahoo.com> <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com> <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay> <YiYhIQXl1sd4cOVS@nuc> <3374E0F8-D712-4ED0-A62B-B6924FC8A5E2@fubar.geek.nz> <YiY2jmD97leKev0F@nuc> <20220308154204.GA37265@www.zefox.net> <86czc0eotc.fsf@peasant.tower.home>

On 2022-Sep-12, at 05:10, Dmitry Salychev <dsl@FreeBSD.org> wrote:

> <dpaa2_panics.txt>
> Hi,
>
> It seems that the recent 14-CURRENT/aarch64 (866e021) with DPAA2 drivers
> panics under network throughput stress test in random places

3 of your examples show the signal handler being called at
the exact same instruction:

#6  0xffff0000004ced5c in witness_lock

The parameters vary, as do the callers:

#7  0xffff00000043a3a8 in __mtx_lock_flags
(twice)
vs.
#7  0xffff00000047d4ec in callout_lock
(once)

Showing one more level, where all are distinct:

#8  0xffff0000007d60a8 in dpaa2_swp_enq_mult (swp=swp@entry=0xffffa0000056ca00, ed=ed@entry=0xffff0000bcda2c70, fd=fd@entry=0xffff0000bcda2df8, flags=flags@entry=0xffff0000bcda2c6c, frames_n=frames_n@entry=1) at /usr/src/sys/dev/dpaa2/dpaa2_swp.c:795
vs.
#8  0xffff000000508f54 in soreceive_generic (so=0xffff00011d2c2200, psa=0x0, uio=<optimized out>, mp0=<optimized out>, controlp=0x0, flagsp=<optimized out>) at /usr/src/sys/kern/uipc_socket.c:2240
vs.
#8  callout_reset_sbt_on (c=0xffff0001121792c0, sbt=<optimized out>, prec=<optimized out>, ftn=0xffff00000047d4ec <callout_reset_sbt_on+204>, arg=0xffff000112179000, cpu=0, flags=256) at /usr/src/sys/kern/kern_timeout.c:962
(no address shown)

It might be useful to look at what the code at 0xffff0000004ced5c
(and just before it) is doing, and with what kinds of data, since
that address is common to all 3 call chains above. That could then
be compared with the less frequent signal handler invocations in
the other examples. If dumps for these panics are still around,
more than just the code could be looked into.
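
To illustrate the comparison being made above, here is a minimal
sketch in Python that intersects the frames of the three call chains
and reports what they share. The trace labels are hypothetical, the
argument lists are trimmed, and the pairing of each #7 with a #8 is
my inference from the "twice"/"once" counts, not something read out
of the dumps:

```python
import re

# "#N  [0xADDR in ]function": the address is optional because one of
# the #8 frames above was shown without one.
FRAME_RE = re.compile(r"#(\d+)\s+(?:(0x[0-9a-f]+) in )?(\w+)")

# Frame lines copied from this message; labels are hypothetical.
TRACES = {
    "trace-1": [
        "#6  0xffff0000004ced5c in witness_lock",
        "#7  0xffff00000043a3a8 in __mtx_lock_flags",
        "#8  0xffff0000007d60a8 in dpaa2_swp_enq_mult",
    ],
    "trace-2": [
        "#6  0xffff0000004ced5c in witness_lock",
        "#7  0xffff00000043a3a8 in __mtx_lock_flags",
        "#8  0xffff000000508f54 in soreceive_generic",
    ],
    "trace-3": [
        "#6  0xffff0000004ced5c in witness_lock",
        "#7  0xffff00000047d4ec in callout_lock",
        "#8  callout_reset_sbt_on",
    ],
}


def parse_frame(line):
    """Return (frame number, address or None, function name)."""
    m = FRAME_RE.match(line)
    if m is None:
        raise ValueError(f"not a frame line: {line!r}")
    number, addr, func = m.groups()
    return int(number), addr, func


def shared_frames(traces):
    """Frames that appear identically in every trace, by frame number."""
    parsed = [{parse_frame(line) for line in lines} for lines in traces.values()]
    return sorted(set.intersection(*parsed), key=lambda frame: frame[0])


print(shared_frames(TRACES))
# [(6, '0xffff0000004ced5c', 'witness_lock')]
```

Only frame #6 survives the intersection, which is why the
instruction at that address looks like the place to start.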


> with
> unknown kernel exception 0 esr_el1 2000000 on Ten64 board (based on
> NXP's LS1088A, Cortex-A53), but the same code doesn't panic on HoneyComb
> (NXP LX2160A, Cortex-A72) even after ~10h long tests.
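
For reference, the esr_el1 value in that panic message can be split
into its top-level fields. The sketch below assumes the standard
AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits
24:0); the function name is just for illustration:

```python
def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into its top-level fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length bit, bit [25]
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits [24:0]
    }


print(decode_esr(0x2000000))
# {'ec': 0, 'il': 1, 'iss': 0}
```

So 0x2000000 is just the IL bit with EC 0, the "Unknown reason"
class, and an empty ISS. That would match the "exception 0" in the
message, and it means the syndrome value itself carries no further
detail about the cause.
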
>
> I've gathered some stack backtraces from ddb and kgdb (attached).
> Panic itself can easily be reproduced after several minutes from the
> start of the test. I've tried to change PCPU_PTR macro to use get_pcpu
> again (as discussed in the thread earlier), but it didn't help.
>
> If you want to get your hands dirty, DPAA2 stuff I'm using is at
> https://github.com/mcusim/freebsd-src/tree/lx2160acex7-exp (branch is
> lx2160acex7-exp!)
>
> Any ideas or places to check would be really helpful.



===
Mark Millard
marklmi at yahoo.com



