Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 17 Oct 2018 21:23:26 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        Matthew Macy <mmacy@freebsd.org>
Subject:   Re: head -r339076 based powerpc64 context: fatal kernel trap [ during /usr/tests/Kyuafile's sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4 ]
Message-ID:  <0B19E87B-D44F-4B59-BB5A-0EC31974A1E8@yahoo.com>
In-Reply-To: <184CA6E2-AA51-4DE3-A3AA-1E9901316050@yahoo.com>
References:  <99B35B13-29AA-406D-941B-95408C603FEF@yahoo.com> <184CA6E2-AA51-4DE3-A3AA-1E9901316050@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[Booting a debug kernel reported a lock order
reversal that might be relevant. The problem
repeated again: seems to always fail in my
context. The backtrace is like the prior
one, but for the debug kernel build being used
this time.]

On 2018-Oct-17, at 6:29 PM, Mark Millard <marklmi at yahoo.com> wrote:

> [I got another data storage interrupt failure, again
> during kyaua showing:
>=20
> sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4  ->
>=20
> but the backtrace looks different. See below.]
>=20
> On 2018-Oct-17, at 4:58 PM, Mark Millard <marklmi at yahoo.com> wrote:
>=20
>> On a powerpc64 with builworld buildkernel built via
>> devel/powerpc64-xtoolchain-gcc for head -r339076
>> (some source adjustments), and a system-cc-is-clang
>> I attempted a:
>>=20
>> # kyua test -k /usr/tests/Kyuafile
>>=20
>> It got to:
>>=20
>> sys/netinet/reuseport_lb:basic_ipv4  ->  failed: =
/usr/src/tests/sys/netinet/reuseport_lb.c:165: bind() failed: Address =
already in use  [0.014s]
>> sys/netinet/reuseport_lb:basic_ipv6  ->  failed: =
/usr/src/tests/sys/netinet/reuseport_lb.c:221: bind() failed: Address =
already in use  [0.014s]
>> sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4  -> =20
>>=20
>> and then the system crashed. I am re-running to
>> see what happens.
>>=20
>> The context has a non-debug kernel (but with
>> symbols).
>>=20
>> Hand transcribed from a picture . . .
>>=20
>> fatal kernel trap:
>>=20
>> exception       =3D 0x300 (data storage interrupt)
>> virtual address =3D 0xbfffffffffba8530
>> dsisr           =3D 0x42000000
>> srr0            =3D 0x72b054
>> srr1            =3D 0x9000000000009032
>> current msr     =3D 0x9000000000009032
>> lr              =3D 0x69948c
>> curthread       =3D 0xc000000036f7f000
>> pid =3D 12798, comm =3D ifconfig
>>=20
>> [ thread pid 12798 tid 100312 ]
>> Stopped at lock_init+0x78 stw r9,0x8(r3)
>> db:0:kdb.enter.default> bt
>> Tracing pid 12798 tid 100312 td 0xc000000036f7f000
>> 0xe00000004646e330: at 0xe00000004646e36c
>> 0xe00000004646e360: at epair_modevent+0xf0
>> 0xe00000004646e410: at module_register_init+0xe8
>> 0xe00000004646e4a0: at linker_laod_module+0x6f8
>=20
> Should have been: linker_load_module
>=20
>> 0xe00000004646e580: at kern_kldload+0x150
>> 0xe00000004646e5e0: at sys_kldload+0xb80
>> 0xe00000004646e630: at trap+0xef4
>> 0xe00000004646e790: at powerpc_interrupt+0x12c
>> 0xe00000004646e820: user sc trap by 0x81017fcf8
>> srr1 =3D 0x900000000000f032
>> r1   =3D 0x3fffffffffffcfe0
>> cr   =3D 0x28022482
>> xer  =3D 0x20000000
>> ctr  =3D 0x81017fcf0
>> r2   =3D 0x810336300
>>=20
>>=20
>> # uname -apKU
>> FreeBSD FBSDG5L 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #4 r339076M: Mon Oct =
15 13:19:35 PDT 2018     =
markmi@FBSDG5L:/usr/obj/powerpc64vtsc_xtoolchain-gcc/powerpc.powerpc64/usr=
/src/powerpc.powerpc64/sys/GENERIC64vtsc-NODBG  powerpc powerpc64 =
1200084 1200084
>>=20
>> ports was at -r480180.
>>=20
>=20
> Again failed during:
>=20
> sys/netinet/reuseport_lb:basic_ipv4  ->  failed: =
/usr/src/tests/sys/netinet/reuseport_lb.c:165: bind() failed: Address =
already in use  [0.013s]
> sys/netinet/reuseport_lb:basic_ipv6  ->  failed: =
/usr/src/tests/sys/netinet/reuseport_lb.c:221: bind() failed: Address =
already in use  [0.013s]
> sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4  -> =20
>=20
>=20
> The backtrace this time shows (hand transcribed):
>=20
> fatal kernel trap:
>=20
> exception       =3D 0x300 (data storage interrupt)
> virtual address =3D 0xc00000008cab6530
> dsisr           =3D 0x42000000
> srr0            =3D 0xe000000046e5b228
> srr1            =3D 0x9000000000009032
> current msr     =3D 0x9000000000009032
> lr              =3D 0xe000000046e5b220
> curthread       =3D 0xc00000000d48e000
> pid =3D 9666, comm =3D jail
>=20
> [ thread pid 9666 tid 100185 ]
> Stopped at vnet_epair_init+0x78: stdx r3,r29,r30
> db:0:kdb.enter.default> bt
> Tracing pid 9666 tid 100185 td 0xc00000000d48e000
> 0xe0000000470a1240: at vnet_sysinit+0x64
> 0xe0000000470a1270: at vnet_alloc+0xfc
> 0xe0000000470a12d0: at kern_jail_set+0x1e30
> 0xe0000000470a15e0: at sys_jail_set+08c
> 0xe0000000470a1630: at trap+0xef4
> 0xe0000000470a1790: at powerpc_interrupt+0x12c
> 0xe0000000470a1820: user sc trap by 0x81016a888
> srr1 =3D 0x900000000000f032
> r1   =3D 0x3fffffffffffd090
> cr   =3D 0x28002482
> xer  =3D 0x20000000
> ctr  =3D 0x81016a880
> r2   =3D 0x810322300
>=20
> I got a core.txt.0 this time. it reported:
>=20
> . . .
> epair3a: Ethernet address: 02:60:27:70:4b:0a
> epair3b: Ethernet address: 02:60:27:70:4b:0b
> epair3a: link state changed to UP
> epair3b: link state changed to UP
>=20
> fatal kernel trap:
>=20
>   exception       =3D 0x300 (data storage interrupt)
>   virtual address =3D 0xc00000008cab6530
>   dsisr           =3D 0x42000000
>   srr0            =3D 0xe000000046e5b228 (0xe000000046e5b228)
>   srr1            =3D 0x9000000000009032
>   current msr     =3D 0x9000000000009032
>   lr              =3D 0xe000000046e5b220 (0xe000000046e5b220)
>   curthread       =3D 0xc00000000d48e000
>          pid =3D 9666, comm =3D jail
>=20
>=20

epair3a: Ethernet address: 02:60:27:70:4b:0a
epair3b: Ethernet address: 02:60:27:70:4b:0b
epair3a: link state changed to UP
epair3b: link state changed to UP
lock order reversal:
 1st 0x13be260 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:960
 2nd 0x15964a0 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ =
/usr/src/sys/net/vnet.c:575
stack backtrace:
#0 0x6f6520 at witness_debugger+0xf4
#1 0x6f8440 at witness_checkorder+0xa1c
#2 0x675690 at _sx_slock_int+0x70
#3 0x675810 at _sx_slock+0x1c
#4 0x7f4338 at vnet_sysinit+0x38
#5 0x7f44dc at vnet_alloc+0x118
#6 0x62ab84 at kern_jail_set+0x3274
#7 0x62b62c at sys_jail_set+0x8c
#8 0xa8a798 at trap+0x9a0
#9 0xa7e660 at powerpc_interrupt+0x140

fatal kernel trap:

   exception       =3D 0x300 (data storage interrupt)
   virtual address =3D 0xc00000008df1df30
   dsisr           =3D 0x42000000
   srr0            =3D 0xe000000047854e98 (0xe000000047854e98)
   srr1            =3D 0x9000000000009032
   current msr     =3D 0x9000000000009032
   lr              =3D 0xe000000047854e90 (0xe000000047854e90)
   curthread       =3D 0xc0000000206b6000
          pid =3D 9464, comm =3D jail

(Hand transcribed from here on:)

[ thread pid 9464 tid 100296 ]
Stopped at vnet_epair_init+0x78: stdx r3,r29,r30
db:0:kdb.enter.default> bt
Tracing pid 9464 tid 100296 td 0xc0000000206b6000
0xe000000047274240: at vnet_sysinit+0x70
0xe000000047274270: at vnet_alloc+0x118
0xe000000047274300: at kern_jail_set+0x32740
0xe000000047274610: at sys_jail_set+08c
0xe000000047274660: at trap+0x9a0
0xe000000047274790: at powerpc_interrupt+0x140
0xe000000047274820: user sc trap by 0x81016a888
srr1 =3D 0x900000000000f032
r1   =3D 0x3fffffffffffd080
cr   =3D 0x28002482
xer  =3D 0x20000000
ctr  =3D 0x81016a880
r2   =3D 0x810322300



There are past reports of the lock order
reversal, such as:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D210907

=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?0B19E87B-D44F-4B59-BB5A-0EC31974A1E8>