Date:      Tue, 14 Jun 2016 08:13:34 -0700
From:      Sean Bruno <sbruno@freebsd.org>
To:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   lagg(4): LOR, deadlock and panic
Message-ID:  <459d2639-b490-beee-9cd4-05f38983eaed@freebsd.org>

tl;dr --> https://reviews.freebsd.org/D6845

Navdeep and I have been chasing a lock order reversal (LOR) that is
showing up in -current, involving lagg(4) and lagg_get_counter().

root@sysdev07:~ # ifconfig lagg0 create laggport ix0 laggproto lacp 192.168.100.11/24
lagg0: link state changed to DOWN
root@sysdev07:~ # ifconfig ix0 up
lock order reversal:
 1st 0xfffff8002d7c9190 if_addr_lock (if_addr_lock) @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1717
 2nd 0xfffff800271a5808 if_lagg rmlock (if_lagg rmlock) @ /usr/home/sbruno/fbsd_head/sys/modules/if_lagg/../../net/if_lagg.c:1057
stack backtrace:
#0 0xffffffff80aa5ab0 at witness_debugger+0x70
#1 0xffffffff80aa59a4 at witness_checkorder+0xe54
#2 0xffffffff80a42521 at _rm_rlock_debug+0x111
#3 0xffffffff82222b2c at lagg_get_counter+0x4c
#4 0xffffffff80b2ebd1 at if_data_copy+0xa1
#5 0xffffffff80b533bc at sysctl_rtsock+0x56c
#6 0xffffffff80a53f0a at sysctl_root_handler_locked+0x8a
#7 0xffffffff80a536c8 at sysctl_root+0x188
#8 0xffffffff80a53cbe at userland_sysctl+0x16e
#9 0xffffffff80a53b14 at sys___sysctl+0x74
#10 0xffffffff80eb5b3b at amd64_syscall+0x2db
#11 0xffffffff80e95c4b at Xfast_syscall+0xfb

Running netstat -w 1 in the background while repeatedly creating and
destroying the lagg0 interface leads to either a panic or a deadlock:


e.g.
netstat -w 1 > /dev/null &
while true; do
    ifconfig lagg0 destroy
    ifconfig lagg0 create laggport ix0 laggproto lacp 192.168.100.11/24
done

When the system deadlocks, breaking into kdb from the console shows
these locks held:
KDB: enter: Break to debugger
[ thread pid 11 tid 100007 ]
Stopped at      kdb_alt_break_internal+0x18e:   movq    $0,kdb_why
db> show allocks
No such command
db> show alllocks
Process 2173 (ifconfig) thread 0xfffff8002d125a00 (100186)
exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0 (0xfffff8002717e408) locked @ /usr/home/sbruno/fbsd_head/sys/modules/if_lagg/../../net/if_lagg.c:1530
exclusive sleep mutex in6_multi_mtx (in6_multi_mtx) r = 0 (0xffffffff81d7e288) locked @ /usr/home/sbruno/fbsd_head/sys/netinet6/in6_mcast.c:1142
Process 792 (netstat) thread 0xfffff80027e67a00 (100167)
shared rw if_addr_lock (if_addr_lock) r = 0 (0xfffff80103e95190) locked @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1717
shared rw ifnet_rw (ifnet_rw) r = 0 (0xffffffff81d7b760) locked @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1713
exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff81d55e08) locked @ /usr/home/sbruno/fbsd_head/sys/kern/kern_sysctl.c:164

It looks like netstat is calling into the counter function while the
interface is being created or destroyed.
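
To make the reversal concrete, here is a rough userland model of the
kind of order check WITNESS performs. It is only a sketch, not kernel
code: the names (order_checker, checker_acquire, replay_paths) are made
up for illustration. The netstat-side order (if_addr_lock, then the
lagg rmlock) is taken from the backtrace above; the opposite order on
the ifconfig side is an assumption on my part, inferred only from the
fact that WITNESS calls this a reversal.

```c
#include <stdbool.h>

/* Hypothetical lock identifiers standing in for the two kernel locks. */
enum lock_id { IF_ADDR_LOCK, LAGG_RMLOCK, NLOCKS };

struct order_checker {
	/* before[a][b] is true once a was held while b was acquired. */
	bool before[NLOCKS][NLOCKS];
	/* Locks currently held by the (single) modeled thread. */
	bool held[NLOCKS];
};

/*
 * Record the acquisition of lock l.  Returns true if some currently
 * held lock h was previously acquired *after* l, i.e. the order
 * l -> h is already known and we are now doing h -> l.
 */
static bool
checker_acquire(struct order_checker *oc, enum lock_id l)
{
	bool reversal = false;

	for (int h = 0; h < NLOCKS; h++) {
		if (!oc->held[h] || h == (int)l)
			continue;
		if (oc->before[l][h])
			reversal = true;
		oc->before[h][l] = true;
	}
	oc->held[l] = true;
	return (reversal);
}

static void
checker_release(struct order_checker *oc, enum lock_id l)
{
	oc->held[l] = false;
}

/*
 * Replay the two code paths one after the other and report whether
 * the second one trips the check.
 */
bool
replay_paths(void)
{
	struct order_checker oc = { 0 };
	bool reversal;

	/* ASSUMED ifconfig-side order: lagg rmlock, then if_addr_lock. */
	checker_acquire(&oc, LAGG_RMLOCK);
	checker_acquire(&oc, IF_ADDR_LOCK);
	checker_release(&oc, IF_ADDR_LOCK);
	checker_release(&oc, LAGG_RMLOCK);

	/* netstat-side order from the backtrace: if_addr_lock, then lagg. */
	checker_acquire(&oc, IF_ADDR_LOCK);
	reversal = checker_acquire(&oc, LAGG_RMLOCK);
	checker_release(&oc, LAGG_RMLOCK);
	checker_release(&oc, IF_ADDR_LOCK);

	return (reversal);
}
```

In this model replay_paths() returns true: the second path takes the
same two locks in the opposite order. With two real threads, each
holding its first lock and waiting for the second, that would be the
deadlock.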

Removing the LAGG_RLOCK() calls from lagg_get_counter() makes the
deadlock, LOR and panic go away, but it can't be that easy.  I'm
unsure what the RLOCK is for in lagg_get_counter().  It appears that
a higher-level lock in the ifnet code already protects against
simultaneous access, but I'm very ignorant of what's going on here.

I don't see any other driver with locks in its get_counter()
functions, so I'm not sure what the best course of action here is.

Sean



