Date: Tue, 14 Jun 2016 08:13:34 -0700
From: Sean Bruno <sbruno@freebsd.org>
To: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: lagg(4): LOR, deadlock and panic
Message-ID: <459d2639-b490-beee-9cd4-05f38983eaed@freebsd.org>

tl;dr --> https://reviews.freebsd.org/D6845

Navdeep and I have been poking at an LOR that seems to be popping up in -current and that is related to lagg(4) and lagg_get_counter().

root@sysdev07:~ # ifconfig lagg0 create laggport ix0 laggproto lacp 192.168.100.11/24
lagg0: link state changed to DOWN
root@sysdev07:~ # ifconfig ix0 up
lock order reversal:
 1st 0xfffff8002d7c9190 if_addr_lock (if_addr_lock) @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1717
 2nd 0xfffff800271a5808 if_lagg rmlock (if_lagg rmlock) @ /usr/home/sbruno/fbsd_head/sys/modules/if_lagg/../../net/if_lagg.c:1057
stack backtrace:
#0 0xffffffff80aa5ab0 at witness_debugger+0x70
#1 0xffffffff80aa59a4 at witness_checkorder+0xe54
#2 0xffffffff80a42521 at _rm_rlock_debug+0x111
#3 0xffffffff82222b2c at lagg_get_counter+0x4c
#4 0xffffffff80b2ebd1 at if_data_copy+0xa1
#5 0xffffffff80b533bc at sysctl_rtsock+0x56c
#6 0xffffffff80a53f0a at sysctl_root_handler_locked+0x8a
#7 0xffffffff80a536c8 at sysctl_root+0x188
#8 0xffffffff80a53cbe at userland_sysctl+0x16e
#9 0xffffffff80a53b14 at sys___sysctl+0x74
#10 0xffffffff80eb5b3b at amd64_syscall+0x2db
#11 0xffffffff80e95c4b at Xfast_syscall+0xfb

Running netstat -w 1 in the background while repeatedly creating and destroying the interface lagg0 will lead to either a panic or a deadlock, e.g.:

netstat -w 1 > /dev/null &
while [ 1 ]; do
    ifconfig lagg0 destroy
    ifconfig lagg0 create laggport ix0 laggproto lacp 192.168.100.11/24
done

When the system deadlocks on the console, kdb shows the held locks like this:

KDB: enter: Break to debugger
[ thread pid 11 tid 100007 ]
Stopped at kdb_alt_break_internal+0x18e: movq $0,kdb_why
db> show allocks
No such command
db> show alllocks
Process 2173 (ifconfig) thread 0xfffff8002d125a00 (100186)
exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0 (0xfffff8002717e408) locked @ /usr/home/sbruno/fbsd_head/sys/modules/if_lagg/../../net/if_lagg.c:1530
exclusive sleep mutex in6_multi_mtx (in6_multi_mtx) r = 0 (0xffffffff81d7e288) locked @ /usr/home/sbruno/fbsd_head/sys/netinet6/in6_mcast.c:1142
Process 792 (netstat) thread 0xfffff80027e67a00 (100167)
shared rw if_addr_lock (if_addr_lock) r = 0 (0xfffff80103e95190) locked @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1717
shared rw ifnet_rw (ifnet_rw) r = 0 (0xffffffff81d7b760) locked @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1713
exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff81d55e08) locked @ /usr/home/sbruno/fbsd_head/sys/kern/kern_sysctl.c:164

It looks like netstat is causing a call into the counter function while the destruction or creation is ongoing.

Removing the LAGG_RLOCK() calls from lagg_get_counter() makes the deadlock, LOR and panic go away; however, it can't be that easy. I'm unsure what the RLOCK is for in lagg_get_counter(). It appears that a higher-level lock in the ifnet access path already protects against simultaneous access, but I'm very ignorant of what's going on here.

I don't see any other driver with locks in its get_counter() function, so I'm not sure what the best course of action is here.

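For readers unfamiliar with what the LOR report means, here is a rough userland sketch of the kind of order check that produces it. It is a toy model, not the kernel's WITNESS code: the enum, the structs and both functions are hypothetical names made up for illustration. It also assumes that the ifconfig path takes the if_lagg rmlock first and then needs if_addr_lock; the lock dump above suggests this but does not show it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock names come from the traces above; the enum itself is hypothetical. */
enum lock_id { IF_ADDR_LOCK, IF_LAGG_RMLOCK, NLOCKS };

/* order[a][b] becomes true once some thread acquires b while holding a. */
struct order_checker {
	bool order[NLOCKS][NLOCKS];
};

/* Locks currently held by one simulated thread. */
struct thread_ctx {
	enum lock_id held[NLOCKS];
	size_t nheld;
};

/*
 * Record that `td` acquires `next`. Returns true when this acquisition
 * reverses an order that was recorded earlier, i.e. some thread has
 * already taken one of the currently held locks while holding `next`.
 */
bool
checker_acquire(struct order_checker *oc, struct thread_ctx *td,
    enum lock_id next)
{
	bool reversal = false;

	assert(next < NLOCKS);
	assert(td->nheld < NLOCKS);	/* no recursive acquisition in this sketch */
	for (size_t i = 0; i < td->nheld; i++) {
		enum lock_id prev = td->held[i];

		if (oc->order[next][prev])
			reversal = true;	/* opposite order seen before */
		oc->order[prev][next] = true;	/* remember prev -> next */
	}
	td->held[td->nheld++] = next;
	return (reversal);
}

/* Drop every lock held by the simulated thread. */
void
checker_release_all(struct thread_ctx *td)
{
	td->nheld = 0;
}
```

With zero-initialized structures, a simulated ifconfig thread that acquires IF_LAGG_RMLOCK and then IF_ADDR_LOCK records that order without complaint. A simulated netstat thread that then acquires IF_ADDR_LOCK followed by IF_LAGG_RMLOCK gets true back from the second call, which corresponds to the "1st if_addr_lock, 2nd if_lagg rmlock" report. If both threads run at once and each holds its first lock, neither can take its second, which would match the hang seen in the lock dump.
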
Sean