Date:      Tue, 14 Jun 2016 09:26:22 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Sean Bruno <sbruno@freebsd.org>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: lagg(4): LOR, deadlock and panic
Message-ID:  <CAOtMX2jHHJjW6gsXvYWNdqzq0ms0p0dYBFjSZLP3jfwahRGRMw@mail.gmail.com>
In-Reply-To: <459d2639-b490-beee-9cd4-05f38983eaed@freebsd.org>
References:  <459d2639-b490-beee-9cd4-05f38983eaed@freebsd.org>

On Tue, Jun 14, 2016 at 9:13 AM, Sean Bruno <sbruno@freebsd.org> wrote:
> tl;dr --> https://reviews.freebsd.org/D6845
>
> Navdeep and I have been poking at an LOR that seems to be popping up
> in -current that is related to lagg(4) and lagg_get_counter().
>
> root@sysdev07:~ # ifconfig lagg0 create laggport ix0 laggproto lacp 192.168.100.11/24
> lagg0: link state changed to DOWN
> root@sysdev07:~ # ifconfig ix0 up
> lock order reversal:
>  1st 0xfffff8002d7c9190 if_addr_lock (if_addr_lock) @
> /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1717
>  2nd 0xfffff800271a5808 if_lagg rmlock (if_lagg rmlock) @
> /usr/home/sbruno/fbsd_head/sys/modules/if_lagg/../../net/if_lagg.c:1057
> stack backtrace:
> #0 0xffffffff80aa5ab0 at witness_debugger+0x70
> #1 0xffffffff80aa59a4 at witness_checkorder+0xe54
> #2 0xffffffff80a42521 at _rm_rlock_debug+0x111
> #3 0xffffffff82222b2c at lagg_get_counter+0x4c
> #4 0xffffffff80b2ebd1 at if_data_copy+0xa1
> #5 0xffffffff80b533bc at sysctl_rtsock+0x56c
> #6 0xffffffff80a53f0a at sysctl_root_handler_locked+0x8a
> #7 0xffffffff80a536c8 at sysctl_root+0x188
> #8 0xffffffff80a53cbe at userland_sysctl+0x16e
> #9 0xffffffff80a53b14 at sys___sysctl+0x74
> #10 0xffffffff80eb5b3b at amd64_syscall+0x2db
> #11 0xffffffff80e95c4b at Xfast_syscall+0xfb
>
> Running a netstat -w 1 in the background while repeatedly creating and
> destroying the interface lagg0 will lead to either a panic or a deadlock:
>
> e.g. netstat -w 1 > /dev/null &
> while [ 1 ]; do
> ifconfig lagg0 destroy
> ifconfig lagg0 create laggport ix0 laggproto lacp 192.168.100.11/24
> done
>
> When the system deadlocks on the console, kdb sees the locks held like
> this:
> KDB: enter: Break to debugger
> [ thread pid 11 tid 100007 ]
> Stopped at      kdb_alt_break_internal+0x18e:   movq    $0,kdb_why
> db> show allocks
> No such command
> db> show alllocks
> Process 2173 (ifconfig) thread 0xfffff8002d125a00 (100186)
> exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0
> (0xfffff8002717e408) locked @
> /usr/home/sbruno/fbsd_head/sys/modules/if_lagg/../../net/if_lagg.c:1530
> exclusive sleep mutex in6_multi_mtx (in6_multi_mtx) r = 0
> (0xffffffff81d7e288) locked @
> /usr/home/sbruno/fbsd_head/sys/netinet6/in6_mcast.c:1142
> Process 792 (netstat) thread 0xfffff80027e67a00 (100167)
> shared rw if_addr_lock (if_addr_lock) r = 0 (0xfffff80103e95190)
> locked @ /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1717
> shared rw ifnet_rw (ifnet_rw) r = 0 (0xffffffff81d7b760) locked @
> /usr/home/sbruno/fbsd_head/sys/net/rtsock.c:1713
> exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff81d55e08) locked
> @ /usr/home/sbruno/fbsd_head/sys/kern/kern_sysctl.c:164
>
> This looks like the netstat is causing a call into the counter
> function while the destruction or creation is ongoing.
>
> Removing the LAGG_RLOCK() calls from lagg_get_counter() makes the
> deadlock, LOR and panic go away, however this can't be that easy.  I'm
> unsure what the RLOCK is for in lagg_get_counter().  It appears that
> there is a higher lock in the ifnet access that is protecting
> simultaneous access already, but I'm very ignorant of what's going on
> here.
>
> I don't see any other driver with locks in its get_counter()
> functions, so I'm not sure what the best course of action here is.
>
> Sean

I don't know the best answer either.  But while you're in there, are
you interested in fixing any other lagg panics too?  I've written some
ATF torture tests for lagg, but I haven't checked them into head yet
because most of them quickly panic.

-Alan


