Date:      Thu, 22 Oct 2020 23:10:55 +0100
From:      Alexander V. Chernikov <melifaro@ipfw.ru>
To:        Ryan Stone <rysto32@gmail.com>, freebsd-net <freebsd-net@freebsd.org>
Subject:   Re: Panic in in6_joingroup_locked
Message-ID:  <244891603404616@mail.yandex.ru>
In-Reply-To: <CAFMmRNwZLh8G5Yc2XPQ=zaAnZCa5UuuT9_qkGUC837vYPFd%2B9g@mail.gmail.com>
References:  <CAFMmRNwZLh8G5Yc2XPQ=zaAnZCa5UuuT9_qkGUC837vYPFd%2B9g@mail.gmail.com>

21.10.2020, 23:05, "Ryan Stone" <rysto32@gmail.com>:
> Today at $WORK we saw a panic due to a race between
> in6_joingroup_locked and if_detach_internal. This happened on a
> branch that's about 2 years behind head, but the relevant code in head
> does not appear to have changed.
>
> The backtrace of the panic was this:
>
> panic: Fatal trap 9: general protection fault while in kernel mode
> Stack: --------------------------------------------------
> kernel:trap_fatal+0x96
> kernel:trap+0x76
> kernel:in6_joingroup_locked+0x2c7
> kernel:in6_joingroup+0x46
> kernel:in6_update_ifa+0x18e5
> kernel:in6_ifattach+0x4d0
> kernel:in6_if_up+0x99
> kernel:if_up+0x7d
> kernel:ifhwioctl+0xcea
> kernel:ifioctl+0x2c9
> kernel:kern_ioctl+0x29b
> kernel:sys_ioctl+0x16d
> kernel:amd64_syscall+0x327
>
> We panic'ed here, because the memory pointed to by ifma has been freed
> and filled with 0xdeadc0de:
>
> https://svnweb.freebsd.org/base/head/sys/netinet6/in6_mcast.c?revision=365071&view=markup#l421
>
> Another thread was in the process of trying to destroy the same
> interface. It had the following backtrace at the time of the panic:
>
> #0 sched_switch (td=0xfffffea654845aa0, newtd=0xfffffea266fa9aa0,
> flags=<optimized out>) at /b/mnt/src/sys/kern/sched_ule.c:2423
> #1 0xffffffff80643071 in mi_switch (flags=<optimized out>, newtd=0x0)
> at /b/mnt/src/sys/kern/kern_synch.c:605
> #2 0xffffffff80693234 in sleepq_switch (wchan=0xffffffff8139cc90
> <ifv_sx>, pri=0) at /b/mnt/src/sys/kern/subr_sleepqueue.c:612
> #3 0xffffffff806930c3 in sleepq_wait (wchan=0xffffffff8139cc90
> <ifv_sx>, pri=0) at /b/mnt/src/sys/kern/subr_sleepqueue.c:691
> #4 0xffffffff8063fcb3 in _sx_xlock_hard (sx=<optimized out>,
> x=<optimized out>, opts=0, timo=0, file=<optimized out>,
> line=<optimized out>) at
> /b/mnt/src/sys/kern/kern_sx.c:936
> #5 0xffffffff8063f313 in _sx_xlock (sx=0xffffffff8139cc90 <ifv_sx>,
> opts=0, timo=<optimized out>, file=0xffffffff80ba6d2a
> "/b/mnt/src/sys/net/if_vlan.c", line=668) at /b/mnt/src/sys/kern/kern_sx.c:352
> #6 0xffffffff807558b2 in vlan_ifdetach (arg=<optimized out>,
> ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_vlan.c:668
> #7 0xffffffff80747676 in if_detach_internal (vmove=0, ifp=<optimized
> out>, ifcp=<optimized out>) at /b/mnt/src/sys/net/if.c:1203
> #8 if_detach (ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if.c:1060
> #9 0xffffffff80756521 in vlan_clone_destroy (ifc=0xfffff802f29dbe80,
> ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_vlan.c:1102
> #10 0xffffffff8074dc57 in if_clone_destroyif (ifc=0xfffff802f29dbe80,
> ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_clone.c:330
> #11 0xffffffff8074dafe in if_clone_destroy (name=<optimized out>) at
> /b/mnt/src/sys/net/if_clone.c:288
> #12 0xffffffff8074b2fd in ifioctl (so=0xfffffea6363806d0,
> cmd=2149607801, data=<optimized out>, td=0xfffffea654845aa0) at
> /b/mnt/src/sys/net/if.c:3077
> #13 0xffffffff806aab1c in fo_ioctl (fp=<optimized out>, com=<optimized
> out>, active_cred=<unavailable>, td=<optimized out>, data=<optimized
> out>) at /b/mnt/src/sys/sys/file.h:396
> #14 kern_ioctl (td=0xfffffea654845aa0, fd=4, com=<optimized out>,
> data=<unavailable>) at /b/mnt/src/sys/kern/sys_generic.c:938
> #15 0xffffffff806aa7fe in sys_ioctl (td=0xfffffea654845aa0,
> uap=0xfffffea653441b30) at /b/mnt/src/sys/kern/sys_generic.c:846
> #16 0xffffffff809ceab8 in syscallenter (td=<optimized out>) at
> /b/mnt/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
> #17 amd64_syscall (td=0xfffffea654845aa0, traced=0) at
> /b/mnt/src/sys/amd64/amd64/trap.c:1196
> #18 fast_syscall_common () at /b/mnt/src/sys/amd64/amd64/exception.S:505
>
> Frame 7 was at this point in if_detach_internal
>
> https://svnweb.freebsd.org/base/head/sys/net/if.c?revision=366230&view=markup#l1206
>
> As you can see, a couple of lines up if_purgemaddrs() was called and
> freed all multicast addresses assigned to the interface, which
> destroyed the multicast address being added out from under
> in6_joingroup_locked.
[sorry, re-posting in plain text]
I don't have a solution for the multicast locking spaghetti, but from looking at the code, it seems that
extending the network epoch to cover the whole of in6_getmulti() would fix this panic?
>
> I see two potential paths forward: either the wacky locking in
> in6_getmulti() gets fixed so that we don't have to do the "drop the
> lock to call a function that acquires that lock again" dance that
> opens up this race condition, or we fix if_addmulti so that it adds an
> additional reference to the address if retifma is non-NULL.
>
> The second option would be a KPI change that would have a nasty side
> effect of leaking the address if an existing caller wasn't fixed, but
> on the other hand the current interface seems pretty useless if it
> can't actually guarantee that the address you asked for will exist
> when you get around to trying to manipulate it.
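To make the second option concrete, here is a minimal user-space sketch of the idea, not kernel code. It assumes a toy model in which the lookup hands the caller its own reference whenever a return pointer is requested. All names in it (struct maddr, struct iface, iface_addmulti, iface_purgemaddrs, simulate_race) are hypothetical stand-ins, and the race is replayed sequentially rather than with real threads.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a multicast membership record (hypothetical). */
struct maddr {
	int refcount;	/* holders, including the interface's own list */
	int group;	/* stand-in for the multicast group address */
	int freed;	/* set instead of calling free(), so it can be inspected */
};

/* Toy stand-in for an interface that holds at most one address. */
struct iface {
	struct maddr *addr;
};

/* Drop one reference; mark the record freed when the last one goes. */
void
maddr_release(struct maddr *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0)
		m->freed = 1;
}

/*
 * Model of adding an address. When ret is non-NULL the caller gets a
 * pointer back; take_ref selects whether that pointer carries its own
 * reference (the proposed behaviour) or not (the behaviour described
 * in the report).
 */
void
iface_addmulti(struct iface *ifp, struct maddr *storage, int group,
    struct maddr **ret, int take_ref)
{
	storage->refcount = 1;	/* the interface's reference */
	storage->group = group;
	storage->freed = 0;
	ifp->addr = storage;
	if (ret != NULL) {
		if (take_ref)
			storage->refcount++;	/* the caller's reference */
		*ret = storage;
	}
}

/* Model of the purge done on detach: drop the interface's reference. */
void
iface_purgemaddrs(struct iface *ifp)
{
	if (ifp->addr != NULL) {
		maddr_release(ifp->addr);
		ifp->addr = NULL;
	}
}

/*
 * Replay the race in a fixed order: join, then detach, then the joiner
 * touches the record. Returns 1 if the record was still alive at that
 * point, 0 if it had already been released.
 */
int
simulate_race(int take_ref)
{
	struct iface ifp = { NULL };
	struct maddr storage;
	struct maddr *ifma = NULL;
	int alive;

	iface_addmulti(&ifp, &storage, 42, &ifma, take_ref);
	iface_purgemaddrs(&ifp);	/* detach wins the race */
	alive = !ifma->freed;		/* 0 here models the use-after-free */
	if (take_ref)
		maddr_release(ifma);	/* joiner drops its own reference */
	return (alive);
}
```

In this model simulate_race(0) reports that the record was gone by the time the joiner touched it, while simulate_race(1) keeps it alive until the joiner releases it. It also shows the leak risk mentioned above: a caller that gets the extra reference but never calls the release function leaves the count above zero forever.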
>
> Does anybody have any thoughts on this?