Date: Wed, 21 Oct 2020 18:04:52 -0400
From: Ryan Stone <rysto32@gmail.com>
To: freebsd-net <freebsd-net@freebsd.org>
Subject: Panic in in6_joingroup_locked
Message-ID: <CAFMmRNwZLh8G5Yc2XPQ=zaAnZCa5UuuT9_qkGUC837vYPFd%2B9g@mail.gmail.com>

Today at $WORK we saw a panic due to a race between
in6_joingroup_locked and if_detach_internal. This happened on a
branch that's about 2 years behind head, but the relevant code in
head does not appear to have changed. The backtrace of the panic
was this:

panic: Fatal trap 9: general protection fault while in kernel mode
Stack: --------------------------------------------------
kernel:trap_fatal+0x96
kernel:trap+0x76
kernel:in6_joingroup_locked+0x2c7
kernel:in6_joingroup+0x46
kernel:in6_update_ifa+0x18e5
kernel:in6_ifattach+0x4d0
kernel:in6_if_up+0x99
kernel:if_up+0x7d
kernel:ifhwioctl+0xcea
kernel:ifioctl+0x2c9
kernel:kern_ioctl+0x29b
kernel:sys_ioctl+0x16d
kernel:amd64_syscall+0x327

We panic'ed here, because the memory pointed to by ifma has been
freed and filled with 0xdeadc0de:

https://svnweb.freebsd.org/base/head/sys/netinet6/in6_mcast.c?revision=365071&view=markup#l421

Another thread was in the process of trying to destroy the same
interface. It had the following backtrace at the time of the panic:

#0  sched_switch (td=0xfffffea654845aa0, newtd=0xfffffea266fa9aa0, flags=<optimized out>)
    at /b/mnt/src/sys/kern/sched_ule.c:2423
#1  0xffffffff80643071 in mi_switch (flags=<optimized out>, newtd=0x0)
    at /b/mnt/src/sys/kern/kern_synch.c:605
#2  0xffffffff80693234 in sleepq_switch (wchan=0xffffffff8139cc90 <ifv_sx>, pri=0)
    at /b/mnt/src/sys/kern/subr_sleepqueue.c:612
#3  0xffffffff806930c3 in sleepq_wait (wchan=0xffffffff8139cc90 <ifv_sx>, pri=0)
    at /b/mnt/src/sys/kern/subr_sleepqueue.c:691
#4  0xffffffff8063fcb3 in _sx_xlock_hard (sx=<optimized out>, x=<optimized out>, opts=0, timo=0,
    file=<optimized out>, line=<optimized out>)
    at /b/mnt/src/sys/kern/kern_sx.c:936
#5  0xffffffff8063f313 in _sx_xlock (sx=0xffffffff8139cc90 <ifv_sx>, opts=0, timo=<optimized out>,
    file=0xffffffff80ba6d2a "/b/mnt/src/sys/net/if_vlan.c", line=668)
    at /b/mnt/src/sys/kern/kern_sx.c:352
#6  0xffffffff807558b2 in vlan_ifdetach (arg=<optimized out>, ifp=0xfffff8049b2ce000)
    at /b/mnt/src/sys/net/if_vlan.c:668
#7  0xffffffff80747676 in if_detach_internal (vmove=0, ifp=<optimized out>, ifcp=<optimized out>)
    at /b/mnt/src/sys/net/if.c:1203
#8  if_detach (ifp=0xfffff8049b2ce000)
    at /b/mnt/src/sys/net/if.c:1060
#9  0xffffffff80756521 in vlan_clone_destroy (ifc=0xfffff802f29dbe80, ifp=0xfffff8049b2ce000)
    at /b/mnt/src/sys/net/if_vlan.c:1102
#10 0xffffffff8074dc57 in if_clone_destroyif (ifc=0xfffff802f29dbe80, ifp=0xfffff8049b2ce000)
    at /b/mnt/src/sys/net/if_clone.c:330
#11 0xffffffff8074dafe in if_clone_destroy (name=<optimized out>)
    at /b/mnt/src/sys/net/if_clone.c:288
#12 0xffffffff8074b2fd in ifioctl (so=0xfffffea6363806d0, cmd=2149607801, data=<optimized out>,
    td=0xfffffea654845aa0)
    at /b/mnt/src/sys/net/if.c:3077
#13 0xffffffff806aab1c in fo_ioctl (fp=<optimized out>, com=<optimized out>,
    active_cred=<unavailable>, td=<optimized out>, data=<optimized out>)
    at /b/mnt/src/sys/sys/file.h:396
#14 kern_ioctl (td=0xfffffea654845aa0, fd=4, com=<optimized out>, data=<unavailable>)
    at /b/mnt/src/sys/kern/sys_generic.c:938
#15 0xffffffff806aa7fe in sys_ioctl (td=0xfffffea654845aa0, uap=0xfffffea653441b30)
    at /b/mnt/src/sys/kern/sys_generic.c:846
#16 0xffffffff809ceab8 in syscallenter (td=<optimized out>)
    at /b/mnt/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
#17 amd64_syscall (td=0xfffffea654845aa0, traced=0)
    at /b/mnt/src/sys/amd64/amd64/trap.c:1196
#18 fast_syscall_common ()
    at /b/mnt/src/sys/amd64/amd64/exception.S:505

Frame 7 was at this point in if_detach_internal:

https://svnweb.freebsd.org/base/head/sys/net/if.c?revision=366230&view=markup#l1206

As you can see, a couple of lines up if_purgemaddrs() was called and
freed all multicast addresses assigned to the interface, which
destroyed the multicast address being added out from under
in6_joingroup_locked.

I see two potential paths forward: either the wacky locking in
in6_getmulti() gets fixed so that we don't have to do the "drop the
lock to call a function that acquires that lock again" dance that
opens up this race condition, or we fix if_addmulti so that it adds
an additional reference to the address if retifma is non-NULL. The
second option would be a KPI change that would have a nasty side
effect of leaking the address if an existing caller wasn't fixed,
but on the other hand the current interface seems pretty useless if
it can't actually guarantee that the address you asked for will
exist when you get around to trying to manipulate it.

Does anybody have any thoughts on this?
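
To make the second option concrete, here is a minimal single-threaded userland sketch of the reference-counting idea. It is not kernel code: every model_* name is hypothetical, the interface holds only one address, and there is no locking or atomic refcounting, all of which a real change to if_addmulti would need. It only shows that when the caller is handed its own reference along with retifma, a purge by the detach path no longer frees the address out from under it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ifmultiaddr; only the refcount is modelled. */
struct model_ifma {
	int refcount;
};

/* Hypothetical stand-in for struct ifnet, holding a single multicast address. */
struct model_ifnet {
	struct model_ifma *maddr;
};

/* Number of addresses allocated and not yet freed; lets callers observe frees. */
static int model_live_addrs;

/* Drop one reference; free the address when the last reference goes away. */
static void
model_ifma_release(struct model_ifma *ifma)
{
	if (--ifma->refcount == 0) {
		free(ifma);
		model_live_addrs--;
	}
}

/*
 * Model of the proposed behaviour: the interface owns one reference, and
 * a caller that passes a non-NULL retifma receives an additional one that
 * it must release itself.
 */
static int
model_addmulti(struct model_ifnet *ifp, struct model_ifma **retifma)
{
	struct model_ifma *ifma = calloc(1, sizeof(*ifma));

	if (ifma == NULL)
		return (-1);
	ifma->refcount = 1;		/* reference held by the interface */
	model_live_addrs++;
	ifp->maddr = ifma;
	if (retifma != NULL) {
		ifma->refcount++;	/* extra reference for the caller */
		*retifma = ifma;
	}
	return (0);
}

/* Model of the detach path: drop the interface's reference to its address. */
static void
model_purgemaddrs(struct model_ifnet *ifp)
{
	if (ifp->maddr != NULL) {
		model_ifma_release(ifp->maddr);
		ifp->maddr = NULL;
	}
}
```

In this model a caller that does model_addmulti(&ifp, &ifma), is then raced by model_purgemaddrs(&ifp), and only afterwards touches ifma is still looking at valid memory, and the address is freed when the caller finally calls model_ifma_release(ifma). It also shows the leak risk mentioned above: a caller that never releases its reference keeps the address alive forever.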