From: Alexander V. Chernikov
To: Ryan Stone, freebsd-net
Subject: Re: Panic in in6_joingroup_locked
Date: Thu, 22 Oct 2020 23:10:55 +0100
Message-Id: <244891603404616@mail.yandex.ru>

21.10.2020, 23:05, "Ryan Stone":
> Today at $WORK we saw a panic due to a race between
> in6_joingroup_locked and if_detach_internal. This happened on a
> branch that's about 2 years behind head, but the relevant code in head
> does not appear to have changed.
>
> The backtrace of the panic was this:
>
> panic: Fatal trap 9: general protection fault while in kernel mode
> Stack: --------------------------------------------------
> kernel:trap_fatal+0x96
> kernel:trap+0x76
> kernel:in6_joingroup_locked+0x2c7
> kernel:in6_joingroup+0x46
> kernel:in6_update_ifa+0x18e5
> kernel:in6_ifattach+0x4d0
> kernel:in6_if_up+0x99
> kernel:if_up+0x7d
> kernel:ifhwioctl+0xcea
> kernel:ifioctl+0x2c9
> kernel:kern_ioctl+0x29b
> kernel:sys_ioctl+0x16d
> kernel:amd64_syscall+0x327
>
> We panic'ed here, because the memory pointed to by ifma has been freed
> and filled with 0xdeadc0de:
>
> https://svnweb.freebsd.org/base/head/sys/netinet6/in6_mcast.c?revision=365071&view=markup#l421
>
> Another thread was in the process of trying to destroy the same
> interface. It had the following backtrace at the time of the panic:
>
> #0 sched_switch (td=0xfffffea654845aa0, newtd=0xfffffea266fa9aa0,
> flags=) at /b/mnt/src/sys/kern/sched_ule.c:2423
> #1 0xffffffff80643071 in mi_switch (flags=, newtd=0x0)
> at /b/mnt/src/sys/kern/kern_synch.c:605
> #2 0xffffffff80693234 in sleepq_switch (wchan=0xffffffff8139cc90
> , pri=0) at /b/mnt/src/sys/kern/subr_sleepqueue.c:612
> #3 0xffffffff806930c3 in sleepq_wait (wchan=0xffffffff8139cc90
> , pri=0) at /b/mnt/src/sys/kern/subr_sleepqueue.c:691
> #4 0xffffffff8063fcb3 in _sx_xlock_hard (sx=,
> x=, opts=0, timo=0, file=,
> line=) at
> /b/mnt/src/sys/kern/kern_sx.c:936
> #5 0xffffffff8063f313 in _sx_xlock (sx=0xffffffff8139cc90 ,
> opts=0, timo=, file=0xffffffff80ba6d2a
> "/b/mnt/src/sys/net/if_vlan.c", line=668) at /b/mnt/src/sys/kern/kern_sx.c:352
> #6 0xffffffff807558b2 in vlan_ifdetach (arg=,
> ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_vlan.c:668
> #7 0xffffffff80747676 in if_detach_internal (vmove=0,
> ifp=<optimized out>, ifcp=) at /b/mnt/src/sys/net/if.c:1203
> #8 if_detach (ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if.c:1060
> #9 0xffffffff80756521 in vlan_clone_destroy (ifc=0xfffff802f29dbe80,
> ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_vlan.c:1102
> #10 0xffffffff8074dc57 in if_clone_destroyif (ifc=0xfffff802f29dbe80,
> ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_clone.c:330
> #11 0xffffffff8074dafe in if_clone_destroy (name=) at
> /b/mnt/src/sys/net/if_clone.c:288
> #12 0xffffffff8074b2fd in ifioctl (so=0xfffffea6363806d0,
> cmd=2149607801, data=, td=0xfffffea654845aa0) at
> /b/mnt/src/sys/net/if.c:3077
> #13 0xffffffff806aab1c in fo_ioctl (fp=, com=<optimized out>,
> active_cred=, td=, data=<optimized out>)
> at /b/mnt/src/sys/sys/file.h:396
> #14 kern_ioctl (td=0xfffffea654845aa0, fd=4, com=,
> data=) at /b/mnt/src/sys/kern/sys_generic.c:938
> #15 0xffffffff806aa7fe in sys_ioctl (td=0xfffffea654845aa0,
> uap=0xfffffea653441b30) at /b/mnt/src/sys/kern/sys_generic.c:846
> #16 0xffffffff809ceab8 in syscallenter (td=) at
> /b/mnt/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
> #17 amd64_syscall (td=0xfffffea654845aa0, traced=0) at
> /b/mnt/src/sys/amd64/amd64/trap.c:1196
> #18 fast_syscall_common () at /b/mnt/src/sys/amd64/amd64/exception.S:505
>
> Frame 7 was at this point in if_detach_internal
>
> https://svnweb.freebsd.org/base/head/sys/net/if.c?revision=366230&view=markup#l1206
>
> As you can see, a couple of lines up if_purgemaddrs() was called and
> freed all multicast addresses assigned to the interface, which
> destroyed the multicast address being added out from under
> in6_joingroup_locked.

[sorry, re-posting in plain text]

I don't have a solution for the multicast locking spaghetti, but from
looking at the code, it looks like extending the network epoch to cover
the whole of in6_getmulti() would fix this panic?

>
> I see two potential paths forward: either the wacky locking in
> in6_getmulti() gets fixed so that we don't have to do the "drop the
> lock to call a function that acquires that lock again" dance that
> opens up this race condition, or we fix if_addmulti so that it adds an
> additional reference to the address if retifma is non-NULL.
>
> The second option would be a KPI change that would have a nasty side
> effect of leaking the address if an existing caller wasn't fixed, but
> on the other hand the current interface seems pretty useless if it
> can't actually guarantee that the address you asked for will exist
> when you get around to trying to manipulate it.
>
> Does anybody have any thoughts on this?
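
For illustration only, the second option quoted above (if_addmulti
taking an additional reference when retifma is non-NULL) can be
sketched in plain userspace C. This is not kernel code and not the
actual FreeBSD KPI: struct maddr, struct iface, iface_addmulti(),
iface_purgemaddrs(), maddr_release(), race_scenario() and the
live_objects counter are all hypothetical names invented for this
sketch, and the race is replayed sequentially rather than with real
threads or locks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a multicast membership (not struct ifmultiaddr). */
struct maddr {
	int refcount;
	int group;
	struct maddr *next;
};

/* Hypothetical stand-in for an interface's multicast address list. */
struct iface {
	struct maddr *head;
};

/* Number of maddr objects currently allocated; used only to check the demo. */
static int live_objects;

/* Drop one reference and free the object when the last one goes away. */
static void
maddr_release(struct maddr *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		free(m);
		live_objects--;
	}
}

/*
 * Add a group to the interface. The list itself holds one reference.
 * If retp is non-NULL the caller receives its own, additional reference
 * and must drop it with maddr_release() when done.
 */
static int
iface_addmulti(struct iface *ifp, int group, struct maddr **retp)
{
	struct maddr *m = calloc(1, sizeof(*m));

	if (m == NULL)
		return (-1);
	live_objects++;
	m->group = group;
	m->refcount = 1;		/* reference owned by the list */
	m->next = ifp->head;
	ifp->head = m;
	if (retp != NULL) {
		m->refcount++;		/* reference owned by the caller */
		*retp = m;
	}
	return (0);
}

/* Detach path: unlink every entry and drop the list's reference on it. */
static void
iface_purgemaddrs(struct iface *ifp)
{
	struct maddr *m, *next;

	for (m = ifp->head; m != NULL; m = next) {
		next = m->next;
		m->next = NULL;
		maddr_release(m);
	}
	ifp->head = NULL;
}

/*
 * Replay the interleaving from the report: the join path obtains the
 * address, the detach path purges the list, then the join path touches
 * the address. Returns 0 if the address stayed valid and nothing leaked.
 */
static int
race_scenario(void)
{
	struct iface ifp = { NULL };
	struct maddr *m = NULL;

	if (iface_addmulti(&ifp, 42, &m) != 0)
		return (-1);
	iface_purgemaddrs(&ifp);	/* the "other thread" detaches */
	if (m->group != 42 || m->refcount != 1)
		return (-1);		/* caller's reference kept it alive */
	maddr_release(m);		/* caller is done; object is freed */
	return (live_objects == 0 ? 0 : -1);
}
```

Calling race_scenario() from a small main() is enough to exercise it.
The sketch also shows the downside mentioned in the quoted text: a
caller that passes a non-NULL pointer and never calls maddr_release()
leaves refcount at 1 after the purge, so the object is leaked.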