Date: Mon, 07 Dec 2015 22:32:41 -0700
From: Jason <j@scre.ws>
To: freebsd-net@freebsd.org
Cc: sbruno@freebsd.org, kevin.bowling@kev009.com, hiren@strugglingcoder.info
Subject: Multiple cores/race conditions in IPv6 RA
Message-ID: <50cff74ea38f155ae616cf49f5ffb5ae@m.nitrology.com>
Hi,

It appears the IPv6 router advertisement code paths were written largely without locking, on the assumption that multiple advertisements would never be processed concurrently. We are seeing page faults in various places while processing these messages and modifying the routing table.

We have multiple L3 devices and multiple v6 blocks broadcasting these messages to hardware with dual uplinks in the same VLAN, which I believe makes us susceptible to this. I believe the dual uplink alone is enough to trigger it, though, since it can also be seen in configurations with a single v6 block.

We are running stable/10 @ r285800, and it doesn't appear that anything relevant has changed since then. Our other widely deployed version is 8.3-RELEASE, which does not see this issue. Upon bumping a machine from 8.3 -> 10, we can see it start to exhibit this behavior.

The only change I see that might be relevant is r243148, but these cores are relatively rare, so testing is tough without a considerable deployment. So basically I'm hoping someone with a trained eye can point us in the right direction before we go down that road.
Every backtrace looks pretty much like this, with the location in nd6_rtr differing:

panic: page fault
#0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff8075fa07 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff8075fe05 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff8075fc93 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80acdf9b in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80ace29d in trap_pfault (frame=0xfffffe0f959b0ff0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
#6  0xffffffff80acd93a in trap (frame=0xfffffe0f959b0ff0) at /usr/src/sys/amd64/amd64/trap.c:440
#7  0xffffffff80ab3932 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff808a5550 in nd6_ra_input (m=<value optimized out>, off=<value optimized out>, icmp6len=<value optimized out>) at /usr/src/sys/netinet6/nd6_rtr.c:739
#9  0xffffffff8087f31f in icmp6_input (mp=<value optimized out>, offp=0xfffffe0f959b167c, proto=<value optimized out>) at /usr/src/sys/netinet6/icmp6.c:808
#10 0xffffffff808949fc in ip6_input (m=0xfffff8002e743200) at /usr/src/sys/netinet6/ip6_input.c:1019
#11 0xffffffff80832f02 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:976
#12 0xffffffff8082a226 in ether_demux (ifp=<value optimized out>, m=0xfffff8002e743200) at /usr/src/sys/net/if_ethersubr.c:851
#13 0xffffffff8082aece in ether_nh_input (m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:646
#14 0xffffffff80832f02 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:976

I'll link to GH for the various relevant bits, because I know everyone can agree it's the superior RCS.
It appears that most of these are caused by the dr struct being freed by concurrent processing:

https://github.com/freebsd/freebsd/blob/e5ee1c2b414851b17663cb491e2f2317a0af9bda/sys/netinet6/nd6_rtr.c#L578
https://github.com/freebsd/freebsd/blob/e5ee1c2b414851b17663cb491e2f2317a0af9bda/sys/netinet6/nd6_rtr.c#L654
https://github.com/freebsd/freebsd/blob/e5ee1c2b414851b17663cb491e2f2317a0af9bda/sys/netinet6/nd6_rtr.c#L728
https://github.com/freebsd/freebsd/blob/e5ee1c2b414851b17663cb491e2f2317a0af9bda/sys/netinet6/nd6_rtr.c#L739
https://github.com/freebsd/freebsd/blob/e5ee1c2b414851b17663cb491e2f2317a0af9bda/sys/netinet6/nd6_rtr.c#L800
https://github.com/freebsd/freebsd/blob/e5ee1c2b414851b17663cb491e2f2317a0af9bda/sys/netinet6/nd6_rtr.c#L1312

Thanks for any assistance,
Jason
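
Note: the suspected failure mode (one thread frees a dr entry while another thread is still using it) can be illustrated in userland. The following is only a minimal sketch using pthreads, not the kernel code and not a proposed patch. Every name in it (struct rtr, rtr_hold, rtr_rele, rtr_remove, ra_worker, run_workers) is hypothetical and invented for illustration. It assumes one common way to avoid this kind of use-after-free: guard the list with a lock and give each entry a reference count, so that unlinking an entry does not free it while another thread still holds it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a default-router entry (not struct nd_defrouter). */
struct rtr {
	int id;
	int refcnt;		/* protected by list_lock */
	struct rtr *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rtr *head;	/* protected by list_lock */
static int live_entries;	/* allocated and not yet freed; protected by list_lock */

/* Find the entry for id, creating it if needed; return it with a reference held. */
static struct rtr *rtr_hold(int id)
{
	struct rtr *r;

	pthread_mutex_lock(&list_lock);
	for (r = head; r != NULL; r = r->next)
		if (r->id == id)
			break;
	if (r == NULL) {
		r = calloc(1, sizeof(*r));
		if (r == NULL) {
			pthread_mutex_unlock(&list_lock);
			return NULL;
		}
		r->id = id;
		r->refcnt = 1;	/* the list's own reference */
		r->next = head;
		head = r;
		live_entries++;
	}
	r->refcnt++;		/* the caller's reference */
	pthread_mutex_unlock(&list_lock);
	return r;
}

/* Drop one reference; free the entry when the last one goes away. */
static void rtr_rele_locked(struct rtr *r)
{
	assert(r->refcnt > 0);
	if (--r->refcnt == 0) {
		free(r);
		live_entries--;
	}
}

static void rtr_rele(struct rtr *r)
{
	pthread_mutex_lock(&list_lock);
	rtr_rele_locked(r);
	pthread_mutex_unlock(&list_lock);
}

/* Unlink the entry for id, if present, and drop the list's reference.
 * The memory stays valid for as long as any other holder remains. */
static void rtr_remove(int id)
{
	struct rtr **pp;

	pthread_mutex_lock(&list_lock);
	for (pp = &head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct rtr *r = *pp;
			*pp = r->next;
			r->next = NULL;
			rtr_rele_locked(r);
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
}

/* Simulates several threads handling advertisements for the same router. */
static void *ra_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 10000; i++) {
		struct rtr *r = rtr_hold(1);
		if (r == NULL)
			continue;
		rtr_remove(1);		/* any thread may unlink the entry */
		assert(r->id == 1);	/* still valid: we hold a reference */
		rtr_rele(r);
	}
	return NULL;
}

/* Run up to 8 workers; return the number of entries still allocated afterwards. */
static int run_workers(int n)
{
	pthread_t t[8];
	int started = 0;

	if (n > 8)
		n = 8;
	for (int i = 0; i < n; i++)
		if (pthread_create(&t[started], NULL, ra_worker, NULL) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(t[i], NULL);
	return live_entries;
}
```

Without the reference count (that is, if rtr_remove called free() directly), the read of r->id in ra_worker could touch freed memory whenever two workers overlap, which is the same shape as the faults in the backtrace above.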