From: "Keller, Jacob E"
To: 'John Baldwin', "'freebsd-net@freebsd.org'"
CC: "'shurd@llnw.com'", "Joyner, Eric"
Subject: RE: panic on invalid ifp pointer in iflib drivers
Date: Thu, 17 Oct 2019 23:34:09 +0000
Message-ID: <02874ECE860811409154E81DA85FBB58969314A1@ORSMSX121.amr.corp.intel.com>
In-Reply-To: <02874ECE860811409154E81DA85FBB5896931470@ORSMSX121.amr.corp.intel.com>
List-Id: Networking and TCP/IP with FreeBSD

> -----Original Message-----
> From: Keller, Jacob E
> Sent: Thursday, October 17, 2019 4:22 PM
> To: John Baldwin; freebsd-net@freebsd.org
> Cc: shurd@llnw.com; Joyner, Eric
> Subject: RE: panic on invalid ifp pointer in iflib drivers
>
> > -----Original Message-----
> > From: John Baldwin
> > Sent: Thursday, October 17, 2019 9:31 AM
> > To: Keller, Jacob E; freebsd-net@freebsd.org
> > Cc: shurd@llnw.com; Joyner, Eric
> > Subject: Re: panic on invalid ifp pointer in iflib drivers
> >
> > Nominally, ifnet drivers should call ether_ifdetach first to remove public
> > references to the ifnet and only call their stop routine after that has returned.
> > This ensures any open if_ioctl invocations have completed, etc. before the
> > stop routine is invoked.  Otherwise you are open to a race where the interface
> > can be upped via an ioctl after you have stopped the hardware.
> >
> > Any other references to the ifnet via eventhandlers, etc. should also be
> > deregistered before calling the stop routine.
>
> Looks like iflib moved this much later when we refactored to add a shared
> function to deregister VLAN handlers...
>
> >
> > After the hardware is stopped, interrupt handlers should be torn down and
> > callouts and tasks drained to ensure there are no other references to the
> > ifp outside of the thread running detach.
> >
> > After that you can release device resources, destroy mutexes, free the ifp, etc.
> > Note that drivers have to be prepared for ether_ifdetach to invoke if_ioctl (e.g.
> > when detaching bpf), but of the drivers I've looked at this has generally been a
> > non-issue.
> >
> > It sounds like iflib should be doing the detach before calling iflib_stop.
> >
>
> I tested a patch that moved ether_ifdetach above the call to iflib_stop.
>
> This seems to have made the issue significantly harder to reproduce, but after
> multiple attach/detach cycles with IPv6 traffic: (INVARIANTS and WITNESS are
> enabled, as well as memguard protecting ifnet)
>
> Unread portion of the kernel message buffer:
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex ip6qlock (ip6qlock) r = 0 (0xfffffe00007aa848) locked @ /usr/src/sys/netinet6/frag6.c:849
> shared rw vnet_rwlock (vnet_rwlock) r = 0 (0xffffffff820be700) locked @ /usr/src/sys/netinet6/frag6.c:845
> stack backtrace:
> #0 0xffffffff80bb6f83 at witness_debugger+0x73
> #1 0xffffffff80bb7fa2 at witness_warn+0x442
> #2 0xffffffff8108a0f3 at trap_pfault+0x53
> #3 0xffffffff810896e4 at trap+0x2b4
> #4 0xffffffff8106201c at calltrap+0x8
> #5 0xffffffff80d8c07a at icmp6_error+0x4aa
> #6 0xffffffff80d8b30e at frag6_freef+0x10e
> #7 0xffffffff80d8b551 at frag6_slowtimo+0x111
> #8 0xffffffff80bdcda4 at pfslowtimo+0x54
> #9 0xffffffff80b65bdf at softclock_call_cc+0x13f
> #10 0xffffffff80b65f9c at softclock+0x7c
> #11 0xffffffff80b0f857 at ithread_loop+0x187
> #12 0xffffffff80b0c4a4 at fork_exit+0x84
> #13 0xffffffff8106305e at fork_trampoline+0xe
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xfffffe0000825dd8
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80d8c5b2
> stack pointer           = 0x28:0xfffffe1fc28c6ff0
> frame pointer           = 0x28:0xfffffe1fc28c7090
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1571354026
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1fc28c6cb0
> vpanic() at vpanic+0x19d/frame 0xfffffe1fc28c6d00
> panic() at panic+0x43/frame 0xfffffe1fc28c6d60
> trap_fatal() at trap_fatal+0x39c/frame 0xfffffe1fc28c6dc0
> trap_pfault() at trap_pfault+0x62/frame 0xfffffe1fc28c6e10
> trap() at trap+0x2b4/frame 0xfffffe1fc28c6f20
> calltrap() at calltrap+0x8/frame 0xfffffe1fc28c6f20
> --- trap 0xc, rip = 0xffffffff80d8c5b2, rsp = 0xfffffe1fc28c6ff0, rbp = 0xfffffe1fc28c7090 ---
> icmp6_reflect() at icmp6_reflect+0x242/frame 0xfffffe1fc28c7090
> icmp6_error() at icmp6_error+0x4aa/frame 0xfffffe1fc28c70e0
> frag6_freef() at frag6_freef+0x10e/frame 0xfffffe1fc28c7130
> frag6_slowtimo() at frag6_slowtimo+0x111/frame 0xfffffe1fc28c7180
> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe1fc28c71b0
> softclock_call_cc() at softclock_call_cc+0x13f/frame 0xfffffe1fc28c7260
> softclock() at softclock+0x7c/frame 0xfffffe1fc28c7290
> ithread_loop() at ithread_loop+0x187/frame 0xfffffe1fc28c72f0
> fork_exit() at fork_exit+0x84/frame 0xfffffe1fc28c7330
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1fc28c7330
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
>
> Hmm.. now that I look at that more closely I think it's a separate issue.
>

KGDB shows this as the spot where it panics:

(kgdb) list /usr/src/sys/netinet6/icmp6.c:2129
2124                    src6 = ia->ia_addr.sin6_addr;
2125                    srcp = &src6;
2126
2127                    if (m->m_pkthdr.rcvif != NULL) {
2128                            /* XXX: This may not be the outgoing interface */
2129                            hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim;
2130                    } else
2131                            hlim = V_ip6_defhlim;
2132            }
2133            if (ia != NULL)

It looks like a received packet ends up with the stale IFP pointer...

Thanks,
Jake

> > --
> > John Baldwin
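
For reference, the detach ordering John describes above can be modeled in plain userspace C. This is only a sketch of the ordering, not iflib or kernel code: every name below (record, step_index, model_detach, and the step labels) is a hypothetical stand-in that just records when a step ran. The thread does not say whether event handlers are deregistered before or after ether_ifdetach, only that both happen before the stop routine, so their relative order here is an assumption.

```c
/*
 * Userspace model of the detach ordering discussed in this thread.
 * Nothing here is FreeBSD kernel or iflib code: each step is a
 * hypothetical label recorded in the order a driver would perform it.
 */
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 8

static const char *steps[MAX_STEPS];	/* labels in execution order */
static size_t nsteps;			/* number of labels recorded */

/* Append one step label to the log, ignoring overflow. */
static void
record(const char *name)
{
	if (nsteps < MAX_STEPS)
		steps[nsteps++] = name;
}

/* Return the position at which a step ran, or -1 if it never ran. */
int
step_index(const char *name)
{
	for (size_t i = 0; i < nsteps; i++)
		if (strcmp(steps[i], name) == 0)
			return (int)i;
	return -1;
}

/* Run the modeled detach sequence from a clean log. */
void
model_detach(void)
{
	nsteps = 0;

	/* 1. Remove public references to the ifnet first. */
	record("ether_ifdetach");
	/* 2. Drop other references (event handlers); order vs. 1 is assumed. */
	record("deregister_eventhandlers");
	/* 3. Only now stop the hardware. */
	record("stop_hardware");
	/* 4. Tear down interrupt handlers. */
	record("teardown_interrupts");
	/* 5. Drain callouts and tasks so nothing else holds the ifp. */
	record("drain_callouts_and_tasks");
	/* 6. Release resources, destroy mutexes, free the ifp. */
	record("release_resources");
}
```

After calling model_detach(), step_index("ether_ifdetach") is smaller than step_index("stop_hardware"), which is the property the patch mentioned above restores. The model says nothing about the stale rcvif seen in icmp6_reflect, which looks like a separate reference held by queued fragments.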