From: "Keller, Jacob E"
To: "freebsd-net@freebsd.org"
CC: "shurd@llnw.com", "Joyner, Eric", John Baldwin
Subject: panic on invalid ifp pointer in iflib drivers
Date: Wed, 16 Oct 2019 21:16:22 +0000
Message-ID: <02874ECE860811409154E81DA85FBB589692E0D4@ORSMSX121.amr.corp.intel.com>

Hi,

I'm investigating an issue in the iflib ixl driver in 11.3-RELEASE as well as 12-RELEASE. We found a panic that occurs if SCTP/IPv6 traffic is being transmitted while the device is detached:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xfffffe0000411e38
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c84700
stack pointer = 0x28:0xfffffe2f4351b600
frame pointer = 0x28:0xfffffe2f4351b650
code segment = base 0x0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe2f4351b2c0
vpanic() at vpanic+0x17e/frame 0xfffffe2f4351b320
panic() at panic+0x43/frame 0xfffffe2f4351b380
trap_fatal() at trap_fatal+0x369/frame 0xfffffe2f4351b3d0
trap_pfault() at trap_pfault+0x62/frame 0xfffffe2f4351b420
trap() at trap+0x2b3/frame 0xfffffe2f4351b530
calltrap() at calltrap+0x8/frame 0xfffffe2f4351b530
--- trap 0xc, rip = 0xffffffff80c84700, rsp = 0xfffffe2f4351b600, rbp = 0xfffffe2f4351b650 ---
in6_selecthlim() at in6_selecthlim+0x20/frame 0xfffffe2f4351b650
sctp_lowlevel_chunk_output() at sctp_lowlevel_chunk_output+0xeb2/frame 0xfffffe2f4351b790
sctp_chunk_output() at sctp_chunk_output+0x68c/frame 0xfffffe2f4351c110
sctp_timeout_handler() at sctp_timeout_handler+0x2d8/frame 0xfffffe2f4351c180
softclock_call_cc() at softclock_call_cc+0x15b/frame 0xfffffe2f4351c230
softclock() at softclock+0x7c/frame 0xfffffe2f4351c260
intr_event_execute_handlers() at intr_event_execute_handlers+0x9a/frame 0xfffffe2f4351c2a0
ithread_loop() at ithread_loop+0xb7/frame 0xfffffe2f4351c2f0
fork_exit() at fork_exit+0x84/frame 0xfffffe2f4351c330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2f4351c330
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

From what I've gathered so far, it appears that the issue is a use-after-free where the SCTP stack gets an ifp pointer that's no longer valid. We've reproduced this issue on multiple iflib-based drivers, including ixl and the recently published ice driver code (available on Phabricator).

Additionally, we cannot reproduce it on the legacy-stack ixl driver, or on a Mellanox 100G board we have. This leads me to believe that it's an issue in iflib rather than in the specific device drivers.

I am not sure exactly what's going wrong here... anyone have suggestions? I thought it might be an issue of when ether_ifdetach is called. That function is supposed to clear all of the pre-existing routes from the route entry list. I'm thinking maybe somehow a route gets added after ether_ifdetach is called.

In the iflib_device_deregister function, ether_ifdetach is called just after iflib_stop (which would call a device's if_stop routine), and then the task queues are shut down, a driver's ifdi_detach handler is called, and the ifp is freed at the end. In the ixl legacy driver, ether_ifdetach is called prior to the stop routine. However, in the mlx5 driver, it's called after a call to close_locked()...

So I'm really not sure exactly what could cause a stale ifp pointer to get into the route entry list.

Thanks,
Jake
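
For illustration, the hypothesis above (a route being added after ether_ifdetach has already purged the route entry list) can be sketched as a toy model. This is not kernel code and makes no claim about what iflib actually does: the Ifnet and RouteTable classes, the deregister() helper, and the late_route parameter are all hypothetical names invented here. Only the ordering of the teardown steps is taken from the description in the message.

```python
from dataclasses import dataclass, field


@dataclass
class Ifnet:
    """Hypothetical stand-in for an ifp; 'freed' marks that it was released."""
    name: str
    freed: bool = False


@dataclass
class RouteTable:
    """Hypothetical stand-in for the route entry list."""
    entries: list = field(default_factory=list)

    def add(self, dest, ifp):
        self.entries.append((dest, ifp))

    def purge(self, ifp):
        # Drop every entry that refers to this interface.
        self.entries = [(d, i) for d, i in self.entries if i is not ifp]

    def stale(self):
        # Destinations whose entry still points at a freed interface.
        return [d for d, i in self.entries if i.freed]


def deregister(ifp, routes, late_route=None):
    """Model the teardown order described for iflib_device_deregister.

    late_route is an assumption made for this sketch: it simulates some
    other context adding a route after the purge has already run.
    """
    steps = ["iflib_stop"]
    routes.purge(ifp)                 # modelled effect of ether_ifdetach
    steps.append("ether_ifdetach")
    if late_route is not None:
        routes.add(late_route, ifp)   # hypothetical late route add
        steps.append("late route add")
    steps.append("task queues shut down")
    steps.append("ifdi_detach")
    ifp.freed = True                  # the ifp is freed at the end
    steps.append("ifp freed")
    return steps


if __name__ == "__main__":
    ifp = Ifnet("ixl0")
    routes = RouteTable()
    routes.add("2001:db8::/64", ifp)
    print(deregister(ifp, routes, late_route="2001:db8:1::/64"))
    print("stale entries:", routes.stale())
```

In this model, a teardown with no late route add leaves no stale entries, while a single add between the purge and the free leaves one entry that still refers to the freed ifp. Whether such a window actually exists in iflib is exactly the open question in the message.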