From: "Kristof Provost"
To: d@delphij.net
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: CFT: if_bridge performance improvements
Date: Fri, 24 Apr 2020 15:42:08 +0200

On 22 Apr 2020, at 18:15, Xin Li wrote:
> On 4/22/20 01:45, Kristof Provost wrote:
>> On 22 Apr 2020, at 10:20, Xin Li wrote:
>>> Hi,
>>>
>>> On 4/14/20 02:51, Kristof Provost wrote:
>>>> Hi,
>>>>
>>>> Thanks to support from The FreeBSD Foundation I’ve been able to work on
>>>> improving the throughput of if_bridge.
>>>> It changes the (data path) locking to use the NET_EPOCH infrastructure.
>>>> Benchmarking shows substantial improvements (x5 in test setups).
>>>>
>>>> This work is ready for wider testing now.
>>>>
>>>> It’s under review here: https://reviews.freebsd.org/D24250
>>>>
>>>> Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
>>>> Patches for stable/12:
>>>> https://people.freebsd.org/~kp/if_bridge/stable_12/
>>>>
>>>> I’m not currently aware of any panics or issues resulting from these
>>>> patches.
>>>
>>> I have observed the following panic with latest stable/12 after applying
>>> the stable_12 patchset, it appears like a race condition related NULL
>>> pointer deference, but I haven't took a deeper look yet.
>>>
>>> The box have 7 igb(4) NICs, with several bridge and VLAN configured
>>> acting as a router.  Please let me know if you need additional
>>> information; I can try -CURRENT as well, but it would take some time as
>>> the box is relatively slow (it's a ZFS based system so I can create a
>>> separate boot environment for -CURRENT if needed, but that would take
>>> some time as I might have to upgrade the packages, should there be any
>>> ABI breakages).
>>>
>> Thanks for the report. I don’t immediately see how this could happen.
>>
>> Are you running an L2 firewall on that bridge by any chance? An earlier
>> version of the patch had issues with a stray unlock in that code path.
>
> I don't think I have a L2 firewall (I assume means filtering based on
> MAC address like what can be done with e.g. ipfw?  The bridges were
> created on vlan interfaces though, do they count as L2 firewall?), the
> system is using pf with a few NAT rules:
>

That backtrace looks identical to the one Peter reported, up to and
including the offset in the bridge_input() function.

Given that there’s no likely way to end up with a NULL mutex either, I
have to assume that it’s a case of trying to unlock an unlocked mutex,
and the most likely reason is that you ran into the same problem Peter
ran into.

The current version of the patch should resolve it.

Best regards,
Kristof