From: Jeremy Chadwick
To: Kris Bauer
Cc: kerbzo@gmail.com, freebsd-stable@freebsd.org, stb@lassitu.de,
    raul@turing.b2n.org, george+freebsd@m5p.com, Steven Hartland,
    Lawrence Stewart, FreeBSD Release Engineering Team
Date: Sat, 26 Nov 2011 00:01:53 -0800
Subject: Re: TCP Reassembly Issues
Message-ID: <20111126080153.GA33335@icarus.home.lan>
In-Reply-To: <20111126075647.GA33048@icarus.home.lan>

On Fri, Nov 25, 2011 at 11:56:47PM -0800, Jeremy Chadwick wrote:
> On Sat, Nov 26, 2011 at 12:49:24AM -0600,
> Kris Bauer wrote:
> > On Fri, Nov 25, 2011 at 11:23 PM, Lawrence Stewart wrote:
> >
> > > On 11/25/11 13:01, Lawrence Stewart wrote:
> > >
> > >> On 11/24/11 18:02, Kris Bauer wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I am currently experiencing an issue with FreeBSD 9.0-RC2 r227852
> > >>> where the net.inet.tcp.reass.cursegments value is constantly
> > >>> increasing (and not decreasing when there is nominal traffic with
> > >>> the box). It is causing TCP slowdowns as described in kern/155407:
> > >>>
> > >>> Exhausted net.inet.tcp.reass.maxsegments block recovering tcp session
> > >>> (for this socket and any other socket waiting for retransmitted
> > >>> packets). After exhausted net.inet.tcp.reass.maxsegments allocation
> > >>> new entry in tcp_reass failed (for this socket and any other socket
> > >>> waiting for retransmitted packets).
> > >>>
> > >>> I have increased the reass.maxsegments value to 16384 to temporarily
> > >>> avoid the problem, but the cursegments number keeps rising and it
> > >>> seems it will occur again.
> > >>>
> > >>> Is this an issue that anyone else has seen? I can provide more
> > >>> information if need be.
> > >>>
> > >>
> > >> Thanks Kris, Raul and Stefan for the reports, I'll look into this.
> > >>
> > >
> > > I think I've got it - a stupid one-line logic bug. My apologies for missing
> > > it when I reviewed the patch which introduced the bug (patch was committed
> > > to head as r226113, MFCed to stable/9 as r226228).
> > >
> > > Due to some miscommunication, the initial patch was committed to and MFCed
> > > from head much later than it should have been in the 9.0 release cycle and
> > > instead of being included in the BETAs, didn't make it in until 9.0-RC1 I
> > > believe, i.e. only RC1 and RC2 should be experiencing the issue.
> > >
> > > Could those who have reported the bug and are able to recompile their
> > > kernel to test a patch please try the following and report back to the list:
> > >
> > > http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass_plugzoneleak_10.x.r227986.patch
> > >
> > > The patch is against head r227986 but will apply and work correctly for
> > > 9.0 as well.
> > >
> > > Cheers,
> > > Lawrence
> > >
> >
> > I have patched, recompiled, and rebooted. net.inet.tcp.reass.cursegments
> > is no longer incrementing, and connectivity is holding steady. If anything
> > changes over the next couple of hours, I'll be sure to report it; but all
> > preliminary signs of the problem are gone.
> >
> > Thanks for all the help!
>
> Let's not be hasty in concluding everything is fixed. Why I'm a bit on
> edge about this: I took the time to find the CVS commits that induced
> this issue in the first place, and it seems there is some history.
>
> The commit that caused this problem to begin with was supposedly a fix
> for a different problem:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.375
>
> A week later, that commit went from HEAD/MAIN into RELENG_9:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.374.2.2
>
> Be sure to read the description of the problem that was being fixed in
> the first place. I've also CC'd the original problem reporter, Steven
> Hartland, because we're going to need him to try the above patch from
> Lawrence to make sure there aren't other problems. Meaning: for all we
> know, the above fix might work great for Kris but cause problems for
> Steve.
>
> This entire situation leads me to believe very few people are doing
> quality testing of RELENG_9, yet we're already into 9.0-RC2.
> Please don't tell me "that's exactly why you should be running
> RELENG_9!"; that is completely backwards and I refuse to get into a
> flame war about it, because it's this simple: 90%+ of those running
> FreeBSD on servers need something that's stable; we can't risk
> wonkiness (especially of this degree!) on systems taking production
> traffic. Did no one actually test the change *thoroughly*? Imagine if
> this had lain dormant until 9.0-RELEASE.
>
> Lawrence: please don't take my comments personally or to mean "you broke
> it and caused this mess!" It's meant to read more along the lines of
> "you committed a fix for something that broke other bits badly, but
> nobody noticed this, including the original reporter of a different
> problem? How/why?" You get the idea.

Re-sending, because the "Tested by" commit line had someone who replaced
the "@" character with "-at-", so my mail client assumed the email
address was on my local machine. Sorry about that, folks.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |