Date: Sat, 26 Nov 2011 00:01:53 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Kris Bauer <kristoph.bauer@gmail.com>
Cc: kerbzo@gmail.com, freebsd-stable@freebsd.org, stb@lassitu.de, raul@turing.b2n.org, george+freebsd@m5p.com, Steven Hartland <killing@multiplay.co.uk>, Lawrence Stewart <lstewart@freebsd.org>, FreeBSD Release Engineering Team <re@freebsd.org>
Subject: Re: TCP Reassembly Issues
Message-ID: <20111126080153.GA33335@icarus.home.lan>
In-Reply-To: <20111126075647.GA33048@icarus.home.lan>
References: <CAPNZ-Wq38=F3o2hYuYF_unBj3SZQ52XhVhdcwQ8PE_vU9xc2YA@mail.gmail.com> <4ECEF6FD.5050006@freebsd.org> <4ED077BF.10205@freebsd.org> <CAPNZ-WqZsSjcO=dVZpOOMtB_Y_hNcj%2BpYDA4nWPXX9kY9Vj1Wg@mail.gmail.com> <20111126075647.GA33048@icarus.home.lan>

On Fri, Nov 25, 2011 at 11:56:47PM -0800, Jeremy Chadwick wrote:
> On Sat, Nov 26, 2011 at 12:49:24AM -0600, Kris Bauer wrote:
> > On Fri, Nov 25, 2011 at 11:23 PM, Lawrence Stewart <lstewart@freebsd.org> wrote:
> >
> > > On 11/25/11 13:01, Lawrence Stewart wrote:
> > >
> > >> On 11/24/11 18:02, Kris Bauer wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I am currently experiencing an issue with FreeBSD 9.0-RC2 r227852
> > >>> where the net.inet.tcp.reass.cursegments value is constantly
> > >>> increasing (and not decreasing when there is nominal traffic with
> > >>> the box). It is causing tcp slowdowns as described with kern/155407:
> > >>>
> > >>> Exhausted net.inet.tcp.reass.maxsegments block recovering tcp session
> > >>> (for this socket and any other socket waiting for retransmitted
> > >>> packets). After exhausted net.inet.tcp.reass.maxsegments allocation
> > >>> new entry in tcp_reass failed (for this socket and any other socket
> > >>> waiting for retransmitted packets).
> > >>>
> > >>> I have increased the reass.maxsegments value to 16384 to temporarily
> > >>> avoid the problem, but the cursegments number keeps rising and it
> > >>> seems it will occur again.
> > >>>
> > >>> Is this an issue that anyone else has seen? I can provide more
> > >>> information if need be.
> > >>>
> > >>
> > >> Thanks Kris, Raul and Stefan for the reports, I'll look into this.
> > >>
> > >
> > > I think I've got it - a stupid 1 line logic bug. My apologies for missing
> > > it when I reviewed the patch which introduced the bug (patch was committed
> > > to head as r226113, MFCed to stable/9 as r226228).
> > >
> > > Due to some miscommunication, the initial patch was committed to and MFCed
> > > from head much later than it should have been in the 9.0 release cycle and
> > > instead of being included in the BETAs, didn't make it in until 9.0-RC1 I
> > > believe i.e. only RC1 and RC2 should be experiencing the issue.
> > >
> > > Could those who have reported the bug and are able to recompile their
> > > kernel to test a patch please try the following and report back to the list:
> > >
> > >
> > > http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass_plugzoneleak_10.x.r227986.patch
> > >
> > > The patch is against head r227986 but will apply and work correctly for
> > > 9.0 as well.
> > >
> > > Cheers,
> > > Lawrence
> > >
> >
> > I have patched, recompiled, and rebooted. net.inet.tcp.reass.cursegments
> > is no longer incrementing, and connectivity is holding steady. If anything
> > changes over the next couple of hours, I'll be sure to report it; but all
> > preliminary signs of the problem are gone.
> >
> > Thanks for all the help!
>
> Let's not be hasty in concluding everything is fixed. Why I'm a bit on
> edge about this: I took the time to find the CVS commits that induced
> this issue in the first place, and it seems there is some history.
>
> The commit that caused this problem to begin with was supposedly a fix
> for a different problem:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.375
>
> A week later, that commit went from HEAD/MAIN into RELENG_9:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.374.2.2
>
> Be sure to read the description of the problem that was being fixed in
> the first place. I've also CC'd the original problem reporter, Steven
> Hartland, because we're going to need him to try the above patch from
> Lawrence to make sure there aren't other problems. Meaning: for all we
> know, the above fix might work great for Kris but cause problems for
> Steve.
>
> This entire situation leads me to believe very few people are doing
> quality testing of RELENG_9, yet we're already into 9.0-RC2. Please
> don't tell me "that's exactly why you should be running RELENG_9!"; that
> is completely backwards and I refuse to get into a flame war about it,
> because it's this simple: 90%+ of those running FreeBSD on servers need
> something that's stable, we can't risk wonkiness (especially of this
> degree!) on systems taking production traffic. Did no one actually test
> the change *thoroughly*? Imagine had this lay dormant until 9.0-RELEASE.
>
> Lawrence: please don't take my comments personally or to mean "you broke
> it and caused this mess!" It's meant to read more along the lines of
> "you committed a fix for something that broke other bits badly, but
> nobody noticed this, including the original reporter of a different
> problem? How/why?" You get the idea.

Re-sending, because the "Tested by" commit line had someone who replaced
the "@" character with "-at-", so my mail client assumed the Email
address was on my local machine. Sorry about that folks.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |