Date: Thu, 11 Aug 2011 11:13:22 +1000
From: Lawrence Stewart <lstewart@freebsd.org>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-net@freebsd.org, Andre Oppermann <andre@freebsd.org>, slw@zxy.spb.ru
Subject: Re: tcp failing to recover from a packet loss under 8.2-RELEASE?
Message-ID: <4E432CB2.3030700@freebsd.org>
In-Reply-To: <20229216858044E4881642284F245750@multiplay.co.uk>
References: <E18D678F05BB4F3B93ADEB304CCA8504@multiplay.co.uk> <1F95A4C2D54E4F369830143CBDB5FF86@multiplay.co.uk> <4E37C0F2.4080004@freebsd.org> <2B063B6D95AA4C27B004C50D96393F91@multiplay.co.uk> <C706DEE346684B8DB06CFC090F556E72@multiplay.co.uk> <4E3AA66A.6060605@freebsd.org> <20229216858044E4881642284F245750@multiplay.co.uk>
On 08/05/11 00:19, Steven Hartland wrote:
> ----- Original Message ----- From: "Lawrence Stewart"
> <lstewart@freebsd.org>

[snip]

>>> So I suppose the question is should maxsegments be larger by
>>> default due to the recent changes e.g.
>>>
>>> - V_tcp_reass_maxseg = nmbclusters / 16;
>>> + V_tcp_reass_maxseg = nmbclusters / 8;
>>>
>>> or is the correct fix something more involved?
>>
>> I'm not sure if bumping the value is appropriate - we have always
>> expected users to tune their network stack to perform well when used
>> in "unusual" scenarios - a large BDP fibre path still being in the
>> "unusual" category.
>
> TBH I wouldn't classify a latency of 7ms @ 100Mbps unusual in the
> slightest in this day and age.

Are the TCP sessions experiencing the problem terminating on either side
of that link, i.e. is the RTT of the connection 7ms? Or does the fibre
link form one part of the path connections are traversing? Based on your
symptoms, I believe the latter is the case (the BDP of a 7ms 100Mbps
fibre link is a lot smaller than your pre-tweaked reassembly max queue
limit and therefore shouldn't have caused stalls), in which case it's
not the characteristics of your fibre link that matter in their own
right, but the characteristics of the complete path from sender to
receiver.

>> The real fix, which is somewhere down on my todo list, is to make
>> all these memory constraints elastic and respond to VM pressure,
>> thus negating the need for a hard limit at all. This would solve
>> many if not most of the TCP tuning problems we currently have in one
>> fell swoop and would greatly reduce the need for tuning in many
>> situations that currently are in the "needs manual tuning" basket.
>
> This would indeed be a great improvement.
>
>> Andre and Steven, I'm a bit too sleepy to properly review your
>> combined proposed changes right now and will follow up in the next
>> few days instead.
>
> No problem, we've increased nmbclusters on all our machines and
> they're now performing fine in the problem scenario, so no rush. Look
> forward to your feedback when you've had some sleep :)

Steven, as far as my reading of the code informs me, your additional
sanity checking is unnecessary - the segment only gets added to the
reassembly list where the calls to LIST_INSERT_* are, and the
uma_zfree() in the "if (p != NULL)" block should never be called if the
incoming segment is equal to rcv_nxt.

However, I would like to see some additional sanity checking added to
Andre's base patch in the form of some KASSERTs. There are a number of
hidden assumptions in the current code, and I think explicitly noting
them with KASSERTs would be useful. I'm also paranoid about leaking a
stack-allocated tseg_qent across calls to tcp_reass(), as that would be
a horrendous bug to diagnose.

Here's my tweaked version of Andre's patch:

http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass.c-logdebug%2bmissingsegment-20110811-lstewart.diff

It has only been compile tested at this point. BTW, when a patch is
eventually committed, the logging changes should be committed
separately from the KASSERT/backup stack-allocated tseg_qent change.

Cheers,
Lawrence
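
P.S. For the curious, here's a back-of-the-envelope check of the BDP
claim above as a small userland C sketch. The nmbclusters figure of
25600 is purely illustrative (substitute your own; it is not a measured
default), as is the 1448 byte MSS:

    #include <stdio.h>

    int
    main(void)
    {
            double rate_bps = 100e6;        /* 100Mbps link */
            double rtt_s = 0.007;           /* 7ms RTT */
            int mss = 1448;                 /* illustrative MSS */
            int nmbclusters = 25600;        /* illustrative value only */
            double bdp_bytes, bdp_segs;

            /* BDP = rate * RTT, converted from bits to bytes. */
            bdp_bytes = rate_bps * rtt_s / 8.0;
            bdp_segs = bdp_bytes / mss;
            printf("BDP: %.0f bytes (~%.0f segments)\n", bdp_bytes,
                bdp_segs);
            /* Pre-change reassembly queue limit: nmbclusters / 16. */
            printf("old tcp_reass_maxseg: %d segments\n",
                nmbclusters / 16);
            return (0);
    }

That works out to roughly 87500 bytes, or ~60 segments, in flight on the
link itself, against an illustrative queue limit of 1600 segments -
nowhere near exhaustion, which is why I suspect the full path rather
than the fibre hop on its own.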
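
P.P.S. To give a flavour of the sort of KASSERT I have in mind - the
following is a sketch only, not code lifted from the patch, and the
variable names (te for the current entry, tqs for a hypothetical
stack-allocated backup tseg_qent) are assumptions: if tcp_reass() ever
falls back to a stack-allocated entry, that entry may only stand in for
the segment that fills rcv_nxt and must be consumed before the function
returns, never linked into t_segq where it would outlive its stack
frame.

    if (te == &tqs) {
            /*
             * The stack-allocated entry may only carry the in-order
             * segment at rcv_nxt; inserting it into t_segq would leak
             * a stack address across calls to tcp_reass().
             */
            KASSERT(th->th_seq == tp->rcv_nxt,
                ("%s: stack qent used for out-of-order segment",
                __func__));
    }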