Date: Fri, 22 Jan 2016 16:02:02 -0600
From: Matthew Grooms <mgrooms@shrew.net>
To: freebsd-net@freebsd.org
Subject: Re: pf state disappearing [ adaptive timeout bug ]
Message-ID: <56A2A6DA.1040304@shrew.net>
In-Reply-To: <CAKOb=YYwPO-VYG9wC7x1eBDPFQUnv48PC2XV+acRm2Sa9P+XOw@mail.gmail.com>
References: <56A003B8.9090104@shrew.net> <CAKOb=YakqYqeGYUh3PKm-PGQma7E69ZPAtAe7og3byN7s5d4SA@mail.gmail.com> <56A13531.8090209@shrew.net> <CAKOb=YYwPO-VYG9wC7x1eBDPFQUnv48PC2XV+acRm2Sa9P+XOw@mail.gmail.com>
On 1/22/2016 3:35 PM, Nick Rogers wrote:
> On Thu, Jan 21, 2016 at 11:44 AM, Matthew Grooms <mgrooms@shrew.net> wrote:
>
>> # pfctl -si
>> Status: Enabled for 0 days 02:25:41           Debug: Urgent
>>
>> State Table                          Total             Rate
>>   current entries                    77759
>>   searches                       483831701        55352.0/s
>>   inserts                           825821           94.5/s
>>   removals                          748060           85.6/s
>> Counters
>>   match                           27118754         3102.5/s
>>   bad-offset                             0            0.0/s
>>   fragment                               0            0.0/s
>>   short                                  0            0.0/s
>>   normalize                              0            0.0/s
>>   memory                                 0            0.0/s
>>   bad-timestamp                          0            0.0/s
>>   congestion                             0            0.0/s
>>   ip-option                           6655            0.8/s
>>   proto-cksum                            0            0.0/s
>>   state-mismatch                         0            0.0/s
>>   state-insert                           0            0.0/s
>>   state-limit                            0            0.0/s
>>   src-limit                              0            0.0/s
>>   synproxy                               0            0.0/s
>>
>> # pfctl -st
>> tcp.first                   120s
>> tcp.opening                  30s
>> tcp.established           86400s
>> tcp.closing                 900s
>> tcp.finwait                  45s
>> tcp.closed                   90s
>> tcp.tsdiff                   30s
>> udp.first                   600s
>> udp.single                  600s
>> udp.multiple                900s
>> icmp.first                   20s
>> icmp.error                   10s
>> other.first                  60s
>> other.single                 30s
>> other.multiple               60s
>> frag                         30s
>> interval                     10s
>> adaptive.start            90000 states
>> adaptive.end             120000 states
>> src.track                     0s
>>
>> I think there may be a problem with the code that calculates adaptive
>> timeout values that is making it way too aggressive. If by default it's
>> supposed to decrease linearly between 60% and 120% of the state table
>> max, I shouldn't be losing TCP connections that are only idle for a few
>> minutes when the state table is < 70% full. Unfortunately that appears
>> to be the case. At most this should have decreased the 86400s timeout by
>> 17%, to 72000s, for established TCP connections.
>
> That doesn't make sense to me either. Even if the math is off by a factor
> of 10 the state should live for about 24 minutes.
>
>> I've tested this for a few hours now and all my idle SSH sessions have
>> been rock solid. If anyone else is scratching their head over a problem
>> like this, I would suggest disabling the adaptive timeout feature or
>> increasing it to a much higher value. Maybe one of the pf maintainers
>> can chime in and shed some light on why this is happening. If not, I'm
>> going to file a bug report, as this certainly feels like one.
>>
> Did you go with making adaptive timeout less aggressive or disable it
> entirely? I would think that if adaptive timeout is really that broken,
> more people would notice this problem, especially myself since I have
> many servers running a very short tcp.established timeout, but the fact
> that you are noticing this kind of weirdness has me concerned about how
> the adaptive setting is affecting my environment.

I increased the value to 90K for the 100K limit. Yes, it's concerning. Today
I set up a test environment with about 1/10th the connections to see if I
could reproduce the issue on a smaller scale, but had no luck. I'm trying to
find a command-line test program that will generate enough TCP connections
so I can reproduce it on a similar scale to my production environment. So
far I haven't found anything that will do the trick. I may end up rolling my
own. I'll reply back to the list if I can find a way to reproduce this.

Thanks again,

-Matthew
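For anyone following along, pf.conf(5) documents the adaptive scaling as
linear: once the state count passes adaptive.start, all timeout values are
multiplied by (adaptive.end - states) / (adaptive.end - adaptive.start). A
quick sanity check of the arithmetic quoted above (just a sketch of that
documented formula, not pf's kernel code):

    # fractions of the state limit, per the 60%/120% defaults and a table
    # that is roughly 70% full, as described in the quoted message
    start, end, states = 0.60, 1.20, 0.70
    factor = (end - states) / (end - start)   # = 0.833..., about a 17% cut
    print(int(round(86400 * factor)))         # 72000s for tcp.established

So even with the table 70% full, an established TCP state should survive
roughly 20 hours of idle time, nowhere near the few minutes observed.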
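In case it helps with the reproduction attempt, something as small as the
sketch below can hold a large number of idle TCP connections open against a
test box and fill the state table with established states. The host, port,
and count are placeholders, and it assumes Python is available on the test
client:

    #!/usr/bin/env python
    # Open COUNT TCP connections to a test host and hold them idle so the
    # pf state table fills with established states. Placeholder values.
    import socket, time

    HOST, PORT, COUNT = "192.0.2.10", 80, 5000

    conns = []
    for _ in range(COUNT):
        s = socket.create_connection((HOST, PORT))
        conns.append(s)        # keep a reference so the socket stays open

    time.sleep(4 * 3600)       # hold the connections idle for a few hours

Whether the far end keeps thousands of idle connections open depends on its
own timeouts, so pointing this at a simple accept-and-hold listener on the
test box may work better than a real service.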