From: Matthew Grooms
Subject: Re: pf state disappearing [ adaptive timeout bug ]
To: freebsd-net@freebsd.org
Date: Thu, 21 Jan 2016 13:44:49 -0600

On 1/21/2016 11:04 AM, Nick Rogers wrote:
> On Wed, Jan 20, 2016 at 2:01 PM, Matthew Grooms wrote:
>
>> All,
>>
>> I have a curious problem with a lightly loaded pair of pf firewalls
>> running on FreeBSD 10.2-RELEASE. I'm noticing TCP entries are
>> disappearing from the state table for no good reason that I can see.
>> The entry limit is set to 100000 and I never see the system go over
>> about 70000 entries, so we shouldn't be hitting the configured
>> limit ...
>>
> In my experience, if you hit the state limit, new connections/states
> are dropped and existing states are unaffected.

Aha! You shook something out of the dusty depths of my slow brain :) I
believe that what you say is true as long as adaptive timeouts are
disabled, which by default they are not ...

     Timeout values can be reduced adaptively as the number of state
     table entries grows.

     adaptive.start
             When the number of state entries exceeds this value,
             adaptive scaling begins. All timeout values are scaled
             linearly with factor (adaptive.end - number of states) /
             (adaptive.end - adaptive.start).

     adaptive.end
             When reaching this number of state entries, all timeout
             values become zero, effectively purging all state entries
             immediately. This value is used to define the scale
             factor; it should not actually be reached (set a lower
             state limit, see below).

     Adaptive timeouts are enabled by default, with an adaptive.start
     value equal to 60% of the state limit, and an adaptive.end value
     equal to 120% of the state limit. They can be disabled by setting
     both adaptive.start and adaptive.end to 0.
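In pf.conf terms, I believe the implicit defaults for my 100000 state
limit work out to the equivalent of the following (this is just the man
page math spelled out, not lines copied from my actual config):

     set limit states 100000
     # implicit defaults: 60% and 120% of the state limit
     set timeout { adaptive.start 60000, adaptive.end 120000 }
     # or, to turn adaptive scaling off entirely:
     # set timeout { adaptive.start 0, adaptive.end 0 }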
>> # pfctl -sm
>> states        hard limit   100000
>> src-nodes     hard limit   100000
>> frags         hard limit    50000
>> table-entries hard limit   200000
>>
>> # pfctl -si
>> Status: Enabled for 78 days 14:24:18           Debug: Urgent
>>
>> State Table                          Total             Rate
>>   current entries                    67829
>>   searches                    113412118733        16700.2/s
>>   inserts                        386313496           56.9/s
>>   removals                       386245667           56.9/s
>> Counters
>>   match                          441731678           65.0/s
>>   bad-offset                             0            0.0/s
>>   fragment                            1090            0.0/s
>>   short                                220            0.0/s
>>   normalize                            761            0.0/s
>>   memory                                 0            0.0/s
>>   bad-timestamp                          0            0.0/s
>>   congestion                             0            0.0/s
>>   ip-option                        4366487            0.6/s
>>   proto-cksum                            0            0.0/s
>>   state-mismatch                     50334            0.0/s
>>   state-insert                          10            0.0/s
>>   state-limit                            0            0.0/s
>>   src-limit                              0            0.0/s
>>   synproxy                               0            0.0/s
>>
>> This problem is easy to reproduce by establishing an SSH connection
>> to the firewall itself, letting it sit for a while, and then
>> examining the state table. After a connection is made, I can see the
>> entry with an established:established state ...
>>
>> # pfctl -ss | grep X.X.X.X | grep 63446
>> all tcp Y.Y.Y.Y:22 <- X.X.X.X:63446       ESTABLISHED:ESTABLISHED
>>
>> If I let the SSH session sit for a while and then try to type into
>> the terminal on the client end, the connection stalls and produces a
>> network error message. When I look at the pf state table again, the
>> state entry for the connection is no longer visible. However, the
>> ssh process is still running and I still see the TCP connection
>> established in the output of netstat ...
>>
>> # netstat -na | grep 63446
>> tcp4  0  0  Y.Y.Y.Y.22  X.X.X.X.63446  ESTABLISHED
>>
>> When I observe the packet flow with tcpdump when a connection
>> stalls, packets being sent from the client are visible on the
>> physical interface but are shown as blocked on the pflog0 interface.
>>
> Does this happen with non-SSH connections? It sounds like your SSH
> client/server interaction is not performing a keep-alive frequently
> enough to keep the PF state established. If no packets are sent over
> the connection (state) for some time, then PF will time out (remove)
> the state. At this point your SSH client still believes it has a
> successful connection, so it tries to send packets when you resume
> typing, but they are blocked by your PF rules, which likely specify
> "flags S/SA keep state", either explicitly or implicitly (it is the
> filter rule default), which means block packets that don't match an
> existing state and are not part of the initial SYN handshake of the
> TCP connection.

It happened with UDP SIP and long-running HTTP sessions that sit idle
as well. The SSH connection was just the easiest to test. Besides that,
the default TCP timeout value for established connections is quite high
at 86400s. An established TCP connection should be able to sit for a
full day with no traffic before the related state table entry gets
evicted.

> Look at your settings in pf.conf for "timeout tcp.established", which
> affects how long before an idle ESTABLISHED state will time out. Also
> look into ClientAliveInterval in the sshd configuration, which I
> believe is 0 (disabled) by default, which means it will let the
> client time out without sending a keep-alive. If you don't want PF to
> force a timeout of an idle SSH connection, then ideally
> ClientAliveInterval is less than or equal to (i.e., more frequent
> than) PF's tcp.established timeout value.

Thanks for the suggestion! I completely forgot about the adaptive
timeout options until I double-checked the settings based on your
reply :) My values are set to default for TCP and extended a bit for
UDP.
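(As an aside, for anyone who would rather take the keep-alive approach
that Nick describes, I believe something along these lines in
sshd_config would have the server probe idle clients often enough to
keep the pf state entry fresh. The values are only an illustration, not
something I've tested here:

     # probe the client after 5 minutes of inactivity ...
     ClientAliveInterval 300
     # ... and disconnect it after 3 unanswered probes
     ClientAliveCountMax 3

In my case, though, the timeouts themselves turned out to be the real
problem, as the numbers below show.)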
The adaptive.start value was calculated at 60k for the 100k state
limit. That in particular looked way too relevant to be a coincidence.
After increasing the value to 90k, my total state count started
increasing and leveled out around 75k. It has always hovered around 65k
up until now, so about 10k state entries were being discarded on a
regular basis ...

# pfctl -si
Status: Enabled for 0 days 02:25:41           Debug: Urgent

State Table                          Total             Rate
  current entries                    77759
  searches                       483831701        55352.0/s
  inserts                           825821           94.5/s
  removals                          748060           85.6/s
Counters
  match                           27118754         3102.5/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                           6655            0.8/s
  proto-cksum                            0            0.0/s
  state-mismatch                         0            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

# pfctl -st
tcp.first                   120s
tcp.opening                  30s
tcp.established           86400s
tcp.closing                 900s
tcp.finwait                  45s
tcp.closed                   90s
tcp.tsdiff                   30s
udp.first                   600s
udp.single                  600s
udp.multiple                900s
icmp.first                   20s
icmp.error                   10s
other.first                  60s
other.single                 30s
other.multiple               60s
frag                         30s
interval                     10s
adaptive.start            90000 states
adaptive.end             120000 states
src.track                     0s

I think there may be a problem with the code that calculates adaptive
timeout values that is making it way too aggressive. If by default it's
supposed to decrease linearly between 60% and 120% of the state table
max, I shouldn't be losing TCP connections that are only idle for a few
minutes when the state table is < 70% full. Unfortunately, that appears
to be the case. At most, with my state table peaking around 70000
entries, the scale factor should have been (120000 - 70000) /
(120000 - 60000), or roughly 0.83, which decreases the 86400s timeout
by about 17% to 72000s for established TCP connections.

I've tested this for a few hours now and all my idle SSH sessions have
been rock solid. If anyone else is scratching their head over a problem
like this, I would suggest disabling the adaptive timeout feature or
increasing it to a much higher value. Maybe one of the pf maintainers
can chime in and shed some light on why this is happening. If not, I'm
going to file a bug report, as this certainly feels like one.

Thanks again,

-Matthew
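P.S. For anyone who wants to apply the same workaround, the change
amounts to one line in pf.conf (these numbers match my 100000 state
limit; scale them to your own):

     # start adaptive scaling at 90% of the state limit instead of
     # the default 60%
     set timeout { adaptive.start 90000, adaptive.end 120000 }

followed by a "pfctl -f /etc/pf.conf" to reload the ruleset.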