Date:      Fri, 20 Jan 2017 12:31:06 -0800
From:      Bakul Shah <bakul@bitblocks.com>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Kristof Provost <kp@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: pf & NAT issue
Message-ID:  <20170120203106.CD2C8124AEA4@mail.bitblocks.com>
In-Reply-To: Your message of "Fri, 20 Jan 2017 08:47:43 MST." <CAOtMX2hTcEkw_WzgtcEEipGY391zB=skrk7O=dknRMMG%2BDa%2BBA@mail.gmail.com>
References:  <20170120083555.ACCF9124AEA4@mail.bitblocks.com> <7C29D00C-94C0-4550-B1B2-CE307482B544@FreeBSD.org> <CAOtMX2hTcEkw_WzgtcEEipGY391zB=skrk7O=dknRMMG%2BDa%2BBA@mail.gmail.com>

On Fri, 20 Jan 2017 08:47:43 MST Alan Somers <asomers@freebsd.org> wrote:
> On Fri, Jan 20, 2017 at 3:48 AM, Kristof Provost <kp@freebsd.org> wrote:
> > On 20 Jan 2017, at 9:35, Bakul Shah wrote:
> >>
> >> pf seems to drop NAT connections quite a bit. This seems to
> >> happen much more frequently if there are delays involved (slow
> >> server or interactive use). Almost seems like pf losing
> >> track of NATted connections due to an uninitialized
> >> variable....  Often a retry or two works. Connecting from
> >> outside to forwarded connections to NATTED hosts works fine.
> >>
> >> This problem started after upgrading to freebsd-10. Is there a
> >> bug fix in the works or a known workaround (other than using ipfw
> >> or reverting to 9, which I don't want to do)?
> >>
> > The problem you describe doesn't immediately ring a bell.
> >
> > We'll have to gather a bit more information:
> >
> >  * What FreeBSD version are you running exactly?
> >  * What's your pf.conf?
> >  * Can you perform a network capture of rejected/failed connections? Ideally
> >    both on LAN and WAN on the gateway machine. Please capture full packets
> >    as pcap files (so tcpdump -s0 -w lan.pcap).
> >  * What networking cards are you using?
> >
> > Regards,
> > Kristof
> 
> Under heavy load, pf can drop information from its state table.  You
> can try increasing state table limits to see if it helps the problem.
> Read the "set limits" section of the pf man page.
> 
> -Alan

Thanks for the suggestions. Here's some info. My inline
comments are indented.

$ uname -rm
10.3-RELEASE-p4 i386

$ netstat -n | grep tcp | wc -l
13
	So the machine is lightly loaded.

$ grep -v ^# /etc/pf.conf|uniq
ext_if="rl0"
int_if="em0"
nat on $ext_if inet from ! ($ext_if) to any -> ($ext_if)

	I took out rdr entries during testing. They don't seem
	to affect this issue. I had changed src.track timeout
	to 30 seconds but that didn't seem to change anything.

$ pfctl -s memory
states        hard limit    10000
src-nodes     hard limit    10000
frags         hard limit     5000
table-entries hard limit   200000

$ pfctl -s info
Status: Enabled for 167 days 13:40:11         Debug: Urgent

State Table                          Total             Rate
  current entries                        0               
  searches                      2870986757          198.3/s # this seems high...
  inserts                          3428240            0.2/s
  removals                         3428240            0.2/s
Counters
  match                         1482741914          102.4/s
  bad-offset                             0            0.0/s
  fragment                               1            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                             31            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     28931            0.0/s
  state-insert                           1            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
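
	For anyone who wants to compare this output across machines,
	here is a minimal sketch that pulls the nonzero entries out
	of the "Counters" section. The function name and the idea of
	feeding it the saved text of "pfctl -s info" are my own
	assumptions, not part of pfctl.

```python
def nonzero_counters(info_text):
    """Return {counter_name: total} for nonzero rows under 'Counters'.

    info_text is assumed to be the saved text of `pfctl -s info`
    in the layout shown above (name, total, rate per line).
    """
    counters = {}
    in_counters = False
    for line in info_text.splitlines():
        if line.strip() == "Counters":
            in_counters = True      # rows after this header are counters
            continue
        if not in_counters:
            continue
        fields = line.split()
        # A counter row looks like: "state-mismatch  28931  0.0/s"
        if len(fields) >= 2 and fields[1].isdigit():
            total = int(fields[1])
            if total > 0:
                counters[fields[0]] = total
    return counters
```

	On the output above this would leave match, fragment,
	ip-option, state-mismatch and state-insert.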

$ tcpdump -ni rl0 host ftp4.freebsd.org # in one window
$ tcpdump -ni em0 host 192.168.125.7	# in another

	On an internal machine I did "telnet ftp4.freebsd.org
	ftp", waited for a while and then typed something.
	The following trace is interspersed in the correct
	sequence. Traffic on rl0 (external) is prefixed with <
	and traffic on em0 (internal) with >.

> 11:56:05.743745 IP 192.168.125.7.65042 > 149.20.1.200.21: Flags [S], seq 3080825146, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 176000 ecr 0], length 0
< 11:56:05.743776 IP 173.228.5.8.63716 > 149.20.1.200.21: Flags [S], seq 3080825146, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 176000 ecr 0], length 0

< 11:56:05.763294 IP 149.20.1.200.21 > 173.228.5.8.63716: Flags [S.], seq 3912707359, ack 3080825147, win 65535, options [mss 1460,nop,wscale 11,sackOK,TS val 1468113699 ecr 176000], length 0
> 11:56:05.763313 IP 149.20.1.200.21 > 192.168.125.7.65042: Flags [S.], seq 3912707359, ack 3080825147, win 65535, options [mss 1460,nop,wscale 11,sackOK,TS val 1468113699 ecr 176000], length 0

> 11:56:05.764106 IP 192.168.125.7.65042 > 149.20.1.200.21: Flags [.], ack 1, win 1026, options [nop,nop,TS val 176021 ecr 1468113699], length 0
< 11:56:05.764121 IP 173.228.5.8.63716 > 149.20.1.200.21: Flags [.], ack 1, win 1026, options [nop,nop,TS val 176021 ecr 1468113699], length 0

< 11:56:05.789192 IP 149.20.1.200.21 > 173.228.5.8.63716: Flags [P.], seq 1:55, ack 1, win 32, options [nop,nop,TS val 1468113725 ecr 176021], length 54
> 11:56:05.789204 IP 149.20.1.200.21 > 192.168.125.7.65042: Flags [P.], seq 1:55, ack 1, win 32, options [nop,nop,TS val 1468113725 ecr 176021], length 54

> 11:56:05.895660 IP 192.168.125.7.65042 > 149.20.1.200.21: Flags [.], ack 55, win 1026, options [nop,nop,TS val 176152 ecr 1468113725], length 0
< 11:56:05.895675 IP 173.228.5.8.63716 > 149.20.1.200.21: Flags [.], ack 55, win 1026, options [nop,nop,TS val 176152 ecr 1468113725], length 0

> 11:56:28.168693 IP 192.168.125.7.65042 > 149.20.1.200.21: Flags [P.], seq 1:10, ack 55, win 1026, options [nop,nop,TS val 198426 ecr 1468113725], length 9
< 11:56:28.168712 IP 173.228.5.8.52015 > 149.20.1.200.21: Flags [P.], seq 3080825147:3080825156, ack 3912707414, win 1026, options [nop,nop,TS val 198426 ecr 1468113725], length 9

	Right here we see the problem. The NAT-translated source
	port changed from 63716 to 52015 in the middle of an
	established connection.
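
	To spot this in a longer trace without reading every line,
	something like the following sketch could be used. It
	assumes the merged format above ('>' for em0, '<' for rl0,
	each outbound internal packet followed by its translated
	copy) and an internal 192.168.x.x network. The function
	name and the shortened sample lines are mine, for
	illustration only.

```python
import re

# Captures direction marker, source addr.port and destination addr.port
# from a tcpdump line prefixed with '>' (internal) or '<' (external).
LINE_RE = re.compile(r'^([<>]) \S+ IP (\S+) > (\S+):')

def find_nat_changes(lines, lan_prefix="192.168."):
    """Return (internal_src, old_ext_src, new_ext_src) per mapping change."""
    mapping = {}     # internal addr.port -> translated addr.port last seen
    pending = None   # last outbound packet seen on the internal interface
    changes = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        side, src, dst = m.groups()
        if side == '>' and src.startswith(lan_prefix):
            pending = (src, dst)
        elif side == '<' and pending and dst == pending[1]:
            internal = pending[0]
            old = mapping.get(internal)
            if old is not None and old != src:
                changes.append((internal, old, src))
            mapping[internal] = src
            pending = None
    return changes

# First and last packet pairs from the trace above, with options trimmed.
trace = [
    "> 11:56:05.743745 IP 192.168.125.7.65042 > 149.20.1.200.21: Flags [S], length 0",
    "< 11:56:05.743776 IP 173.228.5.8.63716 > 149.20.1.200.21: Flags [S], length 0",
    "> 11:56:28.168693 IP 192.168.125.7.65042 > 149.20.1.200.21: Flags [P.], length 9",
    "< 11:56:28.168712 IP 173.228.5.8.52015 > 149.20.1.200.21: Flags [P.], length 9",
]
print(find_nat_changes(trace))
```

	For the four sample lines this reports one change, for
	192.168.125.7.65042, from 173.228.5.8.63716 to
	173.228.5.8.52015.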

Bakul


