From: Milan Obuch
Date: Fri, 19 Jun 2015 09:18:57 +0200
To: freebsd-pf@freebsd.org
Subject: Large scale NAT with PF - some weird problem
Message-ID: <20150619091857.304b707b@zeta.dino.sk>

Hi,

I am managing a FreeBSD 9 based router for a network that uses PF for NAT. I think I can call it large scale: there are approximately 3000 customer devices (home routers and similar) with private IPs in segment 172.16.0.0/12, translated to a /23 public address block. Basically, in pf.conf, there is

    nat on $if_ext from $net_int to any -> $pool_ext round-robin sticky-address

and a handful of

    binat on $if_ext from 172.16.x.y to any -> a.b.c.d

statements.
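To put the scale in numbers, here is a rough sketch of the arithmetic behind "3000 devices onto a /23". It assumes the whole /23 is usable by the nat pool, which the post does not state; the binat addresses and any reserved addresses would reduce it. The function names are made up for illustration.

```sh
#!/bin/sh
# Number of addresses in an IPv4 prefix of the given length: pool_size PREFIXLEN
pool_size() {
    echo $(( 1 << (32 - $1) ))
}

# Average number of internal devices per public address, one decimal place:
# devices_per_addr DEVICES POOL_ADDRESSES
devices_per_addr() {
    awk -v dev="$1" -v pool="$2" 'BEGIN { printf "%.1f\n", dev / pool }'
}

pool=$(pool_size 23)    # the /23 public block from the post
echo "pool addresses: $pool"
echo "devices per address: $(devices_per_addr 3000 "$pool")"
```

With these figures each public address serves about six devices on average, assuming round-robin spreads them evenly.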
It works, basically, but for some time now there have been intermittent outages. When one occurs, the customer's device loses access to the internet. I can verify it with a simple ping to any address outside of the network. The weird thing is, I can see ICMP echo request packets going out of the external interface, but no ICMP echo reply packets coming back. While I can't verify on the uplink router that these replies are actually coming in on the interface, I am pretty sure they are, but they are not visible in tcpdump's output. (When I ping some device outside of the network which is under my control, I can see there both the ICMP echo requests and the echo replies. Also, if I ping from outside the address to which this ping is translated, I see it coming in on the external interface.)

I think I have a problem with some table being too small, but I have no idea which one. It is not the state table: I have 'set limit states 500000' in my pf.conf, and pfctl -vs info reports

    State Table                          Total             Rate
      current entries                    36668
      searches                      1996138369        29280.5/s
      inserts                         15757727          231.1/s
      removals                        15770004          231.3/s

so I think I have plenty of room here. The limit was raised in the past when a somewhat similar issue occurred, and using a bigger state table solved it.

Also, pfctl -vs state | grep shows the states for the non-working ping as

    all icmp a.b.c.d:538 <- 172.16.x.y:538       0:0
    all icmp e.f.g.h:40011 (172.16.x.y:538) -> a.b.c.d:40011       0:0

where a.b.c.d is the address being used as the ping target (outside of the network), 172.16.x.y is the address of the device having trouble accessing the internet, and e.f.g.h is the translated address for this device, allocated dynamically.

After doing /etc/rc.d/pf restart it works again, so I think, again, the issue is with some table being too small. The restart empties it and things begin to work.

Does this sound familiar to anybody? I was trying to find a tuning guide for pf and large scale NAT, but with no success yet. I would be grateful for any help.

Regards,
Milan
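A minimal sketch of how the "plenty of room" estimate and a look at the other pf limits could be scripted. Per pf.conf(5), the sticky-address option creates source tracking entries, which have their own limit (set limit src-nodes) separate from the state limit, so that table is one candidate worth inspecting. Whether it is actually involved here is not established. The helper names pct_used and show_pf_limits are made up for illustration; the pfctl invocations are standard but need root on the router.

```sh
#!/bin/sh
# Percentage of a pf limit currently in use: pct_used CURRENT LIMIT
pct_used() {
    awk -v cur="$1" -v lim="$2" 'BEGIN { printf "%.1f\n", 100 * cur / lim }'
}

# Figures from the post: 36668 current states, limit of 500000.
echo "state table in use: $(pct_used 36668 500000)%"

# Not called automatically; run as root on the router itself.
show_pf_limits() {
    pfctl -s memory     # hard limits: states, src-nodes, frags, ...
    pfctl -s info -v    # verbose info also lists source tracking counters
    pfctl -s Sources    # current source tracking entries (sticky-address)
}
```

The first command prints 7.3% for the numbers quoted above. Comparing the src-nodes limit from pfctl -s memory with the source tracking "current entries" figure gives the same kind of ratio for that table.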