Date: Mon, 29 Jun 2015 09:43:17 +0200
From: Milan Obuch
To: Daniel Hartmeier
Cc: Ian FREISLICH, freebsd-pf@freebsd.org
Subject: Re: Large scale NAT with PF - some weird problem
Message-ID: <20150629094317.5a0cd61a@zeta.dino.sk>
In-Reply-To: <20150629065838.GA13722@insomnia.benzedrine.ch>

On Mon, 29 Jun 2015 08:58:38 +0200
Daniel Hartmeier wrote:

> On Sun, Jun 28, 2015 at 10:06:09AM +0200,
> Milan Obuch wrote:
>
> > So, now I am at 10.2-PRERELEASE, r284884, and the issue is still
> > here. It is totally weird, just change of IP the device is being
> > natted to makes the issue disappear for this particular customer,
> > but as soon as this exact IP is used again, the issue is here again.
>
> I'd go over the entire network config (pf.conf, pfctl -sa, rc.conf,
> netstat -anr, ifconfig, arp -an) and look for any mistake, like a
> typo or a netmask which isn't what you thought it is (like on an
> alias), or for any weirdness related to that IP address.
>
> Daniel

Thanks for the hint, there is some logic in it. However, grep over
/etc/* yields nothing: the address is never mentioned in any config,
it is only part of the pool in the pf.conf statement

    nat on $if_ext from to any -> $pool_ext round-robin sticky-address

It is not mentioned in the 'pfctl -sa' output, the 'arp -an' output,
the 'netstat -anr' output... nowhere.

I did not mention that this box runs quagga for configuring the
network, mainly routing via OSPF, but I do not think it is relevant to
the problem I see. It is basically a userland process communicating
with the forwarding path in the kernel to configure routing, nothing
else, and, naturally, it does not work with this particular IP either.
Otherwise I should have seen it in the output of some of the commands
mentioned above, I think.

Just to repeat myself a bit: when this problematic state occurs, some
internal IP is translated to this one offending public IP, and
communication is broken in such a way that I see no returning packets
from the outside world on the uplink interface in tcpdump, even though
I know they are there: I can ping some other box outside where I can
verify that, and they are there...

I just found another thing that is strange to me. In the 'pfctl -sa'
output, section SOURCE TRACKING NODES, almost all entries are in the
form

    -> ( states ..., connections ..., rate ... )

but there are some with the first IP being public and the second one
0.0.0.0. Where could they come from?
Also, there are only a couple of them, but one shows something even
weirder: in the parentheses is 'states 4294967295', which seems absurd
to me. It is worth mentioning that this is 0xffffffff in hexadecimal,
which looks like an underflow issue in the code. Maybe this deserves
closer attention from a pf developer, I just don't know who that could
be... it smells like a bug.

Regards,
Milan
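
To make the underflow suspicion concrete, here is a minimal sketch, not pf code. It shows that decrementing an unsigned 32-bit counter from 0 gives exactly 4294967295, and it scans source tracking lines for the two oddities described above. The line shape `SRC -> DST ( states N, ... )` is assumed from the description in this message, the names `wrap_u32` and `suspicious_nodes` are hypothetical, the sample addresses are made-up documentation addresses, and the "more than half the range" threshold is only a heuristic.

```python
import re

UINT32_MAX = 0xFFFFFFFF  # 4294967295, the 'states' value reported above

# Assumed line shape (from the description in the message, not from pf sources):
#   SRC -> DST ( states N, connections N, rate ... )
NODE_RE = re.compile(
    r"^\s*(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+\(\s*states\s+(?P<states>\d+)"
)


def wrap_u32(value):
    """Reduce an integer modulo 2**32, as an unsigned 32-bit counter would."""
    return value & UINT32_MAX


def suspicious_nodes(lines):
    """Yield (src, dst, states, reasons) for entries that look wrong."""
    for line in lines:
        m = NODE_RE.match(line)
        if m is None:
            continue  # not a source tracking line in the assumed shape
        src, dst, states = m["src"], m["dst"], int(m["states"])
        reasons = []
        if dst == "0.0.0.0":
            reasons.append("second address is 0.0.0.0")
        if states > UINT32_MAX // 2:
            # Heuristic: a real state count this large is implausible,
            # so the counter was probably decremented below zero.
            reasons.append("state counter looks wrapped")
        if reasons:
            yield src, dst, states, reasons


# Made-up sample lines using documentation addresses, for illustration only.
SAMPLE = [
    "10.0.0.5 -> 192.0.2.10 ( states 3, connections 0, rate 0.0/0s )",
    "198.51.100.7 -> 0.0.0.0 ( states 4294967295, connections 0, rate 0.0/0s )",
]

if __name__ == "__main__":
    print(wrap_u32(0 - 1))  # one decrement too many on a counter at zero
    for src, dst, states, reasons in suspicious_nodes(SAMPLE):
        print(src, "->", dst, "states", states, "|", "; ".join(reasons))
```

On a real box the input lines would come from the source tracking section of the pfctl output instead of `SAMPLE`; the regular expression may need adjusting if the actual format differs from the assumed one.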