Date: Wed, 31 Aug 2022 01:42:16 +0000
From: tt78347 <tt78347@protonmail.com>
To: Lutz Donnerhacke <lutz@donnerhacke.de>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: IPFW NAT intermittently fails to redirect UDP packets; seeking DTrace scripts or other advice
Message-ID: <HhQgyIwbcYuEmalaD76MpikeMKmy7_yGN0V3iFPeY2g9TlDmttL9pDKMNRd6ScLpfkoydEj0mHDLsIJQfKUoJF9OwdYwap6Ui5v7MtwdJjg=@protonmail.com>
In-Reply-To: <20220830161145.GA31694@belenus.iks-jena.de>
References: <gg17I_Npe7ROH1jMb1q1NImxP-WeYJ1Onu-QT6OKzybIsUP1GLxQyhTqHXO6rqTSJlI9t776Kb_cfCdps8xH5aaSWxTerm8MCaG2qb0i770=@protonmail.com> <20220830161145.GA31694@belenus.iks-jena.de>
> Only a quick look ...
>
> There is no guarantee that the ports of the UDP packets are not modified by
> libalias (NAT is designed to do exactly this modification). So some of the
> matches seem to be a bit optimistic.
>
> > - This system has net.inet.ip.fw.one_pass=0
>
> man ipfw
>      To let the packet continue after being (de)aliased, set the sysctl
>      variable net.inet.ip.fw.one_pass to 0.  For more information about
>      aliasing modes, refer to libalias(3).
>
> Hence the NAT is applied multiple times if the path through the rules is a
> bit unlucky.

Thank you for your response, and for bringing up this point about ports; I had not thought about it.

However, I'm not sure exactly what you mean here. redirect_port should not change the destination port of incoming packets, and if I am not mistaken, rule 452 should allow all relevant incoming packets through (after they have been processed by NAT). Unless I have made a foolish error, rules 450-452 specify destination ports.

On the other hand, since we are forwarding, it's true that incoming and outgoing packets are evaluated by the firewall twice in this case: once at the external interface and once at bridge0 (or on the epair, I'm not sure which; I think I've seen both). I don't see how that could be causing an issue, since even when the packet is at the bridge it should still match "via $extif", because "recv $extif" is still true. So it would still match 450-452.

Though, I can't rule out that I have a major misunderstanding of how IPFW works - it has happened before. In fact, as I do some further experimenting, I'm starting to doubt whether what I said above is correct.

> The traces show that the problematic cases are those where the packets are
> not (de)aliased. This can be the case when libalias has no more free ports
> available for aliasing.
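To make the double-evaluation point concrete, here is a stripped-down sketch of the kind of ruleset I am describing. The rule numbers match my earlier references (450-452), but $extif, $intip, and the exact redirect_port syntax are illustrative placeholders, not a copy of my real configuration:

```shell
# Sketch only - placeholder interface/address, not the actual ruleset.
# In-kernel NAT instance redirecting IKE/NAT-T (UDP 500, 4500) inward:
ipfw nat 1 config if $extif \
    redirect_port udp $intip:500 500 \
    redirect_port udp $intip:4500 4500

# 450-452: run the packet through NAT, then pass the (de)aliased traffic.
# With net.inet.ip.fw.one_pass=0 the packet re-enters the ruleset after
# "nat". When forwarding over bridge0/epair it is also evaluated a second
# time there, and "via $extif" still matches on that pass because
# "recv $extif" remains true for the packet.
ipfw add 450 nat 1 udp from any to any 500,4500 via $extif
ipfw add 452 allow udp from any to any 500,4500 via $extif
```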
> In such a case, the packet is returned unmodified
> (unaliased) with an error code. I'm not sure if this will cause a packet
> drop or not, especially in the one_pass=0 case.
>
> It might be possible that duplicate packets (or quickly repeated ones)
> trigger an unintended aliasing of the source port. This will create a flow
> in the NAT table which is handled before the port redirection. And it might
> miss the rules with explicit port numbers.
>
> But this will probably be the wrong idea.

I am intrigued by this idea of the unintended creation of NAT flows; it's not something I am an expert in by any means. However, I do not think source ports are changing here under any circumstances, because I have never witnessed a packet trace with any ports aside from 500 and 4500.

Still, what you have said about NAT flows being inadvertently created is interesting, and I had not thought about it. It sounds like it could be a factor, and I will experiment further. Is there a good way to examine the contents of this table?

I will also mention that while this overall setup was working properly prior to my upgrade to 12.3, I did not have rules 450-452 specified explicitly as I do here; I placed them there early on in an attempt to fix the issue. Prior to the upgrade, all NAT was handled in 500-540.

More information / report on today's observations:

I'm not sure if any of this information is useful, but here it is in case it provides any clues.

This issue has actually been happening more frequently now that I've started to experiment with it more, and also after moving some traffic off of this host.

It happened again today, and I was actually able to start natd (previously I had an error, but I've now invoked it in the foreground using -v). Specifying a divert rule for this natd instance at rule 445 fixed the issue, but only for about 20 minutes.
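On my question about examining the table: as far as I can tell, the in-kernel libalias state table cannot be dumped directly, but these are the closest inspection points I know of (a sketch, assuming in-kernel NAT instance 1; $extif and $intip are placeholders):

```shell
# Show the configuration of in-kernel NAT instance 1,
# including any redirect_port entries:
ipfw nat 1 show config

# Show per-instance counters, if the instance was configured
# with the "log" keyword:
ipfw nat 1 show log

# For the userland path, natd run in the foreground with -v prints
# each packet before and after aliasing, which at least reveals the
# flows it is creating:
natd -v -n $extif -redirect_port udp $intip:500 500
```

If anyone knows a more direct way to inspect the live aliasing entries, I would be glad to hear it.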
As I was experimenting to try to see whether my rules were wrong, it started to work again, apparently not due to any experimental changes I had made, since it continued to work as expected after I eliminated those changes.

I also witnessed something quite extraordinary and, to me, inexplicable. So far, I've been talking about a specific host that has been having this problem; I've referred to its IP address as 1.1.1.1. For my packet traces, I've been making reference to an external host whose packets often have this issue on 1.1.1.1, calling that external host 2.2.2.2; this is the external host whose packets appear in the packet traces I posted.

Just now, as I was doing some experimentation on 1.1.1.1 as mentioned above, the same issue was produced on 2.2.2.2 (with other hosts on my network), even though 2.2.2.2 has not experienced this issue for six months or more. Incidentally, 2.2.2.2 sends and receives far, far more UDP on ports 500 and 4500 than 1.1.1.1 does. The only distinguishable thing that I did on 2.2.2.2 before the issue occurred was to repeatedly try to initiate an outgoing IKE connection to 1.1.1.1, as I was experimenting on 1.1.1.1. But I can't imagine that this is the first time I've done that.