Date: Wed, 31 Aug 2022 01:42:16 +0000
From: tt78347
To: Lutz Donnerhacke
Cc: freebsd-net@freebsd.org
Subject: Re: IPFW NAT intermittently fails to redirect UDP packets; seeking DTrace scripts or other advice
In-Reply-To: <20220830161145.GA31694@belenus.iks-jena.de>
List-Archive: https://lists.freebsd.org/archives/freebsd-net

> Only a quick look ...
>
> There is no guarantee, that the ports of the UDP packets are not modified by
> libalias (NAT is designed to do exactly this modification). So some of the
> matches seems to be a bit optimistic,
>
> > - This system has net.inet.ip.fw.one_pass=0
>
>
> man ipfw
> To let the packet continue after being (de)aliased, set the sysctl
> variable net.inet.ip.fw.one_pass to 0. For more information about
> aliasing modes, refer to libalias(3).
>
> Hence the NAT is applied multiple times if the path through the rules is a
> bit unlucky.
>

Thank you for your response, and for bringing up this point about ports. I had
not thought about it.

However, I'm not sure exactly what you mean here. redirect_port should not
change the destination port of incoming packets, and if I am not mistaken, rule
452 should allow all relevant incoming packets through (after they have been
processed by NAT). Unless I have made a foolish error, rules 450-452 specify
destination ports.

On the other hand, since we are forwarding, it's true that incoming and
outgoing packets are evaluated by the firewall twice in this case, once at the
external interface and once at bridge0 (or on the epair; I'm not sure which,
and I think I've seen both). I don't see how that could be causing an issue,
since even when the packet is at the bridge, it should still match
"via $extif", since "recv $extif" is still true. So it would still match
450-452.

Though, I can't rule out that I have a major misunderstanding about how IPFW
works; it has happened before. In fact, as I do some further experimenting, I'm
starting to doubt whether what I said above is correct.

>
> The traces show, that the problematic cases are those where the packets are
> not (de)aliased. This can be the case, when libalias has no more free ports
> available for aliasing. In such a case, the packet is returned unmodified
> (unaliased) with an error code. I'm not sure, if this will cause a packet
> drop or not, especially in the one_pass=0 case.
>
> It might be possible, that duplicate packets (or quickly repeated ones)
> trigger an unintended aliasing of the source port. This will create an flow
> in the NAT table which is handled before the port redirection. And it might
> miss the rules with explicit port numbers.
>
> But this will be probably the wrong idea.

I am intrigued by this idea of unintended creation of NAT flows. It's not
something I am an expert in by any means. However, I do not think source ports
are changing here under any circumstances, because I have never seen a packet
trace with any ports other than 500 and 4500.

Still, what you have said about NAT flows being inadvertently created is
interesting, and I had not thought about it. It sounds like it could be a
factor. I will experiment further. Is there a good way to examine the contents
of this table?

I will also mention that while this overall setup was working properly prior to
my upgrade to 12.3, I did not have rules 450-452 specified explicitly as I do
here. I added them early on in an attempt to fix the issue. Prior to the
upgrade, all NAT was handled in 500-540.

More information / report on today's observations:

I'm not sure if any of this information is useful, but here it is in case it
provides any clues.

This issue has actually been happening more frequently now that I've started to
experiment with it more, and also after moving some traffic off of this host.

It happened again today, and I was actually able to start natd (previously I
had an error, but I've now invoked it in the foreground using -v). Specifying a
divert rule for this natd instance on rule 445 fixed the issue, but only for
about 20 minutes. As I was experimenting to see whether my rules were wrong,
it started to work again, apparently not due to any experimental changes I had
made, since after reverting these changes, it still continued to work as
expected.

I also witnessed something quite extraordinary and, to me, inexplicable.
So far, I've been talking about a specific host that has been having this
problem. I've referred to its IP address as 1.1.1.1, and for my packet traces,
I've been referring to an external host whose packets often have this issue on
1.1.1.1, calling that external host 2.2.2.2; this is the external host whose
packets are in the packet traces I posted.

Just now, as I was doing some experimentation on 1.1.1.1 as mentioned above,
the same issue appeared on 2.2.2.2 (with other hosts on my network), even
though 2.2.2.2 has not experienced this issue for 6 months or more.
Incidentally, 2.2.2.2 sends and receives far, far more UDP on 500,4500 than
1.1.1.1. The only notable thing that I did on 2.2.2.2 before the issue
occurred was to try to initiate an outgoing IKE connection repeatedly to
1.1.1.1, as I was experimenting on 1.1.1.1. But I can't imagine that this is
the first time that I've done that.
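
One way to check whether packets really reach rules 450-452 during a failure
is to compare rule counters before and after sending a test packet. Below is a
minimal sketch of that idea, not a tested tool. It assumes the two snapshots
are the text output of "ipfw -a list" in its plain form (rule number, packet
count, byte count, rule text; no timestamps or set numbers). The function
names parse_counters and changed_rules are made up for this illustration.

```python
import re

# One line of `ipfw -a list` output is assumed to look like:
#   <rule number> <packet count> <byte count> <rule text>
_RULE_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(.*)$")


def parse_counters(listing):
    """Map (rule number, rule text) -> packet count for one snapshot."""
    counters = {}
    for line in listing.splitlines():
        m = _RULE_LINE.match(line)
        if m:
            number, packets, _byte_count, text = m.groups()
            counters[(int(number), text.strip())] = int(packets)
    return counters


def changed_rules(before, after):
    """Return (rule number, rule text, packet delta) for rules that matched
    at least one packet between the two snapshots, ordered by rule number."""
    old = parse_counters(before)
    new = parse_counters(after)
    deltas = []
    for key, packets in sorted(new.items()):
        delta = packets - old.get(key, 0)
        if delta > 0:
            deltas.append((key[0], key[1], delta))
    return deltas
```

Usage would be to save "ipfw -a list" to a file, send a single IKE packet from
the remote host, save the listing again, and pass both texts to
changed_rules(). If the only rule that moves is a later deny rule rather than
450-452, that would suggest the packet never matched the NAT rules at all.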