From: Julian Elischer
Date: Wed, 30 Dec 2009 13:30:01 -0800
To: Ian Smith
Cc: Luigi Rizzo, net@freebsd.org
Subject: Re: RFC: documented and actual behaviour of "ipfw tee"
Message-ID: <4B3BC659.7010707@elischer.org>
In-Reply-To: <20091230221119.L81420@sola.nimnet.asn.au>
List-Id: Networking and TCP/IP with FreeBSD

Ian Smith wrote:
> On Tue, 29 Dec 2009, Julian Elischer wrote:
> > Luigi Rizzo wrote:
> > > There is a difference between the documented and actual behaviour of
> > > "ipfw tee" which occurs when there are multiple rules
> > > with the same number, e.g.
> > >
> > >   rule_id  number  body
> > >   r1       500     tee port1 dst-ip 1.2.3.0/24
> > >   r2       500     tee port2 dst-ip 1.2.4.0/24
> > >   r3       500     accept ip from any to any
> > >   r4       510     count ip from any to any
> > >
> > > + the manpage says "processing continues with the NEXT RULE"
> > >   (so after r1 we have r2, then r3, ...);
> > > + the implementation behaves as "processing continues with the
> > >   NEXT NUMBERED RULE" (i.e. after 500 it continues with 510).
> > >
> >
> > TEE should go to the next RULE with the original packet, but if
> > you reinject the tee'd copy of the packet it should go to the
> > next rule NUMBER.
>
> Which is what happens now, right?  Same behaviour on tee reinjection as
> divert does seem consistent.  So if there is a problem, it's only with
> the original packet continuing with the next rule if same-numbered?

From Luigi's description I'm not sure what happens now.. :-)
The two cases are different.
Processing with the original packet acts as if the rule had done nothing.
Processing with a reinjected packet acts the same as a reinjected divert
packet, i.e. next rule NUMBER, not next rule.

> > > The actual behaviour is an artifact of how "divert" is implemented:
> > > diverted packets only carry the rule number so we cannot tell, on a
> > > reinject, which of the rules numbered "500" matched, and we restart
> > > from the next one. Tee was implemented as an extension of divert.
>
> It seems fair that tee act the same as divert on reinjection, and this
> can't be changed without breaking existing divert socket code eg natd?

It's also the only way it can work, really.
It can't tell the difference between rules with the same number.

> > > Skipping rules in my opinion is very unintuitive, but there is
> > > no way to fix it (unless we extend the API) as the rule_id is only
> > > known within the kernel.
> > >
> > > For 'tee', however, the situation is different because the
> > > copy of the packet that remains in the kernel does not lose knowledge
> > > of the matching rule so we can easily continue from the very next
> > > rule, same as it happens for dummynet packets with one_pass=0 (and
> > > tee'd netgraph packets, which I think already do "the right thing").
>
> Hmm.  After divert you can match 'diverted' to distinguish reinjected
> packets later.  Does/can/should this apply to reinjected tee'd packets?

Yes, there is no such thing as a reinjected tee packet, as the user app
can't tell if it was diverted or teed.

> Similarly perhaps, with a set of same-numbered nat rules, are mapped
> packets 'reinjected' at the next rule, or the next higher-numbered rule?

I think NAT processing in the kernel can keep track of where it is up to,
so next RULE (different from userland NAT via divert).

> > > Since I am doing some work in this area of the code, I'd like to ask
> > > opinions on how to proceed:
> > >
> > > A. preserve the current behaviour and fix the manpage;
>
> I tend to this, though probably not knowing all the ramifications,
> especially not having played with ng_ipfw stuff at all.
>
> So for A, here's what we have, with suggested clarification in []:
>
>   divert port
>       Divert packets that match this rule to the divert(4) socket bound
>       to port port.  The search terminates.  [Reinjected packets continue
>       at the next higher-numbered rule.]
>
>   tee port
>       Send a copy of packets matching this rule to the divert(4) socket
>       bound to port port.  The search continues with the next rule.
>       [Reinjected packets continue at the next higher-numbered rule.]

yes

> > > B. fix the code to behave as the manpage says;
>
> Seems it's already correct regarding the original packet, and just needs
> clarifying re the reinjected packets, if I'm following this right?

I think the man page should reflect the behaviour mentioned above, i.e.
copy and original packets continue at different rules.

> > > C. introduce a sysctl to choose between A and B.
> > >    Of course this moves the problem on which default
> > >    to choose :)

no

> > > Because it is a very special case that I doubt many people have hit,
> > > I'd be inclined to do B and consider the old behaviour a bug.

No, the original behaviour was not accidental.
They were never going to come to the same rule unless the next rule is on a
different number.

> Mike Makonnen's ipfw-classifyd can reinject packets at specified rule
> numbers by tcp/udp port classification by updating the tag/number, and
> has the same issue.  There was some confusion there too regarding this,
> that I think a man clarification may have helped avoid.
>
> I'm also a bit confused by apparent overloading of one_pass function for
> dummynet pipe, netgraph, ng_tee and now nat too.  What if you want to
> do kernel nat but wanted one_pass behaviour for pipes?  Separate issue
> but similar distinction between divert vs in-kernel behaviour maybe?

Yes, that has sort of worried me too, but I haven't hit it in practice
(yet).

> FWIW, Ian
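
For anyone following along, the two continuation behaviours discussed in
this thread can be sketched as a toy model. This is not the ipfw code; it
is a minimal Python illustration that assumes the rule list is kept sorted
by rule number, and the names Rule, next_rule and next_numbered_rule are
hypothetical. The rule table is the one from Luigi's example.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    rule_id: str  # identity known only inside the kernel (label from the example table)
    number: int   # user-visible rule number; several rules may share it
    body: str


# Rule set from Luigi's example, assumed sorted by rule number.
RULES: List[Rule] = [
    Rule("r1", 500, "tee port1 dst-ip 1.2.3.0/24"),
    Rule("r2", 500, "tee port2 dst-ip 1.2.4.0/24"),
    Rule("r3", 500, "accept ip from any to any"),
    Rule("r4", 510, "count ip from any to any"),
]


def next_rule(rules: List[Rule], matched_index: int) -> Optional[Rule]:
    """Original packet after a tee: the kernel still knows which rule
    matched, so the search continues with the very next rule."""
    nxt = matched_index + 1
    return rules[nxt] if nxt < len(rules) else None


def next_numbered_rule(rules: List[Rule], matched_number: int) -> Optional[Rule]:
    """Reinjected packet (divert, or the tee'd copy): only the rule number
    survives the trip to userland, so the search restarts at the first
    rule with a strictly higher number."""
    for rule in rules:
        if rule.number > matched_number:
            return rule
    return None


if __name__ == "__main__":
    # The original packet matched r1 (index 0): it continues at r2.
    print(next_rule(RULES, 0).rule_id)
    # The copy reinjected with rule number 500 skips r2 and r3: it continues at r4.
    print(next_numbered_rule(RULES, 500).rule_id)
```

In this model the original packet that matched r1 goes on to r2, while a
copy reinjected with number 500 resumes at r4, which is the distinction
the suggested manpage wording in [] is meant to capture.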