From owner-freebsd-net@FreeBSD.ORG Mon Aug 21 20:03:29 2006
Date: Mon, 21 Aug 2006 12:59:38 -0700
From: "Crist J. Clark"
To: Daniel Hartmeier
Cc: Rostislav Krasny, freebsd-net@freebsd.org
Message-ID: <20060821195938.GA16332@goku.cjclark.org>
References: <20060818235756.25f72db4.rosti.bsd@gmail.com> <20060821092350.GL20788@insomnia.benzedrine.cx>
In-Reply-To: <20060821092350.GL20788@insomnia.benzedrine.cx>
Subject: Re: PF or "traceroute -e -P TCP" bug?
Reply-To: cjclark@alum.mit.edu
List-Id: Networking and TCP/IP with FreeBSD

On Mon, Aug 21, 2006 at 11:23:50AM +0200, Daniel Hartmeier wrote:
> [ I'm CC'ing Crist, maybe he can explain why -e behaves like it does ]
>
> On Fri, Aug 18, 2006 at 11:57:56PM +0300, Rostislav Krasny wrote:
>
> > I've tried the new "-e" traceroute option on today's RELENG_6 and
> > found the following problem:
> >
> > > traceroute -nq 1 -e -P TCP -p 80 216.136.204.117
>
> As I understand the -e option, that should send a sequence of TCP SYNs
> with
>
>  - constant source port (randomly picked per invocation)

It's actually a trivial encoding of the traceroute process ID, so that
two traceroute programs running simultaneously do not clobber each
other. However, this becomes important.

>  - constant destination port 80

Yes, the whole point of "-e".

>  - increasing TTL per probe

Yes, the basic kludge that makes traceroute work.

Here is the basic explanation behind the changes:

  http://docs.freebsd.org/cgi/getmsg.cgi?fetch=414378+0+archive/2005/freebsd-net/20050925.freebsd-net

[snip]

> What you changed in your patch is switching to a sequential (instead of
> constant) source port. This forces creation of one state per probe,
> treating each probe as a separate connection. I don't think that's in
> the spirit of the -e option. There's really no need for that, once the
> underlying problem is fixed.

Creating multiple state entries in a firewall really has no consequence
as far as the operation of the "-e" option goes. It doesn't have any
effect on the three essential characteristics of the probe that you
listed above.

> So, why doesn't -e without your patch produce probes that all match a
> single state entry?
>
> Look at how the TCP sequence numbers are generated across the probes:
>
>   tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
>       (fixedPort ? outdata->seq : 0));
>
> This is the problem. traceroute increments the sequence number with each
> probe. I don't know why that is done. Why not use the same th_seq for
> all probes, like an ISN (initial sequence number) would be re-used in
> retransmissions in a real TCP handshake?

'Cause I needed to include that traceroute sequence number somewhere,
since it wasn't in the destination port any more.

> If you create state on the first TCP SYN pf sees, pf will note the ISN
> from the traceroute side. When pf sees further SYNs from that side, it
> will deal with them like with any client retransmitting the SYN of the
> handshake (before the peer replies with a SYN+ACK, giving its side's
> ISN). Subsequent TCP SYNs with different ISN matching the address/port
> pairs will be blocked by pf.

That may be a little strict on the part of pf. One has to balance the
"liberal in what you accept" principle against the strictness expected
of security software. But it would be difficult to come up with a
legitimate reason for a host to send SYNs with differing ISNs on the
same source-IP/source-port/destination-IP/destination-port tuple on any
timescale less than the MSL.

> If this happens on the IP forwarding path (i.e. pf blocks the packet
> outgoing), the stack produces the ICMP host unreachable error that shows
> up as "!H" in traceroute. I assume you have a "pass out on $ext_if keep
> state" rule, and don't filter on the internal interface. If you add
> stateful filtering on the internal interface, I think you'll find that
> subsequent TCP SYNs are blocked without eliciting the ICMP error.
>
> I suggest traceroute with -e uses fixed th_seq, as in
>
> -  tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> -      (fixedPort ? outdata->seq : 0));
> +  tcp->th_seq = (tcp->th_sport << 16) | tcp->th_dport;
>
> Maybe the (fixedPort?:) operands were mistakenly switched, and you want to
> increment th_seq when -e is NOT used, but I can't think off-hand why you
> would.

The ISNs do increment when the "-e" option is not used, since the dport
increments. That's why I didn't realize incrementing the ISN might
cause new problems.

The problem with this patch is that we no longer have the traceroute
sequence number anywhere in the TCP header. (Don't bring up the IP
header please. That's a whole 'nother issue.)

So, to expand on the three points above, we need

  (1) a fixed destination port,
  (2) an incrementing IP TTL,
  (3) the sequence number encoded in some header field, and
  (4) a source port chosen so that multiple traceroute invocations do
      not share any src-sport-dst-dport tuples during their lifetime.

In the past, using the PID worked for the sport. But think about what
happens if you start with the PID and then start incrementing or
decrementing: you get overlaps (unless your system does a decent job
with random PIDs; not the default for FreeBSD, unfortunately).

The patch to freebsd-net addresses these problems. It changes the source
port so that we don't have overlapping src-sport-dst-dport tuples, and
uses a base source port taken from the LSBs of the clock as a "random"
number. That would seem to fix the problem. The only question is whether
that is a good way to pick the base source port. It's probably good
enough, although some kind of hash of the PID might be better.
--
Crist J. Clark | cjclark@alum.mit.edu
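
For reference, the ISN arithmetic discussed in this message can be sketched in C as below. This is only an illustration, not code from traceroute.c: the helper names `probe_isn`, `fixed_isn`, and `probe_seq_from_isn` are hypothetical, values are kept in host byte order for readability, and the low half is masked to 16 bits, which the quoted line does not do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the SYN sequence number the way the quoted
 * line does, i.e. source port in the high 16 bits, and destination port
 * (plus the probe number when the port is fixed by -e) in the low 16 bits.
 */
static uint32_t
probe_isn(uint16_t sport, uint16_t dport, int fixed_port, int seq)
{
	assert(seq >= 0);
	uint32_t low = ((uint32_t)dport + (fixed_port ? (uint32_t)seq : 0)) & 0xffffu;
	return ((uint32_t)sport << 16) | low;
}

/* The constant-ISN variant proposed in the quoted diff. */
static uint32_t
fixed_isn(uint16_t sport, uint16_t dport)
{
	return ((uint32_t)sport << 16) | dport;
}

/* Recover the probe number from an ISN built by probe_isn() with -e. */
static int
probe_seq_from_isn(uint32_t isn, uint16_t dport)
{
	return (int)(((isn & 0xffffu) - dport) & 0xffffu);
}
```

With source port 33000 and destination port 80, probes 1 and 2 under -e get ISNs 0x80E80051 and 0x80E80052 on the same four-tuple, which is the pattern pf rejects. `fixed_isn()` yields 0x80E80050 for every probe, so it satisfies pf but leaves the probe number nowhere in the TCP header.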