From owner-freebsd-net@FreeBSD.ORG Mon Aug 21 15:10:14 2006 Return-Path: X-Original-To: freebsd-net@freebsd.org Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C2D1716A4E7 for ; Mon, 21 Aug 2006 15:10:14 +0000 (UTC) (envelope-from rosti.bsd@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.186]) by mx1.FreeBSD.org (Postfix) with ESMTP id 17B2A43D5A for ; Mon, 21 Aug 2006 15:10:11 +0000 (GMT) (envelope-from rosti.bsd@gmail.com) Received: by nf-out-0910.google.com with SMTP id n29so2107272nfc for ; Mon, 21 Aug 2006 08:10:11 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:date:from:to:cc:subject:message-id:in-reply-to:references:x-mailer:mime-version:content-type:content-transfer-encoding; b=JGwOFqpgv9EGbKNf5tVp7caz+kQUTitrO5k15JCsFhcAnmK87gXaX4d2h7JCKKsugD+NuOaVDMlm5HGtLxUQjqnqgKjVa+va4rVemDUmiFZZSCgtYCRJIl9qtWomgIgybcbZrgojrxa24oFKH949LVta9pqRD7dz2nxnFYRpUoo= Received: by 10.49.90.4 with SMTP id s4mr7815458nfl; Mon, 21 Aug 2006 08:10:11 -0700 (PDT) Received: from saturn.lan ( [212.143.154.227]) by mx.gmail.com with ESMTP id 37sm54556hua.2006.08.21.08.10.08; Mon, 21 Aug 2006 08:10:10 -0700 (PDT) Date: Mon, 21 Aug 2006 18:09:45 +0300 From: Rostislav Krasny To: Daniel Hartmeier Message-Id: <20060821180945.6a75bc44.rosti.bsd@gmail.com> In-Reply-To: <20060821092350.GL20788@insomnia.benzedrine.cx> References: <20060818235756.25f72db4.rosti.bsd@gmail.com> <20060821092350.GL20788@insomnia.benzedrine.cx> X-Mailer: Sylpheed version 2.2.7 (GTK+ 2.8.20; i386-portbld-freebsd6.1) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: PF or "traceroute -e -P TCP" bug? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Aug 2006 15:10:14 -0000 On Mon, 21 Aug 2006 11:23:50 +0200 Daniel Hartmeier wrote: > [ I'm CC'ing Crist, maybe he can explain why -e behaves like it does ] > > On Fri, Aug 18, 2006 at 11:57:56PM +0300, Rostislav Krasny wrote: > > > I've tried the new "-e" traceroute option on today's RELENG_6 and > > found following problem: > > > > > traceroute -nq 1 -e -P TCP -p 80 216.136.204.117 > > As I understand the -e option, that should send a sequence of TCP SYNs > with > > - constant source port (randomly picked per invokation) > - constant destination port 80 > - increasing TTL per probe > > Assuming you pass the packets with pf, it matters whether you create > state or not. Filtering statelessly (without 'keep state'), there should > be no problem at all. I assume you're filtering statefully. I don't use 'keep state' in any pf rule. But I use a nat rule like this: nat on $ext_if from $internal_net to any -> ($ext_if) and according to 'pfctl -s state' any NAT-ed TCP connection creates a state. For example, during the above traceroute: self tcp 192.168.1.2:34345 -> xxx.xxx.xxx.xxx:50646 -> 216.136.204.117:80 SYN_SENT:CLOSED > With constant source and destination ports, the first probe should > create a state entry and all further probes (of the same traceroute > invokation) should match that state entry. > > What you changed in your patch is switching to a sequential (instead of > constant) source port. This forces creation of one state per probe, > treating each probe as a separate connection. Correct. > I don't think that's in > the spirit of the -e option. There's really no need for that, once the > underlying problem is fixed. > > So, why doesn't -e without your patch produce probes that all match a > single state entry? By the way, I asked a friend from IRC to try "traceroute -e -P TCP" through his router which does NATing by natd and it worked there. > Look at how the TCP sequence numbers are generated across the probes: > > tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport + > (fixedPort ? outdata->seq : 0)); > > This is the problem. traceroute increments the sequence number with each > probe. I don't know why that is done. Why not use the same th_seq for > all probes, like an ISN (initial sequence number) would be re-used in > retransmissions in a real TCP handshake? > > If you create state on the first TCP SYN pf sees, pf will note the ISN > from the traceroute side. When pf sees further SYNs from that side, it > will deal with them like with any client retransmitting the SYN of the > handshake (before the peer replies with a SYN+ACK, giving its side's > ISN). Subsequent TCP SYNs with different ISN matching the address/port > pairs will be blocked by pf. > > If this happens on the IP forwarding path (i.e. pf blocks the packet > outgoing), the stack produces the ICMP host unreachable error that shows > up as "!H" in traceroute. I assume you have a "pass out on $ext_if keep > state" rule, and don't filter on the internal interface. If you add > stateful filtering on the internal interface, I think you'll find that > subsequent TCP SYNs are blocked without eliciting the ICMP error. > > I suggest traceroute with -e uses fixed th_seq, as in > > - tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport + > - (fixedPort ? outdata->seq : 0)); > + tcp->th_seq = (tcp->th_sport << 16) tcp->th_dport; Even if I add accidentally deleted '|' it doesn't fix the problem: > traceroute -nq 1 -e -P TCP -p 80 www.freebsd.org traceroute to www.freebsd.org (216.136.204.117), 64 hops max, 52 byte packets 1 192.168.1.1 0.525 ms 2 10.0.0.138 2.122 ms 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 152.63.3.122 191.562 ms 11 * 12 * ^C I can decrease number of the "*" hops by -w option: > traceroute -nq 1 -e -w 10 -P TCP -p 80 www.freebsd.org traceroute to www.freebsd.org (216.136.204.117), 64 hops max, 52 byte packets 1 192.168.1.1 0.506 ms 2 10.0.0.138 1.886 ms 3 * 4 * 5 * 6 * 7 212.143.12.45 151.282 ms 8 * ^C According to repeatedly ran 'pfctl -s state | grep 216.136.204.117' it really has some relation to TCP states in the pf. Before the 212.143.12.45 hop the state closed and after that hop a new state created. And by the way, I think a tcp_check() function checks tcp->th_seq incorrectly: tcp->th_seq == (ident << 16) | (port + seq) In original version or after my patch it should be changed to this: tcp->th_seq == (htons(ident) << 16) | (port + (fixedPort ? seq : 0)) and after your patch to this: tcp->th_seq == (htons(ident) << 16) | port It looks like the return value of the tcp_check() isn't used anywhere anyway.