From owner-freebsd-net@FreeBSD.ORG  Mon Aug 21 15:10:14 2006
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
X-Original-To: freebsd-net@freebsd.org
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C2D1716A4E7
	for <freebsd-net@freebsd.org>; Mon, 21 Aug 2006 15:10:14 +0000 (UTC)
	(envelope-from rosti.bsd@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.186])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 17B2A43D5A
	for <freebsd-net@freebsd.org>; Mon, 21 Aug 2006 15:10:11 +0000 (GMT)
	(envelope-from rosti.bsd@gmail.com)
Received: by nf-out-0910.google.com with SMTP id n29so2107272nfc
	for <freebsd-net@freebsd.org>; Mon, 21 Aug 2006 08:10:11 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:date:from:to:cc:subject:message-id:in-reply-to:references:x-mailer:mime-version:content-type:content-transfer-encoding;
	b=JGwOFqpgv9EGbKNf5tVp7caz+kQUTitrO5k15JCsFhcAnmK87gXaX4d2h7JCKKsugD+NuOaVDMlm5HGtLxUQjqnqgKjVa+va4rVemDUmiFZZSCgtYCRJIl9qtWomgIgybcbZrgojrxa24oFKH949LVta9pqRD7dz2nxnFYRpUoo=
Received: by 10.49.90.4 with SMTP id s4mr7815458nfl;
	Mon, 21 Aug 2006 08:10:11 -0700 (PDT)
Received: from saturn.lan ( [212.143.154.227])
	by mx.gmail.com with ESMTP id 37sm54556hua.2006.08.21.08.10.08;
	Mon, 21 Aug 2006 08:10:10 -0700 (PDT)
Date: Mon, 21 Aug 2006 18:09:45 +0300
From: Rostislav Krasny <rosti.bsd@gmail.com>
To: Daniel Hartmeier <daniel@benzedrine.cx>
Message-Id: <20060821180945.6a75bc44.rosti.bsd@gmail.com>
In-Reply-To: <20060821092350.GL20788@insomnia.benzedrine.cx>
References: <20060818235756.25f72db4.rosti.bsd@gmail.com>
	<20060821092350.GL20788@insomnia.benzedrine.cx>
X-Mailer: Sylpheed version 2.2.7 (GTK+ 2.8.20; i386-portbld-freebsd6.1)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org
Subject: Re: PF or "traceroute -e -P TCP" bug?
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 21 Aug 2006 15:10:14 -0000

On Mon, 21 Aug 2006 11:23:50 +0200
Daniel Hartmeier <daniel@benzedrine.cx> wrote:

> [ I'm CC'ing Crist, maybe he can explain why -e behaves like it does ]
> 
> On Fri, Aug 18, 2006 at 11:57:56PM +0300, Rostislav Krasny wrote:
> 
> > I've tried the new "-e" traceroute option on today's RELENG_6 and
> > found following problem:
> > 
> > > traceroute -nq 1 -e -P TCP -p 80 216.136.204.117
> 
> As I understand the -e option, that should send a sequence of TCP SYNs
> with
> 
>   - constant source port (randomly picked per invokation)
>   - constant destination port 80
>   - increasing TTL per probe
> 
> Assuming you pass the packets with pf, it matters whether you create
> state or not. Filtering statelessly (without 'keep state'), there should
> be no problem at all. I assume you're filtering statefully.

I don't use 'keep state' in any pf rule. But I use a nat rule like this:

nat on $ext_if from $internal_net to any -> ($ext_if)

and according to 'pfctl -s state' any NAT-ed TCP connection creates a
state. For example, during the above traceroute:

self tcp 192.168.1.2:34345 -> xxx.xxx.xxx.xxx:50646 -> 216.136.204.117:80       SYN_SENT:CLOSED

> With constant source and destination ports, the first probe should
> create a state entry and all further probes (of the same traceroute
> invokation) should match that state entry.
> 
> What you changed in your patch is switching to a sequential (instead of
> constant) source port. This forces creation of one state per probe,
> treating each probe as a separate connection.

Correct.

> I don't think that's in
> the spirit of the -e option. There's really no need for that, once the
> underlying problem is fixed.
> 
> So, why doesn't -e without your patch produce probes that all match a
> single state entry?

By the way, I asked a friend from IRC to try "traceroute -e -P TCP"
through his router which does NATing by natd and it worked there.

> Look at how the TCP sequence numbers are generated across the probes:
> 
> 	tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> 	    (fixedPort ? outdata->seq : 0));
> 
> This is the problem. traceroute increments the sequence number with each
> probe. I don't know why that is done. Why not use the same th_seq for
> all probes, like an ISN (initial sequence number) would be re-used in
> retransmissions in a real TCP handshake?
> 
> If you create state on the first TCP SYN pf sees, pf will note the ISN
> from the traceroute side. When pf sees further SYNs from that side, it
> will deal with them like with any client retransmitting the SYN of the
> handshake (before the peer replies with a SYN+ACK, giving its side's
> ISN). Subsequent TCP SYNs with different ISN matching the address/port
> pairs will be blocked by pf.
> 
> If this happens on the IP forwarding path (i.e. pf blocks the packet
> outgoing), the stack produces the ICMP host unreachable error that shows
> up as "!H" in traceroute. I assume you have a "pass out on $ext_if keep
> state" rule, and don't filter on the internal interface. If you add
> stateful filtering on the internal interface, I think you'll find that
> subsequent TCP SYNs are blocked without eliciting the ICMP error.
> 
> I suggest traceroute with -e uses fixed th_seq, as in
> 
> -	tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> -	   (fixedPort ? outdata->seq : 0));
> +	tcp->th_seq = (tcp->th_sport << 16) tcp->th_dport;

Even if I add accidentally deleted '|' it doesn't fix the problem:

> traceroute -nq 1 -e -P TCP -p 80 www.freebsd.org
traceroute to www.freebsd.org (216.136.204.117), 64 hops max, 52 byte packets
 1  192.168.1.1  0.525 ms
 2  10.0.0.138  2.122 ms
 3  *
 4  *
 5  *
 6  *
 7  *
 8  *
 9  *
10  152.63.3.122  191.562 ms
11  *
12  *
^C

I can decrease number of the "*" hops by -w option:

> traceroute -nq 1 -e -w 10 -P TCP -p 80 www.freebsd.org
traceroute to www.freebsd.org (216.136.204.117), 64 hops max, 52 byte packets
 1  192.168.1.1  0.506 ms
 2  10.0.0.138  1.886 ms
 3  *
 4  *
 5  *
 6  *
 7  212.143.12.45  151.282 ms
 8  *
^C

According to repeatedly ran 'pfctl -s state | grep 216.136.204.117'
it really has some relation to TCP states in the pf. Before the
212.143.12.45 hop the state closed and after that hop a new state
created.

And by the way, I think a tcp_check() function checks tcp->th_seq
incorrectly:

tcp->th_seq == (ident << 16) | (port + seq)

In original version or after my patch it should be changed to this:

tcp->th_seq == (htons(ident) << 16) | (port + (fixedPort ? seq : 0))

and after your patch to this:

tcp->th_seq == (htons(ident) << 16) | port

It looks like the return value of the tcp_check() isn't used anywhere
anyway.