Date:      Sun, 11 May 2008 23:56:25 -0700
From:      Bakul Shah <bakul@bitblocks.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-current@freebsd.org, Julian Elischer <julian@elischer.org>
Subject:   Re: tcp over slow links broken? 
Message-ID:  <20080512065626.6858F5B75@mail.bitblocks.com>
In-Reply-To: Your message of "Sun, 11 May 2008 12:07:34 PDT." <200805111907.m4BJ7YE7005447@apollo.backplane.com> 

On Sun, 11 May 2008 12:07:34 PDT Matthew Dillon <dillon@apollo.backplane.com>  wrote:
>     Hmm.  It looks like C has gone deaf, not B.  B is retransmitting from
>     sequence 4744 which is the last sequence that C acked.  C is then not
>     acking any further packets.

Yes indeed.
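
For anyone who wants to pick this pattern out of a longer capture, here is a rough sketch of how it could be done mechanically.  It assumes the old-style tcpdump text format quoted in this message; the function name summarize, the regular expression and the "B." sender prefix are all made up for illustration and are not part of any tool.

```python
import re

# Matches old-style tcpdump lines such as
#   14:22:42.411144 IP B.55535 > C.ssh: . 7664:9124(1460) ack 2016 win 65535
#   14:22:42.468350 IP C.ssh > B.55535: . ack 4744 win 65535
# The data range is optional because pure acks carry none.
LINE_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): \S+ "
    r"(?:(?P<start>\d+):(?P<end>\d+)\(\d+\) )?ack (?P<ack>\d+)"
)

def summarize(lines, sender="B."):
    """Return (last ack seen from the peer, list of resent data ranges)."""
    highest_sent = 0      # highest sequence number the sender has put on the wire
    retransmits = []      # ranges that start below highest_sent
    last_ack = None       # most recent ack number from the other side
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue      # skip "..." and anything else that is not a packet line
        if m["src"].startswith(sender):
            if m["start"] is not None:
                start, end = int(m["start"]), int(m["end"])
                if start < highest_sent:
                    retransmits.append((start, end))
                highest_sent = max(highest_sent, end)
        else:
            last_ack = int(m["ack"])
    return last_ack, retransmits

TRACE = """\
14:22:42.411144 IP B.55535 > C.ssh: . 7664:9124(1460) ack 2016 win 65535
14:22:42.411259 IP B.55535 > C.ssh: . 9124:10584(1460) ack 2016 win 65535
14:22:42.468350 IP C.ssh > B.55535: . ack 4744 win 65535
14:22:42.490556 IP C.ssh > B.55535: . ack 4744 win 65535
14:22:42.830171 IP B.55535 > C.ssh: . 4744:6204(1460) ack 2016 win 65535
""".splitlines()

last_ack, resent = summarize(TRACE)
print(last_ack, resent)
```

If the ack stays put while the same range keeps showing up in the resent list, the receiver is not getting that segment (or its acks are being lost).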

> 14:22:42.411144 IP B.55535 > C.ssh: . 7664:9124(1460) ack 2016 win 65535
> 14:22:42.411259 IP B.55535 > C.ssh: . 9124:10584(1460) ack 2016 win 65535
> 14:22:42.468350 IP C.ssh > B.55535: . ack 4744 win 65535
> 14:22:42.490556 IP C.ssh > B.55535: . ack 4744 win 65535
> 14:22:42.830171 IP B.55535 > C.ssh: . 4744:6204(1460) ack 2016 win 65535
	...
> 
>     This sounds like a packet filter state issue.  My guess is that
>     PF running on B is getting confused.  Either PF is getting confused,
>     or the packet is getting munged somehow to the point where PF refuses
>     to bridge it.

I had already tried this.

>     The A->C path (the one that is working) is going through PF's NAT rules.
>     The B->C path is probably going through a different set of PF rules.
> 
>     I suggest capturing a trace on C to see if C is actually receiving 
>     B's retransmissions.

Finally, this evening, thanks to my friend Rob Warnock's help,
this got narrowed down quite a bit.  We captured a trace on C
and saw that it was not receiving the packet carrying the
[4744:6204) data range or any of its retransmits.  But this was
a perfectly valid packet on B (verified with tcpdump -v plus
manual header checksumming).  Then Rob recalled having run
across mbuf alignment issues in the past, so to check for that
I swapped NICs around, and the problem stayed with the NIC, an
old DEC 21140 card!
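
For reference, the manual check can be sketched as below.  This is only an illustration of the standard Internet checksum (RFC 1071) as used for the IPv4 header, not the exact steps we took; the function names and the sample header bytes are made up for the example.  A TCP checksum is computed the same way but also covers a pseudo-header and the payload, which this sketch leaves out.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement sum
    of the data taken as big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                  # fold the carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_is_valid(header: bytes) -> bool:
    """A header whose checksum field is already filled in sums to zero."""
    return internet_checksum(header) == 0

# Hypothetical 20-byte IPv4 header, first with the checksum field zeroed,
# then with the computed value written into bytes 10-11.
zeroed = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(zeroed)
filled = zeroed[:10] + struct.pack("!H", csum) + zeroed[12:]
print(hex(csum), header_is_valid(filled))
```

Run over the header bytes taken from a tcpdump -x dump, header_is_valid() returning True means the header left the sending host intact.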

So this was not related to pf or a slow link but is most likely
due to mbuf misalignment (IIRC de requires aligned mbufs).
There was just one commit to if_de.c this past April.
Perhaps this is a side effect of that commit (bpf is not
given a packet during device attach), or perhaps of some change
elsewhere.

Thanks for your & Julian's help!


