Date: Sun, 11 May 2008 23:56:25 -0700
From: Bakul Shah <bakul@bitblocks.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: freebsd-current@freebsd.org, Julian Elischer <julian@elischer.org>
Subject: Re: tcp over slow links broken?
Message-ID: <20080512065626.6858F5B75@mail.bitblocks.com>
In-Reply-To: Your message of "Sun, 11 May 2008 12:07:34 PDT." <200805111907.m4BJ7YE7005447@apollo.backplane.com>

On Sun, 11 May 2008 12:07:34 PDT Matthew Dillon <dillon@apollo.backplane.com> wrote:
> Hmm. It looks like C has gone deaf, not B. B is retransmitting from
> sequence 4744 which is the last sequence that C acked. C is then not
> acking any further packets.

Yes indeed.

> 14:22:42.411144 IP B.55535 > C.ssh: . 7664:9124(1460) ack 2016 win 65535
> 14:22:42.411259 IP B.55535 > C.ssh: . 9124:10584(1460) ack 2016 win 65535
> 14:22:42.468350 IP C.ssh > B.55535: . ack 4744 win 65535
> 14:22:42.490556 IP C.ssh > B.55535: . ack 4744 win 65535
> 14:22:42.830171 IP B.55535 > C.ssh: . 4744:6204(1460) ack 2016 win 65535
...
>
> This sounds like a packet filter state issue. My guess is that
> PF running on B is getting confused. Either PF is getting confused,
> or the packet is getting munged somehow to the point where PF refuses
> to bridge it.

I had already tried this.

> The A->C path (the one that is working) is going through PF's NAT rules.
> The B->C path is probably going through a different set of PF rules.
>
> I suggest capturing a trace on C to see if C is actually receiving
> B's retransmissions.

Finally, this evening, thanks to my friend Rob Warnock's help, this
got narrowed down quite a bit. We captured a trace on C and saw that
it was not seeing the [4744:6204) data range packet or any of its
retransmits. But this was a perfectly valid packet on B (verified
with tcpdump -v + manual header checksumming).

Then Rob recalled having run across mbuf alignment issues in the
past, so to check for that I swapped NICs around, and the problem
stayed with the NIC, an old DEC 21140 card!

So this was not related to pf or a slow link but is most likely due
to mbuf misalignment (IIRC de requires aligned mbufs). There was just
one commit to if_de.c this past April. Perhaps this is a side effect
of that (bpf is not given a packet during device attach), or perhaps
of some change elsewhere.

Thanks for your & Julian's help!
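
For anyone who wants to repeat the "manual header checksumming" step, here is a minimal sketch of the RFC 1071 Internet checksum as it applies to an IPv4 header. The function names are hypothetical, and the header bytes are a made-up sample, not taken from the trace above. It only illustrates the arithmetic; it says nothing about how the packet was actually checked on B.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Return the RFC 1071 checksum: the ones' complement of the
    ones'-complement sum of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries out of the low 16 bits back into the sum.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_ok(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero overall."""
    return internet_checksum(header) == 0


# Illustrative 20-byte IPv4 header (assumption: sample bytes only).
# The checksum field (bytes 10-11) is zeroed before computing it.
zeroed = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(zeroed)

# Put the computed checksum back into the header and verify it.
filled = zeroed[:10] + struct.pack("!H", csum) + zeroed[12:]
print(hex(csum), ipv4_header_ok(filled))
```

The same function covers the TCP checksum if it is fed the pseudo-header followed by the TCP segment, which is the other half of checking a packet by hand.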