Date: Wed, 26 Oct 2011 07:53:36 -0400 From: John Baldwin <jhb@freebsd.org> To: Pawel Jakub Dawidek <pjd@freebsd.org> Cc: Kostik Belousov <kostikbel@gmail.com>, Lawrence Stewart <lstewart@freebsd.org>, freebsd-current@freebsd.org, Andre Oppermann <andre@freebsd.org>, freebsd-net@freebsd.org Subject: Re: 9.0-RC1 panic in tcp_input: negative winow. Message-ID: <201110260753.37264.jhb@freebsd.org> In-Reply-To: <20111026075431.GB1672@garage.freebsd.pl> References: <20111022084931.GD1697@garage.freebsd.pl> <201110240814.22368.jhb@freebsd.org> <20111026075431.GB1672@garage.freebsd.pl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote: > On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: > > On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: > > > On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: > > > > On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: > > > > > My suggestion would be that if we won't be able to fix it before 9.0, > > > > > we should turn this assertion off, as the system seems to be able to > > > > > recover. > > > > > > > > Shipped kernels have all assertions turned off. > > > > > > Yes, I'm aware of that, but many people compile their production kernels > > > with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. > > > corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing > > > it into a printf, so it will be visible. > > > > No, the kernel is corrupting things in other places when this is true, so > > if you are running with INVARIANTS, we want to know about it. Specifically, > > in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on > > being able to do 'rcv_adv - rcv_nxt'. > > > > In this case, it looks like the difference is consistently less than one > > frame. I suspect the other end of the connection is sending just beyond the > > end of the advertised window (it probably assumes it is better to send a full > > frame if it has that much pending data even though part of it is beyond the > > window edge vs sending a truncated packet that just fills the window) and that > > that frame is accepted ok in the header prediction case and it's ACK is > > delayed, but the next packet to arrive then trips over this assumption. > > > > Since 'win' is guaranteed to be non-negative and we explicitly cast > > 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking > > for: > > > > tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)); > > > > I think we already handle this case ok and perhaps the assertion can just be > > removed? Not sure if others feel that it warrants a comment to note that this > > is the case being handled. > > I added debug to the places where rcv_adv and rcv_nxt are modified. Here > is what happens before the panic occurs: > > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 > tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 > tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 > tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 > tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 > tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 > panic: tcp_input negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 > > I can send you the full log if you want, I've plenty of messages where > rcv_adv < rcv_nxt, not all of them trigger this assertion. The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt > rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.) If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201110260753.37264.jhb>