Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 24 Oct 2011 08:14:22 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Kostik Belousov <kostikbel@gmail.com>, Lawrence Stewart <lstewart@freebsd.org>, Andre Oppermann <andre@freebsd.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, freebsd-net@freebsd.org
Subject:   Re: 9.0-RC1 panic in tcp_input: negative winow.
Message-ID:  <201110240814.22368.jhb@freebsd.org>
In-Reply-To: <20111023155827.GH1697@garage.freebsd.pl>
References:  <20111022084931.GD1697@garage.freebsd.pl> <20111023084445.GB50300@deviant.kiev.zoral.com.ua> <20111023155827.GH1697@garage.freebsd.pl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
> On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
> > On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
> > > My suggestion would be that if we won't be able to fix it before 9.0,
> > > we should turn this assertion off, as the system seems to be able to
> > > recover.
> > 
> > Shipped kernels have all assertions turned off.
> 
> Yes, I'm aware of that, but many people compile their production kernels
> with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
> corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
> it into a printf, so it will be visible.

No, the kernel is corrupting things in other places when this is true, so
if you are running with INVARIANTS, we want to know about it.   Specifically,
in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on
being able to do 'rcv_adv - rcv_nxt'.

In this case, it looks like the difference is consistently less than one 
frame.  I suspect the other end of the connection is sending just beyond the 
end of the advertised window (it probably assumes it is better to send a full 
frame if it has that much pending data even though part of it is beyond the 
window edge vs sending a truncated packet that just fills the window) and that
that frame is accepted ok in the header prediction case and it's ACK is 
delayed, but the next packet to arrive then trips over this assumption.

Since 'win' is guaranteed to be non-negative and we explicitly cast
'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
for:

	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));

I think we already handle this case ok and perhaps the assertion can just be
removed?  Not sure if others feel that it warrants a comment to note that this
is the case being handled.

Also, I'm not sure if this case can "leak" into the timewait code?  If so, we 
will need to fix this case:

	/*
	 * Recover last window size sent.
	 */
	KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
	    ("tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
	    tp->rcv_nxt, tp->rcv_adv));
	tw->last_win = (tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale;

So that it sets last_win to 0 instead of some really huge value.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201110240814.22368.jhb>