Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 12 Dec 2011 11:00:23 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Kostik Belousov <kostikbel@gmail.com>, Lawrence Stewart <lstewart@freebsd.org>, Andre Oppermann <andre@freebsd.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, freebsd-net@freebsd.org
Subject:   Re: 9.0-RC1 panic in tcp_input: negative winow.
Message-ID:  <201112121100.23567.jhb@freebsd.org>
In-Reply-To: <201110240814.22368.jhb@freebsd.org>
References:  <20111022084931.GD1697@garage.freebsd.pl> <20111023155827.GH1697@garage.freebsd.pl> <201110240814.22368.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Monday, October 24, 2011 8:14:22 am John Baldwin wrote:
> On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
> > On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
> > > On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
> > > > My suggestion would be that if we won't be able to fix it before 9.0,
> > > > we should turn this assertion off, as the system seems to be able to
> > > > recover.
> > > 
> > > Shipped kernels have all assertions turned off.
> > 
> > Yes, I'm aware of that, but many people compile their production kernels
> > with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
> > corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
> > it into a printf, so it will be visible.
> 
> No, the kernel is corrupting things in other places when this is true, so
> if you are running with INVARIANTS, we want to know about it.   Specifically,
> in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on
> being able to do 'rcv_adv - rcv_nxt'.
> 
> In this case, it looks like the difference is consistently less than one 
> frame.  I suspect the other end of the connection is sending just beyond the 
> end of the advertised window (it probably assumes it is better to send a full 
> frame if it has that much pending data even though part of it is beyond the 
> window edge vs sending a truncated packet that just fills the window) and that
> that frame is accepted ok in the header prediction case and it's ACK is 
> delayed, but the next packet to arrive then trips over this assumption.
> 
> Since 'win' is guaranteed to be non-negative and we explicitly cast
> 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
> for:
> 
> 	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
> 
> I think we already handle this case ok and perhaps the assertion can just be
> removed?  Not sure if others feel that it warrants a comment to note that this
> is the case being handled.
> 
> Also, I'm not sure if this case can "leak" into the timewait code?  If so, we 
> will need to fix this case:
> 
> 	/*
> 	 * Recover last window size sent.
> 	 */
> 	KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
> 	    ("tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
> 	    tp->rcv_nxt, tp->rcv_adv));
> 	tw->last_win = (tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale;
> 
> So that it sets last_win to 0 instead of some really huge value.

An update.  I've sent Pawel a testing patch to see if my hypothesis is correct
(www.freebsd.org/~jhb/patches/tcp_negwin_test.patch).  If it is then I intend
to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201112121100.23567.jhb>