Date: Mon, 24 Oct 2011 08:14:22 -0400 From: John Baldwin <jhb@freebsd.org> To: freebsd-current@freebsd.org Cc: Kostik Belousov <kostikbel@gmail.com>, Lawrence Stewart <lstewart@freebsd.org>, Andre Oppermann <andre@freebsd.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, freebsd-net@freebsd.org Subject: Re: 9.0-RC1 panic in tcp_input: negative winow. Message-ID: <201110240814.22368.jhb@freebsd.org> In-Reply-To: <20111023155827.GH1697@garage.freebsd.pl> References: <20111022084931.GD1697@garage.freebsd.pl> <20111023084445.GB50300@deviant.kiev.zoral.com.ua> <20111023155827.GH1697@garage.freebsd.pl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: > On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: > > On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: > > > My suggestion would be that if we won't be able to fix it before 9.0, > > > we should turn this assertion off, as the system seems to be able to > > > recover. > > > > Shipped kernels have all assertions turned off. > > Yes, I'm aware of that, but many people compile their production kernels > with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. > corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing > it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. Also, I'm not sure if this case can "leak" into the timewait code? If so, we will need to fix this case: /* * Recover last window size sent. */ KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt), ("tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u", tp, tp->rcv_nxt, tp->rcv_adv)); tw->last_win = (tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale; So that it sets last_win to 0 instead of some really huge value. -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201110240814.22368.jhb>