From owner-freebsd-net@FreeBSD.ORG Mon Dec 12 16:00:25 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E0DE1065673; Mon, 12 Dec 2011 16:00:25 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D659C8FC14; Mon, 12 Dec 2011 16:00:24 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 8A98446B09; Mon, 12 Dec 2011 11:00:24 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1B7D3B96E; Mon, 12 Dec 2011 11:00:24 -0500 (EST) From: John Baldwin To: freebsd-current@freebsd.org Date: Mon, 12 Dec 2011 11:00:23 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; ) References: <20111022084931.GD1697@garage.freebsd.pl> <20111023155827.GH1697@garage.freebsd.pl> <201110240814.22368.jhb@freebsd.org> In-Reply-To: <201110240814.22368.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201112121100.23567.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 12 Dec 2011 11:00:24 -0500 (EST) Cc: Kostik Belousov , Lawrence Stewart , Andre Oppermann , Pawel Jakub Dawidek , freebsd-net@freebsd.org Subject: Re: 9.0-RC1 panic in tcp_input: negative winow. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Dec 2011 16:00:25 -0000 On Monday, October 24, 2011 8:14:22 am John Baldwin wrote: > On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: > > On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: > > > On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: > > > > My suggestion would be that if we won't be able to fix it before 9.0, > > > > we should turn this assertion off, as the system seems to be able to > > > > recover. > > > > > > Shipped kernels have all assertions turned off. > > > > Yes, I'm aware of that, but many people compile their production kernels > > with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. > > corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing > > it into a printf, so it will be visible. > > No, the kernel is corrupting things in other places when this is true, so > if you are running with INVARIANTS, we want to know about it. Specifically, > in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on > being able to do 'rcv_adv - rcv_nxt'. > > In this case, it looks like the difference is consistently less than one > frame. I suspect the other end of the connection is sending just beyond the > end of the advertised window (it probably assumes it is better to send a full > frame if it has that much pending data even though part of it is beyond the > window edge vs sending a truncated packet that just fills the window) and that > that frame is accepted ok in the header prediction case and it's ACK is > delayed, but the next packet to arrive then trips over this assumption. > > Since 'win' is guaranteed to be non-negative and we explicitly cast > 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking > for: > > tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)); > > I think we already handle this case ok and perhaps the assertion can just be > removed? Not sure if others feel that it warrants a comment to note that this > is the case being handled. > > Also, I'm not sure if this case can "leak" into the timewait code? If so, we > will need to fix this case: > > /* > * Recover last window size sent. > */ > KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt), > ("tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u", tp, > tp->rcv_nxt, tp->rcv_adv)); > tw->last_win = (tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale; > > So that it sets last_win to 0 instead of some really huge value. An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. -- John Baldwin