Message-ID: <4EA8453B.2090808@freebsd.org>
Date: Wed, 26 Oct 2011 10:36:59 -0700
From: Julian Elischer
To: Pawel Jakub Dawidek
Cc: Andre Oppermann, John Baldwin, freebsd-net@freebsd.org, freebsd-current@freebsd.org, Kostik Belousov, Lawrence Stewart
Subject: Re: 9.0-RC1 panic in tcp_input: negative winow.

On 10/26/11 12:54 AM, Pawel Jakub Dawidek wrote:
> On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:
>> On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
>>> On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
>>>> On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
>>>>> My suggestion would be that if we won't be able to fix it before 9.0,
>>>>> we should turn this assertion off, as the system seems to be able to
>>>>> recover.
>>>> Shipped kernels have all assertions turned off.
>>> Yes, I'm aware of that, but many people compile their production kernels
>>> with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
>>> corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
>>> it into a printf, so it will be visible.
>> No, the kernel is corrupting things in other places when this is true, so
>> if you are running with INVARIANTS, we want to know about it. Specifically,
>> in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on
>> being able to do 'rcv_adv - rcv_nxt'.
>>
>> In this case, it looks like the difference is consistently less than one
>> frame. I suspect the other end of the connection is sending just beyond the
>> end of the advertised window (it probably assumes it is better to send a full
>> frame if it has that much pending data even though part of it is beyond the
>> window edge vs sending a truncated packet that just fills the window) and that
>> that frame is accepted ok in the header prediction case and it's ACK is
>> delayed, but the next packet to arrive then trips over this assumption.
>>
>> Since 'win' is guaranteed to be non-negative and we explicitly cast
>> 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
>> for:
>>
>>     tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
>>
>> I think we already handle this case ok and perhaps the assertion can just be
>> removed? Not sure if others feel that it warrants a comment to note that this
>> is the case being handled.
> I added debug to the places where rcv_adv and rcv_nxt are modified. Here
> is what happens before the panic occurs:
>
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448
> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448
> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448
> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098
> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448
> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221
> panic: tcp_input negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221
>
> I can send you the full log if you want, I've plenty of messages where
> rcv_adv < rcv_nxt, not all of them trigger this assertion.
Might be a good place to use the new sifter tool.
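
For anyone following the arithmetic in the quoted discussion, here is a
minimal userland sketch of the 'rcv_adv - rcv_nxt' cast and the imax()
clamp. It is not the kernel code: the tcp_seq typedef, imax_local(),
adv_window() and recv_window() are hypothetical stand-ins written for this
illustration, and it assumes a 32-bit two's complement int. The sample
values are taken from the debug output quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit TCP sequence number type. */
typedef uint32_t tcp_seq;

/* Local helper with the same meaning as imax(): the larger of two ints. */
static inline int
imax_local(int a, int b)
{
	return (a > b ? a : b);
}

/*
 * Signed distance from rcv_nxt to rcv_adv.  The subtraction is done in
 * unsigned 32-bit arithmetic (so it survives sequence wraparound) and
 * then cast to int, as in the quoted line.  The result is negative when
 * rcv_adv is behind rcv_nxt.  Assumes int is 32-bit two's complement.
 */
static inline int
adv_window(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return ((int)(rcv_adv - rcv_nxt));
}

/*
 * Mirrors the shape of:
 *     tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
 * With win >= 0, a negative difference is simply discarded by the max.
 */
static inline int
recv_window(int win, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return (imax_local(win, adv_window(rcv_adv, rcv_nxt)));
}
```

With the values from the panic line (rcv_nxt 4022394563, rcv_adv
4022394342, win=0), adv_window() gives -221 and recv_window() gives 0,
which is the behaviour John describes as already being handled by the
imax() line. This only shows that one expression; it says nothing about
the other places that assume rcv_adv >= rcv_nxt.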