From owner-freebsd-net@FreeBSD.ORG Fri Oct 28 00:29:37 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E2C061065672; Fri, 28 Oct 2011 00:29:37 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: from lauren.room52.net (lauren.room52.net [210.50.193.198]) by mx1.freebsd.org (Postfix) with ESMTP id 55BCB8FC08; Fri, 28 Oct 2011 00:29:37 +0000 (UTC) Received: from lstewart1.loshell.room52.net (ppp59-167-184-191.static.internode.on.net [59.167.184.191]) by lauren.room52.net (Postfix) with ESMTPSA id 147097E820; Fri, 28 Oct 2011 11:29:35 +1100 (EST) Message-ID: <4EA9F76E.9010008@freebsd.org> Date: Fri, 28 Oct 2011 11:29:34 +1100 From: Lawrence Stewart User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111016 Thunderbird/7.0.1 MIME-Version: 1.0 To: John Baldwin References: <20111022084931.GD1697@garage.freebsd.pl> <201110240814.22368.jhb@freebsd.org> <20111026075431.GB1672@garage.freebsd.pl> <201110260753.37264.jhb@freebsd.org> In-Reply-To: <201110260753.37264.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY autolearn=unavailable version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on lauren.room52.net Cc: Kostik Belousov , Andre Oppermann , freebsd-current@freebsd.org, Pawel Jakub Dawidek , freebsd-net@freebsd.org Subject: Re: 9.0-RC1 panic in tcp_input: negative winow. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 00:29:38 -0000 On 10/26/11 22:53, John Baldwin wrote: > On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote: >> On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: >>> On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: >>>> On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: >>>>> On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: >>>>>> My suggestion would be that if we won't be able to fix it before 9.0, >>>>>> we should turn this assertion off, as the system seems to be able to >>>>>> recover. >>>>> >>>>> Shipped kernels have all assertions turned off. >>>> >>>> Yes, I'm aware of that, but many people compile their production kernels >>>> with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. >>>> corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing >>>> it into a printf, so it will be visible. >>> >>> No, the kernel is corrupting things in other places when this is true, so >>> if you are running with INVARIANTS, we want to know about it. Specifically, >>> in several places in TCP we assume that rcv_adv>= rcv_nxt, and depend on >>> being able to do 'rcv_adv - rcv_nxt'. >>> >>> In this case, it looks like the difference is consistently less than one >>> frame. I suspect the other end of the connection is sending just beyond the >>> end of the advertised window (it probably assumes it is better to send a full >>> frame if it has that much pending data even though part of it is beyond the >>> window edge vs sending a truncated packet that just fills the window) and that >>> that frame is accepted ok in the header prediction case and it's ACK is >>> delayed, but the next packet to arrive then trips over this assumption. >>> >>> Since 'win' is guaranteed to be non-negative and we explicitly cast >>> 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking >>> for: >>> >>> tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)); >>> >>> I think we already handle this case ok and perhaps the assertion can just be >>> removed? Not sure if others feel that it warrants a comment to note that this >>> is the case being handled. >> >> I added debug to the places where rcv_adv and rcv_nxt are modified. Here >> is what happens before the panic occurs: >> >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 >> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 >> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 >> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 >> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 >> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 >> panic: tcp_input negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 >> >> I can send you the full log if you want, I've plenty of messages where >> rcv_adv< rcv_nxt, not all of them trigger this assertion. > > The assertion would be triggered when the next packet arrives (as I said > above). Try modifying your debugging output to also log if the ACK is > delayed. I suspect it is not delayed until the last one. (Pushing out an > ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case > of an immediate ACK, rcv_nxt> rcv_adv is only a transient condition all > under a single lock invocation so never visible to other consumers of the > protocol control block.) If that is what you see, then that confirms what > I guessed above and I will likely just remove the assertion in tcp_input() > and patch the timewait code to handle this case. > Pawel, have you been able to confirm John's hypothesis? What I don't quite get is why we haven't had a lot more reports of this issue... Cheers, Lawrence