From: John Baldwin
To: Jason Wolfe
Cc: Sean Bruno, freebsd-net@freebsd.org
Subject: Re: ixgbe(4) spin lock held too long
Date: Sat, 18 Oct 2014 07:42:58 -0400
Message-ID: <1569387.ZCJSvuukWl@ralph.baldwin.cx>
References: <1410203348.1343.1.camel@bruno> <201410161523.32415.jhb@freebsd.org>

On Friday, October 17, 2014 11:32:13 PM Jason Wolfe wrote:
> Producing 10G of random traffic against a server with this assertion
> added took about 2 hours to panic, so if it turns out we need anything
> further it should be pretty quick.
>
> #4 list
> 2816             * timer and remember to restart (more output or persist).
> 2817             * If there is more data to be acked, restart retransmit
> 2818             * timer, using current (possibly backed-off) value.
> 2819             */
> 2820            if (th->th_ack == tp->snd_max) {
> 2821                    tcp_timer_activate(tp, TT_REXMT, 0);
> 2822                    needoutput = 1;
> 2823            } else if (!tcp_timer_active(tp, TT_PERSIST))
> 2824                    tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);

Bah, this is just a bug in my assertion.  Rather than having a separate
tcp_timer_deactivate() routine, a delta of 0 passed to tcp_timer_activate()
means "stop the timer".  My assertions were incorrect and need to exclude
the stop case.  Here is an updated patch (or you can just fix yours locally):

Index: tcp_timer.c
===================================================================
--- tcp_timer.c	(revision 273219)
+++ tcp_timer.c	(working copy)
@@ -869,10 +869,16 @@ tcp_timer_activate(struct tcpcb *tp, int timer_typ
 		case TT_REXMT:
 			t_callout = &tp->t_timers->tt_rexmt;
 			f_callout = tcp_timer_rexmt;
+			if (callout_active(&tp->t_timers->tt_persist) &&
+			    delta != 0)
+				panic("scheduling retransmit with persist active");
 			break;
 		case TT_PERSIST:
 			t_callout = &tp->t_timers->tt_persist;
 			f_callout = tcp_timer_persist;
+			if (callout_active(&tp->t_timers->tt_rexmt) &&
+			    delta != 0)
+				panic("scheduling persist with retransmit active");
 			break;
 		case TT_KEEP:
 			t_callout = &tp->t_timers->tt_keep;

-- 
John Baldwin
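
For readers who want to see why the stop case has to be excluded, here is a
minimal user-space sketch of the convention described in this message. It is
not the kernel code: mock_timers, mock_timer_activate(), the MOCK_TT_*
constants and the DEMO_MAIN guard are all hypothetical stand-ins invented for
illustration, and the function returns -1 at the point where the patched
kernel would call panic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the kernel's callout(9) types. */
enum mock_timer_type { MOCK_TT_REXMT, MOCK_TT_PERSIST };

struct mock_timers {
	bool rexmt_active;	/* models callout_active(tt_rexmt) */
	bool persist_active;	/* models callout_active(tt_persist) */
};

/*
 * Models the "delta of 0 means stop the timer" convention.
 * Returns 0 on success and -1 where the patched code would panic().
 */
static int
mock_timer_activate(struct mock_timers *t, enum mock_timer_type type,
    int delta)
{
	bool *self = (type == MOCK_TT_REXMT) ?
	    &t->rexmt_active : &t->persist_active;
	bool other_active = (type == MOCK_TT_REXMT) ?
	    t->persist_active : t->rexmt_active;

	if (delta == 0) {
		/* Stop request: always legal, even if the other timer runs. */
		*self = false;
		return (0);
	}
	if (other_active) {
		/* Scheduling one timer while the other is active. */
		return (-1);
	}
	*self = true;
	return (0);
}

#ifdef DEMO_MAIN
int
main(void)
{
	struct mock_timers t = { false, false };

	/* Start persist, then stop retransmit: must not trip the check. */
	assert(mock_timer_activate(&t, MOCK_TT_PERSIST, 5) == 0);
	assert(mock_timer_activate(&t, MOCK_TT_REXMT, 0) == 0);
	/* Actually scheduling retransmit with persist active is flagged. */
	assert(mock_timer_activate(&t, MOCK_TT_REXMT, 5) == -1);
	puts("ok");
	return (0);
}
#endif
```

Building with "cc -DDEMO_MAIN" runs the small demonstration. The second call
mirrors line 2821 in the quoted listing: it passes a delta of 0 for TT_REXMT,
which the original assertion would have treated as scheduling a timer.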