From: Jason Wolfe
To: John Baldwin
Cc: Sean Bruno, freebsd-net@freebsd.org
Date: Thu, 23 Oct 2014 14:12:44 -0700
Subject: Re: ixgbe(4) spin lock held too long
In-Reply-To: <1569387.ZCJSvuukWl@ralph.baldwin.cx>
References: <1410203348.1343.1.camel@bruno> <201410161523.32415.jhb@freebsd.org> <1569387.ZCJSvuukWl@ralph.baldwin.cx>
List-Id: Networking and TCP/IP with FreeBSD

On Sat, Oct 18, 2014 at 4:42 AM, John Baldwin wrote:
> On Friday, October 17, 2014 11:32:13 PM Jason Wolfe wrote:
>> Producing 10G of random traffic against a server with this assertion
>> added took about 2 hours to panic, so if it turns out we need anything
>> further it should be pretty quick.
>>
>> #4 list
>> 2816             * timer and remember to restart (more output or persist).
>> 2817             * If there is more data to be acked, restart retransmit
>> 2818             * timer, using current (possibly backed-off) value.
>> 2819             */
>> 2820            if (th->th_ack == tp->snd_max) {
>> 2821                    tcp_timer_activate(tp, TT_REXMT, 0);
>> 2822                    needoutput = 1;
>> 2823            } else if (!tcp_timer_active(tp, TT_PERSIST))
>> 2824                    tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
>
> Bah, this is just a bug in my assertion.  Rather than having a separate
> tcp_timer_deactivate() routine, a delta of 0 passed to tcp_timer_activate()
> means "stop the timer".  My assertions were incorrect and need to exclude the
> stop case.
> Here is an updated patch (or you can just fix yours locally):
>
> Index: tcp_timer.c
> ===================================================================
> --- tcp_timer.c (revision 273219)
> +++ tcp_timer.c (working copy)
> @@ -869,10 +869,16 @@ tcp_timer_activate(struct tcpcb *tp, int timer_typ
>                 case TT_REXMT:
>                         t_callout = &tp->t_timers->tt_rexmt;
>                         f_callout = tcp_timer_rexmt;
> +                       if (callout_active(&tp->t_timers->tt_persist) &&
> +                           delta != 0)
> +                               panic("scheduling retransmit with persist active");
>                         break;
>                 case TT_PERSIST:
>                         t_callout = &tp->t_timers->tt_persist;
>                         f_callout = tcp_timer_persist;
> +                       if (callout_active(&tp->t_timers->tt_rexmt) &&
> +                           delta != 0)
> +                               panic("scheduling persist with retransmit active");
>                         break;
>                 case TT_KEEP:
>                         t_callout = &tp->t_timers->tt_keep;
>
>
> --
> John Baldwin

John,

panic: tcp_setpersist: retransmit pending

(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff806facb1 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff806fb014 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff808467d3 in tcp_setpersist (tp=)
    at /usr/src/sys/netinet/tcp_output.c:1619
#4  0xffffffff8084e7b6 in tcp_timer_persist (xtp=0xfffff804ec124c00)
    at /usr/src/sys/netinet/tcp_timer.c:467
#5  0xffffffff8070d95e in softclock_call_cc (c=0xfffff804ec124ec0,
    cc=0xffffffff81263380, direct=0) at /usr/src/sys/kern/kern_timeout.c:687
#6  0xffffffff8070dce4 in softclock (arg=)
    at /usr/src/sys/kern/kern_timeout.c:816
#7  0xffffffff806d16f3 in intr_event_execute_handlers (p=,
    ie=0xfffff80015214400) at /usr/src/sys/kern/kern_intr.c:1263
#8  0xffffffff806d2056 in ithread_loop (arg=0xfffff800151f7ee0)
    at /usr/src/sys/kern/kern_intr.c:1276
#9  0xffffffff806cf481 in fork_exit (callout=0xffffffff806d1fc0 ,
    arg=0xfffff800151f7ee0, frame=0xfffffe1f9e9b0ac0)
    at /usr/src/sys/kern/kern_fork.c:996
#10 0xffffffff80a67c0e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606

(kgdb) frame 3
#3  0xffffffff808467d3 in tcp_setpersist (tp=)
    at /usr/src/sys/netinet/tcp_output.c:1619
1619                    panic("tcp_setpersist: retransmit pending");
(kgdb) list
1614            int t = ((tp->t_srtt >> 2) + tp->t_rttvar) >> 1;
1615            int tt;
1616
1617            tp->t_flags &= ~TF_PREVVALID;
1618            if (tcp_timer_active(tp, TT_REXMT))
1619                    panic("tcp_setpersist: retransmit pending");
1620            /*
1621             * Start/restart persistance timer.
1622             */
1623            TCPT_RANGESET(tt, t * tcp_backoff[tp->t_rxtshift],

(kgdb) up
#4  0xffffffff8084e7b6 in tcp_timer_persist (xtp=0xfffff804ec124c00)
    at /usr/src/sys/netinet/tcp_timer.c:467
467             tcp_setpersist(tp);
(kgdb) list
462                 (ticks - tp->t_rcvtime) >= TCPTV_PERSMAX) {
463                     TCPSTAT_INC(tcps_persistdrop);
464                     tp = tcp_drop(tp, ETIMEDOUT);
465                     goto out;
466             }
467             tcp_setpersist(tp);
468             tp->t_flags |= TF_FORCEDATA;
469             (void) tcp_output(tp);
470             tp->t_flags &= ~TF_FORCEDATA;

Jason
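
For anyone following the thread, the convention John describes above (a delta
of 0 passed to tcp_timer_activate() means "stop the timer", so the exclusivity
check should only fire for a non-zero delta) can be sketched in userland C.
This is only a rough illustration under stated assumptions, not the kernel
code: struct conn_timers, timer_request_ok() and timer_activate() are
hypothetical names made up for the example, and the function returns false at
the point where the patched kernel would panic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model; these names are not the kernel's. */
enum timer_type { TIMER_REXMT, TIMER_PERSIST, TIMER_COUNT };

struct conn_timers {
	bool active[TIMER_COUNT];	/* is the timer currently scheduled? */
	int  delta[TIMER_COUNT];	/* ticks until it fires (0 if stopped) */
};

/*
 * Check a request against the "retransmit and persist are mutually
 * exclusive" rule.  A delta of 0 is a stop request and is always
 * allowed, which is the case the first version of the assertion missed.
 */
static bool
timer_request_ok(const struct conn_timers *t, enum timer_type type, int delta)
{
	enum timer_type other =
	    (type == TIMER_REXMT) ? TIMER_PERSIST : TIMER_REXMT;

	if (delta == 0)
		return (true);
	return (!t->active[other]);
}

/*
 * Start the timer when delta != 0, stop it when delta == 0.
 * Returns false (and changes nothing) where the kernel patch would panic.
 */
static bool
timer_activate(struct conn_timers *t, enum timer_type type, int delta)
{
	if (!timer_request_ok(t, type, delta))
		return (false);
	t->active[type] = (delta != 0);
	t->delta[type] = delta;
	return (true);
}
```

With this model, stopping the retransmit timer while persist is running is
accepted, while starting it is rejected until persist has been stopped first.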