From owner-freebsd-net@freebsd.org Sat Jun 25 14:41:27 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E85F6B80463 for ; Sat, 25 Jun 2016 14:41:27 +0000 (UTC) (envelope-from rrs@netflix.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id BE3871B44 for ; Sat, 25 Jun 2016 14:41:27 +0000 (UTC) (envelope-from rrs@netflix.com) Received: by mailman.ysv.freebsd.org (Postfix) id BD410B8045E; Sat, 25 Jun 2016 14:41:27 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BCD2EB8045D for ; Sat, 25 Jun 2016 14:41:27 +0000 (UTC) (envelope-from rrs@netflix.com) Received: from mail-vk0-x234.google.com (mail-vk0-x234.google.com [IPv6:2607:f8b0:400c:c05::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6891A1B3D for ; Sat, 25 Jun 2016 14:41:27 +0000 (UTC) (envelope-from rrs@netflix.com) Received: by mail-vk0-x234.google.com with SMTP id u64so184605727vkf.3 for ; Sat, 25 Jun 2016 07:41:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netflix.com; s=google; h=from:mime-version:subject:in-reply-to:date:cc:message-id:references :to; bh=Duy9BnwOmcR9mCL+dw5z6on0vanDV3plsSQCxKfAZjQ=; b=DDgWUVly6boLZ4Y6k9C1+lCp+YQDoUSLcfYK/iupnpZClD2e4hJICYOigCL2WUkczS 3li1Q2JBYtjgNbyRFC3e5HCwtrFz/+znhMu1LbSbxIZvP3nak4MV73X5EG30Yhc2wGrt jP/QIUf1f7kDTFcrPXd8Vyoxw7El9LgsG3bZE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:mime-version:subject:in-reply-to:date:cc :message-id:references:to; bh=Duy9BnwOmcR9mCL+dw5z6on0vanDV3plsSQCxKfAZjQ=; b=A7kx2rx1PyRhafsesXY4auUtfkhzWtvWgiiIWkru/4viLE5ghwb8PTwzw3dbxIciXA w570XDa3IIDMwHMjz/i8Cs7HzsBejHNA36y13WXugUiwbxIt1QY0BL4r7ntGr5WG/0k6 XLKSUiSP7s1A8u6vJyO37/Px+pzgWz3bwnyU3FnLXaGbf85khC06G6nd4BPTucKds1k1 zUGlnFq2E3kNc0ayPRC5mao9SdC3guaxMQFI9n3CcPprYouVjXIaGJaAt50jKc4hsY1V cY12Z2ZFjOFWsfzYot3qJL5n3gQgRboeLX9w5pfIT2HJyXWl3aXcLdkHydL0vMiyw4d1 5ikw== X-Gm-Message-State: ALyK8tLxXSFdd/x1sb6lLbB/01qZAGCoCoTzyrtjdoBtROnu4+8w8Usyn6eNKY2YBw1W/rU1 X-Received: by 10.176.4.49 with SMTP id 46mr4269844uav.32.1466865686298; Sat, 25 Jun 2016 07:41:26 -0700 (PDT) Received: from [100.127.86.242] ([69.53.246.16]) by smtp.gmail.com with ESMTPSA id 45sm2914659uaq.6.2016.06.25.07.41.24 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 25 Jun 2016 07:41:25 -0700 (PDT) From: Randall Stewart X-Google-Original-From: Randall Stewart Content-Type: multipart/mixed; boundary="Apple-Mail=_F9B158CC-A33B-400F-9DD1-C3982E64FC31" Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: panic with tcp timers In-Reply-To: <18D94615-810E-4E79-A889-4B0CC70F9E45@netflix.com> Date: Sat, 25 Jun 2016 10:41:24 -0400 Cc: Julien Charbon , Gleb Smirnoff , hselasky@FreeBSD.org, net@FreeBSD.org Message-Id: <6E52CA6A-2153-4EF9-A3E1-97CB0D07EB28@freebsd.org> References: <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org> <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org> <20160620095822.GJ1076@FreeBSD.org> <74bb31b7-a9f5-3d0c-eea0-681872e6f09b@freebsd.org> <18D94615-810E-4E79-A889-4B0CC70F9E45@netflix.com> To: current@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2016 14:41:28 -0000 --Apple-Mail=_F9B158CC-A33B-400F-9DD1-C3982E64FC31 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 Ok Lets try this again with my source changed to my @freebsd.net :-) Now I am also attaching a patch for you Gleb, this will take some poking = to get in to your NF-head since it incorporates some changes we made = earlier. I think this will fix the problem.. i.e. dealing with two locks in the = callout system (which it was never meant to have done).. Note we probably can move the code to use the callout lock init now.. = but lets see if this works on your setup on c096 and if so we can think about doing that. R --Apple-Mail=_F9B158CC-A33B-400F-9DD1-C3982E64FC31 Content-Disposition: attachment; filename=for_gleb Content-Type: application/octet-stream; name="for_gleb" Content-Transfer-Encoding: 7bit Index: tcp_timer.c =================================================================== --- tcp_timer.c (revision 302198) +++ tcp_timer.c (working copy) @@ -306,6 +306,33 @@ tcp_timer_delack(void *xtp) CURVNET_RESTORE(); } +static inline int +tcp_inp_lock_exchange(struct tcpcb *tp) +{ + struct inpcb *inp; + + + inp = tp->t_inpcb; + in_pcbref(inp); + INP_WUNLOCK(inp); + INP_INFO_RLOCK(&V_tcbinfo); + INP_WLOCK(inp); + if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { + return(1); + } + return(0); + +} + +static inline void +tcp_inp_unlock_exchange(struct tcpcb *tp) +{ + INP_INFO_RUNLOCK(&V_tcbinfo); + if (tp && in_pcbrele_wlocked(tp->t_inpcb) == 0) { + INP_WUNLOCK(tp->t_inpcb); + } +} + void tcp_timer_2msl(void *xtp) { @@ -317,7 +344,6 @@ tcp_timer_2msl(void *xtp) ostate = tp->t_state; #endif - INP_INFO_RLOCK(&V_tcbinfo); inp = tp->t_inpcb; KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp)); INP_WLOCK(inp); @@ -325,7 +351,6 @@ tcp_timer_2msl(void *xtp) if (callout_pending(&tp->t_timers->tt_2msl) || !callout_active(&tp->t_timers->tt_2msl)) { INP_WUNLOCK(tp->t_inpcb); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -332,7 +357,6 @@ tcp_timer_2msl(void *xtp) callout_deactivate(&tp->t_timers->tt_2msl); if ((inp->inp_flags & INP_DROPPED) != 0) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -355,7 +379,6 @@ tcp_timer_2msl(void *xtp) */ if ((inp->inp_flags & INP_TIMEWAIT) != 0) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -363,7 +386,13 @@ tcp_timer_2msl(void *xtp) tp->t_inpcb && tp->t_inpcb->inp_socket && (tp->t_inpcb->inp_socket->so_rcv.sb_state & SBS_CANTRCVMORE)) { TCPSTAT_INC(tcps_finwait2_drops); + if (tcp_inp_lock_exchange(tp)) { + tcp_inp_unlock_exchange(tp); + goto out; + } tp = tcp_close(tp); + tcp_inp_unlock_exchange(tp); + goto out; } else { if (ticks - tp->t_rcvtime <= TP_MAXIDLE(tp)) { if (!callout_reset(&tp->t_timers->tt_2msl, @@ -370,8 +399,15 @@ tcp_timer_2msl(void *xtp) TP_KEEPINTVL(tp), tcp_timer_2msl, tp)) { tp->t_timers->tt_flags &= ~TT_2MSL_RST; } - } else - tp = tcp_close(tp); + } else { + if (tcp_inp_lock_exchange(tp)) { + tcp_inp_unlock_exchange(tp); + goto out; + } + tp = tcp_close(tp); + tcp_inp_unlock_exchange(tp); + goto out; + } } #ifdef TCPDEBUG @@ -383,7 +419,7 @@ tcp_timer_2msl(void *xtp) if (tp != NULL) INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); +out: CURVNET_RESTORE(); } @@ -399,7 +435,6 @@ tcp_timer_keep(void *xtp) ostate = tp->t_state; #endif - INP_INFO_RLOCK(&V_tcbinfo); inp = tp->t_inpcb; KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp)); INP_WLOCK(inp); @@ -406,7 +441,6 @@ tcp_timer_keep(void *xtp) if (callout_pending(&tp->t_timers->tt_keep) || !callout_active(&tp->t_timers->tt_keep)) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -413,7 +447,6 @@ tcp_timer_keep(void *xtp) callout_deactivate(&tp->t_timers->tt_keep); if ((inp->inp_flags & INP_DROPPED) != 0) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -468,12 +501,16 @@ tcp_timer_keep(void *xtp) #endif TCP_PROBE2(debug__user, tp, PRU_SLOWTIMO); INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; dropit: TCPSTAT_INC(tcps_keepdrops); + + if (tcp_inp_lock_exchange(tp)) { + tcp_inp_unlock_exchange(tp); + goto out; + } tp = tcp_drop(tp, ETIMEDOUT); #ifdef TCPDEBUG @@ -482,9 +519,8 @@ dropit: PRU_SLOWTIMO); #endif TCP_PROBE2(debug__user, tp, PRU_SLOWTIMO); - if (tp != NULL) - INP_WUNLOCK(tp->t_inpcb); - INP_INFO_RUNLOCK(&V_tcbinfo); + tcp_inp_unlock_exchange(tp); +out: CURVNET_RESTORE(); } @@ -499,7 +535,6 @@ tcp_timer_persist(void *xtp) ostate = tp->t_state; #endif - INP_INFO_RLOCK(&V_tcbinfo); inp = tp->t_inpcb; KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp)); INP_WLOCK(inp); @@ -506,7 +541,6 @@ tcp_timer_persist(void *xtp) if (callout_pending(&tp->t_timers->tt_persist) || !callout_active(&tp->t_timers->tt_persist)) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -513,7 +547,6 @@ tcp_timer_persist(void *xtp) callout_deactivate(&tp->t_timers->tt_persist); if ((inp->inp_flags & INP_DROPPED) != 0) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -537,7 +570,12 @@ tcp_timer_persist(void *xtp) (ticks - tp->t_rcvtime >= tcp_maxpersistidle || ticks - tp->t_rcvtime >= TCP_REXMTVAL(tp) * tcp_totbackoff)) { TCPSTAT_INC(tcps_persistdrop); + if (tcp_inp_lock_exchange(tp)) { + tcp_inp_unlock_exchange(tp); + goto out; + } tp = tcp_drop(tp, ETIMEDOUT); + tcp_inp_unlock_exchange(tp); goto out; } /* @@ -547,7 +585,12 @@ tcp_timer_persist(void *xtp) if (tp->t_state > TCPS_CLOSE_WAIT && (ticks - tp->t_rcvtime) >= TCPTV_PERSMAX) { TCPSTAT_INC(tcps_persistdrop); + if (tcp_inp_lock_exchange(tp)) { + tcp_inp_unlock_exchange(tp); + goto out; + } tp = tcp_drop(tp, ETIMEDOUT); + tcp_inp_unlock_exchange(tp); goto out; } tcp_setpersist(tp); @@ -555,15 +598,13 @@ tcp_timer_persist(void *xtp) (void) tp->t_fb->tfb_tcp_output(tp); tp->t_flags &= ~TF_FORCEDATA; -out: #ifdef TCPDEBUG if (tp != NULL && tp->t_inpcb->inp_socket->so_options & SO_DEBUG) tcp_trace(TA_USER, ostate, tp, NULL, NULL, PRU_SLOWTIMO); #endif TCP_PROBE2(debug__user, tp, PRU_SLOWTIMO); - if (tp != NULL) - INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); + INP_WUNLOCK(inp); +out: CURVNET_RESTORE(); } @@ -573,7 +614,6 @@ tcp_timer_rexmt(void * xtp) struct tcpcb *tp = xtp; CURVNET_SET(tp->t_vnet); int rexmt; - int headlocked; struct inpcb *inp; #ifdef TCPDEBUG int ostate; @@ -580,8 +620,6 @@ tcp_timer_rexmt(void * xtp) ostate = tp->t_state; #endif - - INP_INFO_RLOCK(&V_tcbinfo); inp = tp->t_inpcb; KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp)); INP_WLOCK(inp); @@ -588,7 +626,6 @@ tcp_timer_rexmt(void * xtp) if (callout_pending(&tp->t_timers->tt_rexmt) || !callout_active(&tp->t_timers->tt_rexmt)) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -595,7 +632,6 @@ tcp_timer_rexmt(void * xtp) callout_deactivate(&tp->t_timers->tt_rexmt); if ((inp->inp_flags & INP_DROPPED) != 0) { INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } @@ -616,14 +652,15 @@ tcp_timer_rexmt(void * xtp) if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) { tp->t_rxtshift = TCP_MAXRXTSHIFT; TCPSTAT_INC(tcps_timeoutdrop); - + if (tcp_inp_lock_exchange(tp)) { + tcp_inp_unlock_exchange(tp); + goto out; + } tp = tcp_drop(tp, tp->t_softerror ? tp->t_softerror : ETIMEDOUT); - headlocked = 1; + tcp_inp_unlock_exchange(tp); goto out; } - INP_INFO_RUNLOCK(&V_tcbinfo); - headlocked = 0; if (tp->t_state == TCPS_SYN_SENT) { /* * If the SYN was retransmitted, indicate CWND to be @@ -811,7 +848,6 @@ tcp_timer_rexmt(void * xtp) (void) tp->t_fb->tfb_tcp_output(tp); -out: #ifdef TCPDEBUG if (tp != NULL && (tp->t_inpcb->inp_socket->so_options & SO_DEBUG)) tcp_trace(TA_USER, ostate, tp, (void *)0, (struct tcphdr *)0, @@ -818,10 +854,8 @@ tcp_timer_rexmt(void * xtp) PRU_SLOWTIMO); #endif TCP_PROBE2(debug__user, tp, PRU_SLOWTIMO); - if (tp != NULL) - INP_WUNLOCK(inp); - if (headlocked) - INP_INFO_RUNLOCK(&V_tcbinfo); + INP_WUNLOCK(inp); +out: CURVNET_RESTORE(); } @@ -832,7 +866,6 @@ tcp_timer_activate(struct tcpcb *tp, uint32_t time timeout_t *f_callout; struct inpcb *inp = tp->t_inpcb; int cpu = inp_to_cpuid(inp); - uint32_t f_reset; #ifdef TCP_OFFLOAD if (tp->t_flags & TF_TOE) @@ -846,27 +879,22 @@ tcp_timer_activate(struct tcpcb *tp, uint32_t time case TT_DELACK: t_callout = &tp->t_timers->tt_delack; f_callout = tcp_timer_delack; - f_reset = TT_DELACK_RST; break; case TT_REXMT: t_callout = &tp->t_timers->tt_rexmt; f_callout = tcp_timer_rexmt; - f_reset = TT_REXMT_RST; break; case TT_PERSIST: t_callout = &tp->t_timers->tt_persist; f_callout = tcp_timer_persist; - f_reset = TT_PERSIST_RST; break; case TT_KEEP: t_callout = &tp->t_timers->tt_keep; f_callout = tcp_timer_keep; - f_reset = TT_KEEP_RST; break; case TT_2MSL: t_callout = &tp->t_timers->tt_2msl; f_callout = tcp_timer_2msl; - f_reset = TT_2MSL_RST; break; default: if (tp->t_fb->tfb_tcp_timer_activate) { @@ -876,24 +904,9 @@ tcp_timer_activate(struct tcpcb *tp, uint32_t time panic("tp %p bad timer_type %#x", tp, timer_type); } if (delta == 0) { - if ((tp->t_timers->tt_flags & timer_type) && - (callout_stop(t_callout) > 0) && - (tp->t_timers->tt_flags & f_reset)) { - tp->t_timers->tt_flags &= ~(timer_type | f_reset); - } + callout_stop(t_callout); } else { - if ((tp->t_timers->tt_flags & timer_type) == 0) { - tp->t_timers->tt_flags |= (timer_type | f_reset); - callout_reset_on(t_callout, delta, f_callout, tp, cpu); - } else { - /* Reset already running callout on the same CPU. */ - if (!callout_reset(t_callout, delta, f_callout, tp)) { - /* - * Callout not cancelled, consider it as not - * properly restarted. */ - tp->t_timers->tt_flags &= ~f_reset; - } - } + callout_reset_on(t_callout, delta, f_callout, tp, cpu); } } @@ -931,30 +944,23 @@ void tcp_timer_stop(struct tcpcb *tp, uint32_t timer_type) { struct callout *t_callout; - uint32_t f_reset; tp->t_timers->tt_flags |= TT_STOPPED; - switch (timer_type) { case TT_DELACK: t_callout = &tp->t_timers->tt_delack; - f_reset = TT_DELACK_RST; break; case TT_REXMT: t_callout = &tp->t_timers->tt_rexmt; - f_reset = TT_REXMT_RST; break; case TT_PERSIST: t_callout = &tp->t_timers->tt_persist; - f_reset = TT_PERSIST_RST; break; case TT_KEEP: t_callout = &tp->t_timers->tt_keep; - f_reset = TT_KEEP_RST; break; case TT_2MSL: t_callout = &tp->t_timers->tt_2msl; - f_reset = TT_2MSL_RST; break; default: if (tp->t_fb->tfb_tcp_timer_stop) { @@ -968,15 +974,13 @@ tcp_timer_stop(struct tcpcb *tp, uint32_t timer_ty panic("tp %p bad timer_type %#x", tp, timer_type); } - if (tp->t_timers->tt_flags & timer_type) { - if (callout_async_drain(t_callout, tcp_timer_discard) == 0) { - /* - * Can't stop the callout, defer tcpcb actual deletion - * to the last one. We do this using the async drain - * function and incrementing the count in - */ - tp->t_timers->tt_draincnt++; - } + if (callout_async_drain(t_callout, tcp_timer_discard) == 0) { + /* + * Can't stop the callout, defer tcpcb actual deletion + * to the last one. We do this using the async drain + * function and incrementing the count in + */ + tp->t_timers->tt_draincnt++; } } --Apple-Mail=_F9B158CC-A33B-400F-9DD1-C3982E64FC31 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 > On Jun 25, 2016, at 4:48 AM, Randall Stewart wrote: >=20 > So >=20 > All of our timers in TCP do something like > --------------------- > INFO-LOCK > INP_WLOCK >=20 > if (inp needs to be dropped) { > drop-it > } > do other work >=20 > UNLOCK-INP > UNLOCK-INFO > ------------------ >=20 > And generally the path =93inp needs to be dropped=94 is rarely taken. >=20 > So why don=92t we change the procedure to something like: >=20 > INP_WLOCK > if (inp needs to be dropped) { > inp_ref() > INP_WUNLOCK() > INFO_LOCK() > INP_WLOCK() > if (inp_release_ref) { > /* victory its dead */ > INFO_UNLOCK > return > } > do-the-drop > } >=20 > This way we would only go grab the INFO lock in those rarer cases > when we *do* actually want to kill the tcb and at the same time > it would make the current callout system work correctly.. which as > many have pointed out would be much better if we could just let the > lock be gotten by the callout system. Hmm maybe with this scheme we > could even do that... >=20 > R >=20 >=20 >> On Jun 20, 2016, at 1:45 PM, Julien Charbon wrote: >>=20 >>=20 >> Hi, >>=20 >> On 6/20/16 11:58 AM, Gleb Smirnoff wrote: >>> On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote: >>> J> > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: >>> J> > J> > Comparing stable/10 and head, I see two changes that could >>> J> > J> > affect that: >>> J> > J> >=20 >>> J> > J> > - callout_async_drain >>> J> > J> > - switch to READ lock for inp info in tcp timers >>> J> > J> >=20 >>> J> > J> > That's why you are in To, Julien and Hans :) >>> J> > J> >=20 >>> J> > J> > We continue investigating, and I will keep you updated. >>> J> > J> > However, any help is welcome. I can share cores. >>> J> >=20 >>> J> > Now, spending some time with cores and adding a bunch of >>> J> > extra CTRs, I have a sequence of events that lead to the >>> J> > panic. In short, the bug is in the callout system. It seems >>> J> > to be not relevant to the callout_async_drain, at least for >>> J> > now. The transition to READ lock unmasked the problem, that's >>> J> > why NetflixBSD 10 doesn't panic. >>> J> >=20 >>> J> > The panic requires heavy contention on the TCP info lock. >>> J> >=20 >>> J> > [CPU 1] the callout fires, tcp_timer_keep entered >>> J> > [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo); >>> J> > [CPU 2] schedules the callout >>> J> > [CPU 2] tcp_discardcb called >>> J> > [CPU 2] callout successfully canceled >>> J> > [CPU 2] tcpcb freed >>> J> > [CPU 1] unblocks... panic >>> J> >=20 >>> J> > When the lock was WLOCK, all contenders were resumed in a >>> J> > sequence they came to the lock. Now, that they are readers, >>> J> > once the lock is released, readers are resumed in a "random" >>> J> > order, and this allows tcp_discardcb to go before the old >>> J> > running callout, and this unmasks the panic. >>> J>=20 >>> J> Highly interesting. I should be able to reproduce that (will be = useful >>> J> for testing the corresponding fix). >>> J>=20 >>> J> Fix proposal: If callout_async_drain() returns 0 (fail) = (instead of 1 >>> J> (success) here) when the callout cancellation is a success _but_ = the >>> J> callout is current running, that should fix it. >>> J>=20 >>> J> For the history: It comes back to my old callout question: >>> J>=20 >>> J> Does _callout_stop_safe() is allowed to return 1 (success) even = if the >>> J> callout is still currently running; a.k.a. it is not because you >>> J> successfully cancelled a callout that the callout is not = currently running. >>> J>=20 >>> J> We did propose a patch to make _callout_stop_safe() returns 0 = (fail) >>> J> when the callout is currently running: >>> J>=20 >>> J> callout_stop() should return 0 when the callout is currently = being >>> J> serviced and indeed unstoppable >>> J> = https://reviews.freebsd.org/differential/changeset/?ref=3D62513&whitespace= =3Dignore-most >>> J>=20 >>> J> But this change impacted too many old code paths and was = interesting >>> J> only for TCP timers and thus was abandoned. >>>=20 >>> The fix I am working on now is doing exactly that. callout_reset = must >>> return 0 if the callout is currently running. >>>=20 >>> What are the old paths impacted? >>=20 >> Actually all the paths that check the callout_stop() return value AND >> call both callout_reset() and callout_stop() AND use mpsafe = callout(). >> And for each path, we would have to check our patch was ok (or not). >>=20 >> Thus, if you only do the change in callout_async_drain() context, you >> don't impact the "old" callout_stop() behavior and get the desired >> behavior for the TCP timers. Might be a good trade-off... >>=20 >> My 2 cents. >>=20 >> -- >> Julien >=20 > -------- > Randall Stewart > rrs@netflix.com > 803-317-4952 >=20 >=20 >=20 >=20 >=20 -------- Randall Stewart rrs@netflix.com 803-317-4952 --Apple-Mail=_F9B158CC-A33B-400F-9DD1-C3982E64FC31--