From owner-dev-commits-src-branches@freebsd.org Wed Jun 9 11:14:22 2021 Return-Path: Delivered-To: dev-commits-src-branches@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id BA32E64BE4B; Wed, 9 Jun 2021 11:14:22 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4G0PdB4qCQz3ttb; Wed, 9 Jun 2021 11:14:22 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 8DAA11B489; Wed, 9 Jun 2021 11:14:22 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 159BEMkl047416; Wed, 9 Jun 2021 11:14:22 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 159BEMl8047415; Wed, 9 Jun 2021 11:14:22 GMT (envelope-from git) Date: Wed, 9 Jun 2021 11:14:22 GMT Message-Id: <202106091114.159BEMl8047415@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org From: Richard Scheffenegger Subject: git: 55cc0a478506 - stable/13 - [tcp] Keep socket buffer locked until upcall MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: rscheff X-Git-Repository: src X-Git-Refname: refs/heads/stable/13 X-Git-Reftype: branch X-Git-Commit: 55cc0a478506ee1c2db7b2f9aadb9855e5490af3 Auto-Submitted: auto-generated X-BeenThere: dev-commits-src-branches@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Commits to the stable branches of the FreeBSD src repository List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2021 11:14:22 -0000 The branch stable/13 has been updated by rscheff: URL: https://cgit.FreeBSD.org/src/commit/?id=55cc0a478506ee1c2db7b2f9aadb9855e5490af3 commit 55cc0a478506ee1c2db7b2f9aadb9855e5490af3 Author: Richard Scheffenegger AuthorDate: 2021-05-21 09:00:53 +0000 Commit: Richard Scheffenegger CommitDate: 2021-06-09 10:51:19 +0000 [tcp] Keep socket buffer locked until upcall r367492 would unlock the socket buffer before eventually calling the upcall. This leads to problematic interaction with NFS kernel server/client components (MP threads) accessing the socket buffer with potentially not correctly updated state. Reported by: rmacklem Reviewed By: tuexen, #transport Tested by: rmacklem, otis MFC after: 2 weeks Sponsored By: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29690 (cherry picked from commit 032bf749fd44ac5ff20aab2c3d8e3c05491778ea) --- sys/netinet/tcp_input.c | 38 ++++++++++++++------------------ sys/netinet/tcp_reass.c | 2 -- sys/netinet/tcp_stacks/bbr.c | 20 ++++++++--------- sys/netinet/tcp_stacks/rack.c | 20 ++++++++--------- sys/netinet/tcp_stacks/rack_bbr_common.c | 1 - sys/netinet/tcp_var.h | 2 +- 6 files changed, 37 insertions(+), 46 deletions(-) diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c index 5e25b38c45e2..19232218170b 100644 --- a/sys/netinet/tcp_input.c +++ b/sys/netinet/tcp_input.c @@ -1505,18 +1505,17 @@ tcp_handle_wakeup(struct tcpcb *tp, struct socket *so) * the TIME_WAIT state before coming here, we need * to check if the socket is still connected. */ - if ((so->so_state & SS_ISCONNECTED) == 0) + if (tp == NULL) { return; + } + if (so == NULL) { + return; + } INP_LOCK_ASSERT(tp->t_inpcb); if (tp->t_flags & TF_WAKESOR) { tp->t_flags &= ~TF_WAKESOR; - SOCKBUF_UNLOCK_ASSERT(&so->so_rcv); - sorwakeup(so); - } - if (tp->t_flags & TF_WAKESOW) { - tp->t_flags &= ~TF_WAKESOW; - SOCKBUF_UNLOCK_ASSERT(&so->so_snd); - sowwakeup(so); + SOCKBUF_LOCK_ASSERT(&so->so_rcv); + sorwakeup_locked(so); } } @@ -1894,7 +1893,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so, else if (!tcp_timer_active(tp, TT_PERSIST)) tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur); - tp->t_flags |= TF_WAKESOW; + sowwakeup(so); if (sbavail(&so->so_snd)) (void) tp->t_fb->tfb_tcp_output(tp); goto check_delack; @@ -1959,8 +1958,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so, m_adj(m, drop_hdrlen); /* delayed header drop */ sbappendstream_locked(&so->so_rcv, m, 0); } - SOCKBUF_UNLOCK(&so->so_rcv); - tp->t_flags |= TF_WAKESOR; + /* NB: sorwakeup_locked() does an implicit unlock. */ + sorwakeup_locked(so); if (DELAY_ACK(tp, tlen)) { tp->t_flags |= TF_DELACK; } else { @@ -2498,9 +2497,11 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so, * If segment contains data or ACK, will call tcp_reass() * later; if not, do so now to pass queued data to user. */ - if (tlen == 0 && (thflags & TH_FIN) == 0) + if (tlen == 0 && (thflags & TH_FIN) == 0) { (void) tcp_reass(tp, (struct tcphdr *)0, NULL, 0, (struct mbuf *)0); + tcp_handle_wakeup(tp, so); + } tp->snd_wl1 = th->th_seq - 1; /* FALLTHROUGH */ @@ -2952,8 +2953,8 @@ process_ACK: tp->snd_wnd = 0; ourfinisacked = 0; } - SOCKBUF_UNLOCK(&so->so_snd); - tp->t_flags |= TF_WAKESOW; + /* NB: sowwakeup_locked() does an implicit unlock. */ + sowwakeup_locked(so); m_freem(mfree); /* Detect una wraparound. */ if (!IN_RECOVERY(tp->t_flags) && @@ -3174,7 +3175,6 @@ dodata: /* XXX */ m_freem(m); else sbappendstream_locked(&so->so_rcv, m, 0); - SOCKBUF_UNLOCK(&so->so_rcv); tp->t_flags |= TF_WAKESOR; } else { /* @@ -3221,6 +3221,7 @@ dodata: /* XXX */ save_start + tlen); } } + tcp_handle_wakeup(tp, so); #if 0 /* * Note the amount of data that peer has sent into @@ -3244,9 +3245,8 @@ dodata: /* XXX */ */ if (thflags & TH_FIN) { if (TCPS_HAVERCVDFIN(tp->t_state) == 0) { - socantrcvmore(so); /* The socket upcall is handled by socantrcvmore. */ - tp->t_flags &= ~TF_WAKESOR; + socantrcvmore(so); /* * If connection is half-synchronized * (ie NEEDSYN flag on) then delay ACK, @@ -3310,7 +3310,6 @@ check_delack: tp->t_flags &= ~TF_DELACK; tcp_timer_activate(tp, TT_DELACK, tcp_delacktime); } - tcp_handle_wakeup(tp, so); INP_WUNLOCK(tp->t_inpcb); return; @@ -3344,7 +3343,6 @@ dropafterack: TCP_PROBE3(debug__input, tp, th, m); tp->t_flags |= TF_ACKNOW; (void) tp->t_fb->tfb_tcp_output(tp); - tcp_handle_wakeup(tp, so); INP_WUNLOCK(tp->t_inpcb); m_freem(m); return; @@ -3352,7 +3350,6 @@ dropafterack: dropwithreset: if (tp != NULL) { tcp_dropwithreset(m, th, tp, tlen, rstreason); - tcp_handle_wakeup(tp, so); INP_WUNLOCK(tp->t_inpcb); } else tcp_dropwithreset(m, th, NULL, tlen, rstreason); @@ -3369,7 +3366,6 @@ drop: #endif TCP_PROBE3(debug__input, tp, th, m); if (tp != NULL) { - tcp_handle_wakeup(tp, so); INP_WUNLOCK(tp->t_inpcb); } m_freem(m); diff --git a/sys/netinet/tcp_reass.c b/sys/netinet/tcp_reass.c index e708b6db14c0..5b9255da3acf 100644 --- a/sys/netinet/tcp_reass.c +++ b/sys/netinet/tcp_reass.c @@ -973,7 +973,6 @@ new_entry: } else { sbappendstream_locked(&so->so_rcv, m, 0); } - SOCKBUF_UNLOCK(&so->so_rcv); tp->t_flags |= TF_WAKESOR; return (flags); } @@ -1122,7 +1121,6 @@ present: #ifdef TCP_REASS_LOGGING tcp_reass_log_dump(tp); #endif - SOCKBUF_UNLOCK(&so->so_rcv); tp->t_flags |= TF_WAKESOR; return (flags); } diff --git a/sys/netinet/tcp_stacks/bbr.c b/sys/netinet/tcp_stacks/bbr.c index a3f7961af345..7d709e33f0d7 100644 --- a/sys/netinet/tcp_stacks/bbr.c +++ b/sys/netinet/tcp_stacks/bbr.c @@ -7829,8 +7829,8 @@ bbr_process_ack(struct mbuf *m, struct tcphdr *th, struct socket *so, acked_amount = min(acked, (int)sbavail(&so->so_snd)); tp->snd_wnd -= acked_amount; mfree = sbcut_locked(&so->so_snd, acked_amount); - SOCKBUF_UNLOCK(&so->so_snd); - tp->t_flags |= TF_WAKESOW; + /* NB: sowwakeup_locked() does an implicit unlock. */ + sowwakeup_locked(so); m_freem(mfree); if (SEQ_GT(th->th_ack, tp->snd_una)) { bbr_collapse_rtt(tp, bbr, TCP_REXMTVAL(tp)); @@ -8306,7 +8306,6 @@ bbr_process_data(struct mbuf *m, struct tcphdr *th, struct socket *so, appended = #endif sbappendstream_locked(&so->so_rcv, m, 0); - SOCKBUF_UNLOCK(&so->so_rcv); tp->t_flags |= TF_WAKESOR; #ifdef NETFLIX_SB_LIMITS if (so->so_rcv.sb_shlim && appended != mcnt) @@ -8358,6 +8357,7 @@ bbr_process_data(struct mbuf *m, struct tcphdr *th, struct socket *so, save_start + tlen); } } + tcp_handle_wakeup(tp, so); } else { m_freem(m); thflags &= ~TH_FIN; @@ -8369,9 +8369,8 @@ bbr_process_data(struct mbuf *m, struct tcphdr *th, struct socket *so, */ if (thflags & TH_FIN) { if (TCPS_HAVERCVDFIN(tp->t_state) == 0) { - socantrcvmore(so); /* The socket upcall is handled by socantrcvmore. */ - tp->t_flags &= ~TF_WAKESOR; + socantrcvmore(so); /* * If connection is half-synchronized (ie NEEDSYN * flag on) then delay ACK, so it may be piggybacked @@ -8562,8 +8561,8 @@ bbr_do_fastnewdata(struct mbuf *m, struct tcphdr *th, struct socket *so, sbappendstream_locked(&so->so_rcv, m, 0); ctf_calc_rwin(so, tp); } - SOCKBUF_UNLOCK(&so->so_rcv); - tp->t_flags |= TF_WAKESOR; + /* NB: sorwakeup_locked() does an implicit unlock. */ + sorwakeup_locked(so); #ifdef NETFLIX_SB_LIMITS if (so->so_rcv.sb_shlim && mcnt != appended) counter_fo_release(so->so_rcv.sb_shlim, mcnt - appended); @@ -8754,7 +8753,7 @@ bbr_fastack(struct mbuf *m, struct tcphdr *th, struct socket *so, &tcp_savetcp, 0); #endif /* Wake up the socket if we have room to write more */ - tp->t_flags |= TF_WAKESOW; + sowwakeup(so); if (tp->snd_una == tp->snd_max) { /* Nothing left outstanding */ bbr_log_progress_event(bbr, tp, ticks, PROGRESS_CLEAR, __LINE__); @@ -9162,9 +9161,11 @@ bbr_do_syn_recv(struct mbuf *m, struct tcphdr *th, struct socket *so, * If segment contains data or ACK, will call tcp_reass() later; if * not, do so now to pass queued data to user. */ - if (tlen == 0 && (thflags & TH_FIN) == 0) + if (tlen == 0 && (thflags & TH_FIN) == 0) { (void)tcp_reass(tp, (struct tcphdr *)0, NULL, 0, (struct mbuf *)0); + tcp_handle_wakeup(tp, so); + } tp->snd_wl1 = th->th_seq - 1; if (bbr_process_ack(m, th, so, tp, to, tiwin, tlen, &ourfinisacked, thflags, &ret_val)) { return (ret_val); @@ -11716,7 +11717,6 @@ bbr_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so, retval = bbr_do_segment_nounlock(m, th, so, tp, drop_hdrlen, tlen, iptos, 0, &tv); if (retval == 0) { - tcp_handle_wakeup(tp, so); INP_WUNLOCK(tp->t_inpcb); } } diff --git a/sys/netinet/tcp_stacks/rack.c b/sys/netinet/tcp_stacks/rack.c index 4a547e900c4f..9203eb8e252d 100644 --- a/sys/netinet/tcp_stacks/rack.c +++ b/sys/netinet/tcp_stacks/rack.c @@ -9869,8 +9869,8 @@ rack_process_ack(struct mbuf *m, struct tcphdr *th, struct socket *so, if (acked_amount && sbavail(&so->so_snd)) rack_adjust_sendmap(rack, &so->so_snd, tp->snd_una); rack_log_wakeup(tp,rack, &so->so_snd, acked, 2); - SOCKBUF_UNLOCK(&so->so_snd); - tp->t_flags |= TF_WAKESOW; + /* NB: sowwakeup_locked() does an implicit unlock. */ + sowwakeup_locked(so); m_freem(mfree); if (SEQ_GT(tp->snd_una, tp->snd_recover)) tp->snd_recover = tp->snd_una; @@ -10221,7 +10221,6 @@ rack_process_data(struct mbuf *m, struct tcphdr *th, struct socket *so, sbappendstream_locked(&so->so_rcv, m, 0); rack_log_wakeup(tp,rack, &so->so_rcv, tlen, 1); - SOCKBUF_UNLOCK(&so->so_rcv); tp->t_flags |= TF_WAKESOR; #ifdef NETFLIX_SB_LIMITS if (so->so_rcv.sb_shlim && appended != mcnt) @@ -10279,6 +10278,7 @@ rack_process_data(struct mbuf *m, struct tcphdr *th, struct socket *so, save_start + tlen); } } + tcp_handle_wakeup(tp, so); } else { m_freem(m); thflags &= ~TH_FIN; @@ -10290,9 +10290,8 @@ rack_process_data(struct mbuf *m, struct tcphdr *th, struct socket *so, */ if (thflags & TH_FIN) { if (TCPS_HAVERCVDFIN(tp->t_state) == 0) { - socantrcvmore(so); /* The socket upcall is handled by socantrcvmore. */ - tp->t_flags &= ~TF_WAKESOR; + socantrcvmore(so); /* * If connection is half-synchronized (ie NEEDSYN * flag on) then delay ACK, so it may be piggybacked @@ -10485,7 +10484,6 @@ rack_do_fastnewdata(struct mbuf *m, struct tcphdr *th, struct socket *so, ctf_calc_rwin(so, tp); } rack_log_wakeup(tp,rack, &so->so_rcv, tlen, 1); - SOCKBUF_UNLOCK(&so->so_rcv); tp->t_flags |= TF_WAKESOR; #ifdef NETFLIX_SB_LIMITS if (so->so_rcv.sb_shlim && mcnt != appended) @@ -10645,8 +10643,7 @@ rack_fastack(struct mbuf *m, struct tcphdr *th, struct socket *so, rack_adjust_sendmap(rack, &so->so_snd, tp->snd_una); /* Wake up the socket if we have room to write more */ rack_log_wakeup(tp,rack, &so->so_snd, acked, 2); - SOCKBUF_UNLOCK(&so->so_snd); - tp->t_flags |= TF_WAKESOW; + sowwakeup(so); m_freem(mfree); tp->t_rxtshift = 0; RACK_TCPT_RANGESET(tp->t_rxtcur, RACK_REXMTVAL(tp), @@ -11084,9 +11081,11 @@ rack_do_syn_recv(struct mbuf *m, struct tcphdr *th, struct socket *so, * If segment contains data or ACK, will call tcp_reass() later; if * not, do so now to pass queued data to user. */ - if (tlen == 0 && (thflags & TH_FIN) == 0) + if (tlen == 0 && (thflags & TH_FIN) == 0) { (void) tcp_reass(tp, (struct tcphdr *)0, NULL, 0, (struct mbuf *)0); + tcp_handle_wakeup(tp, so); + } tp->snd_wl1 = th->th_seq - 1; /* For syn-recv we need to possibly update the rtt */ if ((to->to_flags & TOF_TS) != 0 && to->to_tsecr) { @@ -13173,8 +13172,7 @@ rack_do_compressed_ack_processing(struct tcpcb *tp, struct socket *so, struct mb rack_adjust_sendmap(rack, &so->so_snd, tp->snd_una); /* Wake up the socket if we have room to write more */ rack_log_wakeup(tp,rack, &so->so_snd, acked, 2); - SOCKBUF_UNLOCK(&so->so_snd); - tp->t_flags |= TF_WAKESOW; + sowwakeup(so); m_freem(mfree); } /* update progress */ diff --git a/sys/netinet/tcp_stacks/rack_bbr_common.c b/sys/netinet/tcp_stacks/rack_bbr_common.c index 6475a1a01c26..501d29ac48da 100644 --- a/sys/netinet/tcp_stacks/rack_bbr_common.c +++ b/sys/netinet/tcp_stacks/rack_bbr_common.c @@ -533,7 +533,6 @@ ctf_do_queued_segments(struct socket *so, struct tcpcb *tp, int have_pkt) /* We lost the tcpcb (maybe a RST came in)? */ return(1); } - tcp_handle_wakeup(tp, so); } return (0); } diff --git a/sys/netinet/tcp_var.h b/sys/netinet/tcp_var.h index 917e993d79e0..7b51afbb42b6 100644 --- a/sys/netinet/tcp_var.h +++ b/sys/netinet/tcp_var.h @@ -408,7 +408,7 @@ TAILQ_HEAD(tcp_funchead, tcp_function); #define TF_FORCEDATA 0x00800000 /* force out a byte */ #define TF_TSO 0x01000000 /* TSO enabled on this connection */ #define TF_TOE 0x02000000 /* this connection is offloaded */ -#define TF_WAKESOW 0x04000000 /* wake up send socket */ +#define TF_UNUSED0 0x04000000 /* unused */ #define TF_UNUSED1 0x08000000 /* unused */ #define TF_UNUSED2 0x10000000 /* unused */ #define TF_CONGRECOVERY 0x20000000 /* congestion recovery mode */