From owner-freebsd-net@FreeBSD.ORG Fri Oct 3 13:17:02 2014
Message-ID: <542EA1C9.6080907@freebsd.org>
Date: Fri, 03 Oct 2014 15:16:57 +0200
From: Julien Charbon
To: freebsd-net@freebsd.org
Cc: "De La Gueronniere, Marc"
Subject: Re: TCP stack lock contention with short-lived connections
References: <537F39DF.1090900@verisign.com>
In-Reply-To: <537F39DF.1090900@verisign.com>
List-Id: Networking and TCP/IP with FreeBSD
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature";
 boundary="IArnHQlKn5sngVOmi6i0lvig0oDVojeVN"

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--IArnHQlKn5sngVOmi6i0lvig0oDVojeVN
Content-Type: multipart/mixed;
 boundary="------------090402000607020404040507"

This is a multi-part message in MIME format.
--------------090402000607020404040507
Content-Type: text/plain; charset=ISO-8859-1

 Hi,

On 23/05/14 14:06, Julien Charbon wrote:
> On 27/02/14 11:32, Julien Charbon wrote:
>> On 07/11/13 14:55, Julien Charbon wrote:
>>> On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon
>>> wrote:
>>>> I have put technical and how-to-repeat details in below PR:
>>>>
>>>> kern/183659: TCP stack lock contention with short-lived connections
>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183659
>>>>
>>>> We are currently working on this performance improvement effort; it
>>>> will impact only the TCP locking strategy not the TCP stack logic
>>>> itself. We will share on freebsd-net the patches we made for
>>>> reviewing and improvement propositions; anyway this change might also
>>>> require enough eyeballs to avoid tricky race conditions introduction
>>>> in TCP stack.

 As usual, first a follow-up on the progress of the TCP short-lived
connections improvement:

 Thanks to jhb (committer) and adrian, hiren, jhb, Mike Bentkofsky
(reviewers), the SYN reception optimization has been committed to HEAD:

"In tcp_input(), don't acquire the pcbinfo global write lock for SYN
packets targeting a listening socket"
http://svnweb.freebsd.org/base?view=revision&revision=271119

 Two related patches remain:

 - The first one is actually a race condition fix, thanks to Marc for
spotting it.  This race was introduced by our first TCP timewait change:

http://svnweb.freebsd.org/base?view=revision&revision=264321

   The proposed fix is currently under review (see also the attached
patch):

https://reviews.freebsd.org/D826

   Note:  The original change has not been MFC'ed, so this race
condition exists only in HEAD.

 - The second one (see also the attached patch):

https://github.com/verisign/freebsd/commit/f4c11c6b678195515bbab8bb2825fa5222ed3a58

   Nothing new here (see the previous emails in this thread for
details); the patch just becomes easier to read with time.  As we have
not found any issue with it, and following a suggestion from Marc, we are
going to start a dedicated code review process:

 #1 Write down all the current INP_INFO_WLOCK locking rules, especially
the non-obvious ones
 #2 Check that all these rules are still respected with the proposed
improvement

 This will allow an in-depth code review and leave all the
pcbinfo/inpcb-related locking rules well documented.
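
 For readers who want the idea behind fix-tcptw-race.patch without
reading the whole diff, the revalidation step used in tcp_tw_2msl_scan()
and tcp_tw_2msl_reuse() can be modelled in userland roughly as below.
This is only a sketch under stated assumptions, not kernel code: the
names pcb_model, tw_model, scan_one() and the racer callbacks are made up
for illustration, the locks are elided and appear only as comments, and
the "concurrent" thread is simulated by a callback run at the point where
no lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct inpcb / struct tcptw (illustration only). */
struct tw_model;

struct pcb_model {
	int refcount;		/* plays the role of inp_refcount */
	struct tw_model *tw;	/* plays the role of inp_ppcb; NULL once closed */
	bool freed;		/* set instead of freeing, so callers can inspect it */
};

struct tw_model {
	struct pcb_model *pcb;	/* back pointer, like tw_inpcb */
};

enum scan_result { SCAN_PCB_GONE, SCAN_ALREADY_CLOSED, SCAN_CLOSED_NOW };

static void
pcb_model_init(struct pcb_model *p)
{
	struct tw_model *tw = malloc(sizeof(*tw));

	assert(tw != NULL);
	tw->pcb = p;
	p->tw = tw;
	p->refcount = 2;	/* one owner reference + one held by the tw */
	p->freed = false;
}

static void
pcb_model_ref(struct pcb_model *p)
{
	p->refcount++;
}

/* Returns true when the last reference was dropped. */
static bool
pcb_model_rele(struct pcb_model *p)
{
	assert(p->refcount > 0);
	if (--p->refcount > 0)
		return (false);
	p->freed = true;
	return (true);
}

/* Discard the timewait state and the pcb reference it was holding. */
static void
tw_model_close(struct pcb_model *p)
{
	struct tw_model *tw = p->tw;
	bool last;

	p->tw = NULL;
	free(tw);
	last = pcb_model_rele(p);
	assert(!last);		/* the caller must still hold a reference */
	(void)last;
}

/* Simulated concurrent thread: closes the timewait state only. */
static void
racer_close_only(struct pcb_model *p)
{
	tw_model_close(p);
}

/* Simulated concurrent thread: closes the tw and drops the owner reference. */
static void
racer_close_and_free(struct pcb_model *p)
{
	tw_model_close(p);
	(void)pcb_model_rele(p);
}

static enum scan_result
scan_one(struct pcb_model *p, void (*racer)(struct pcb_model *))
{
	struct tw_model *tw;

	/* Step 1 (timewait list lock held in the real code): pin the pcb. */
	pcb_model_ref(p);

	/* Step 2 (no lock held): another thread may get in here. */
	if (racer != NULL)
		racer(p);

	/* Step 3 (pcb lock held): re-read the tw first, then drop our pin. */
	tw = p->tw;
	if (pcb_model_rele(p))
		return (SCAN_PCB_GONE);		/* we held the last reference */
	if (tw == NULL)
		return (SCAN_ALREADY_CLOSED);	/* somebody closed it before us */
	tw_model_close(p);
	return (SCAN_CLOSED_NOW);
}
```

 Calling scan_one(&pcb, NULL) models the uncontended case, while passing
one of the racer callbacks models a close happening between dropping the
timewait list lock and taking the inpcb lock.  The point of the sketch is
only the ordering: the tw pointer is re-read from the pcb after the pcb is
locked, and is never trusted from before.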

--
Julien

--------------090402000607020404040507
Content-Type: text/plain; charset=UTF-8;
 name="fix-tcptw-race.patch"
Content-Disposition: attachment;
 filename="fix-tcptw-race.patch"

From 91a1a400f50aad0e2304d655a78286d9285efbc5 Mon Sep 17 00:00:00 2001
From: Julien Charbon
Date: Thu, 4 Sep 2014 09:40:21 +0200
Subject: [PATCH] [tcp-scale] Fix a race condition in TCP timewait between
 tcp_tw_2msl_reuse() and tcp_tw_2msl_scan() that drives (at least) timewait
 timeout cancellation.

Also simplify implementation by holding inpcb reference and removing
tw_pcbref() and tw_pcbrele().
---
 sys/netinet/tcp_timewait.c | 133 ++++++++++++++++++++++++++++-----------------
 sys/netinet/tcp_usrreq.c   |  15 +++++
 sys/netinet/tcp_var.h      |   1 -
 3 files changed, 99 insertions(+), 50 deletions(-)

diff --git a/sys/netinet/tcp_timewait.c b/sys/netinet/tcp_timewait.c
index 33555d9..9c17655 100644
--- a/sys/netinet/tcp_timewait.c
+++ b/sys/netinet/tcp_timewait.c
@@ -49,7 +49,6 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
-#include
 
 #include
 
@@ -101,6 +100,11 @@ static int maxtcptw;
  * currently in the TIME_WAIT state. The queue pointers, including the
  * queue pointers in each tcptw structure, are protected using the global
  * timewait lock, which must be held over queue iteration and modification.
+ *
+ * Rules on tcptw usage:
+ *  - a inpcb is always freed _after_ its tcptw
+ *  - a tcptw relies on its inpcb reference counting for memory stability
+ *  - a tcptw is valid only under its inpcb locked
  */
 static VNET_DEFINE(TAILQ_HEAD(, tcptw), twq_2msl);
 #define	V_twq_2msl	VNET(twq_2msl)
@@ -124,32 +128,6 @@ static void tcp_tw_2msl_reset(struct tcptw *, int);
 static void	tcp_tw_2msl_stop(struct tcptw *, int);
 static int	tcp_twrespond(struct tcptw *, int);
 
-/*
- * tw_pcbref() bumps the reference count on an tw in order to maintain
- * stability of an tw pointer despite the tw lock being released.
- */
-static void
-tw_pcbref(struct tcptw *tw)
-{
-
-	KASSERT(tw->tw_refcount > 0, ("%s: refcount 0", __func__));
-	refcount_acquire(&tw->tw_refcount);
-}
-
-/*
- * Drop a refcount on an tw elevated using tw_pcbref().
- */
-static int
-tw_pcbrele(struct tcptw *tw)
-{
-
-	KASSERT(tw->tw_refcount > 0, ("%s: refcount 0", __func__));
-	if (!refcount_release(&tw->tw_refcount))
-		return (0);
-	uma_zfree(V_tcptw_zone, tw);
-	return (1);
-}
-
 static int
 tcptw_auto_size(void)
 {
@@ -289,7 +267,11 @@ tcp_twstart(struct tcpcb *tp)
 		}
 	}
 	tw->tw_inpcb = inp;
-	refcount_init(&tw->tw_refcount, 1);
+	/*
+	 * The tcptw will hold a reference on its inpcb until tcp_twclose
+	 * is called
+	 */
+	in_pcbref(inp);	/* Reference from tw */
 
 	/*
 	 * Recover last window size sent.
@@ -479,7 +461,6 @@ tcp_twclose(struct tcptw *tw, int reuse)
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);	/* in_pcbfree() */
 	INP_WLOCK_ASSERT(inp);
 
-	tw->tw_inpcb = NULL;
 	tcp_tw_2msl_stop(tw, reuse);
 	inp->inp_ppcb = NULL;
 	in_pcbdrop(inp);
@@ -509,8 +490,13 @@ tcp_twclose(struct tcptw *tw, int reuse)
 			 */
 			INP_WUNLOCK(inp);
 		}
-	} else
+	} else {
+		/*
+		 * The socket has been already cleaned-up for us, only free the
+		 * inpcb.
+		 */
 		in_pcbfree(inp);
+	}
 	TCPSTAT_INC(tcps_closed);
 }
 
@@ -641,36 +627,70 @@ tcp_tw_2msl_reset(struct tcptw *tw, int rearm)
 static void
 tcp_tw_2msl_stop(struct tcptw *tw, int reuse)
 {
+	struct ucred *cred;
+	struct inpcb *inp;
+	int released;
 
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 
 	TW_WLOCK(V_tw_lock);
+	inp = tw->tw_inpcb;
+	tw->tw_inpcb = NULL;
+
 	TAILQ_REMOVE(&V_twq_2msl, tw, tw_2msl);
-	crfree(tw->tw_cred);
+	cred = tw->tw_cred;
 	tw->tw_cred = NULL;
 	TW_WUNLOCK(V_tw_lock);
 
+	if (cred != NULL)
+		crfree(cred);
+
+	released = in_pcbrele_wlocked(inp);
+	KASSERT(!released, ("%s: inp should not be released here", __func__));
+
 	if (!reuse)
-		tw_pcbrele(tw);
+		uma_zfree(V_tcptw_zone, tw);
 }
 
 struct tcptw *
 tcp_tw_2msl_reuse(void)
 {
 	struct tcptw *tw;
+	struct inpcb *inp;
 
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 
-	TW_WLOCK(V_tw_lock);
-	tw = TAILQ_FIRST(&V_twq_2msl);
-	if (tw == NULL) {
-		TW_WUNLOCK(V_tw_lock);
-		return NULL;
-	}
-	TW_WUNLOCK(V_tw_lock);
+	for (;;) {
+		TW_RLOCK(V_tw_lock);
+		tw = TAILQ_FIRST(&V_twq_2msl);
+		if (tw == NULL) {
+			TW_RUNLOCK(V_tw_lock);
+			break;
+		}
+		KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
+		    __func__));
 
-	INP_WLOCK(tw->tw_inpcb);
-	tcp_twclose(tw, 1);
+		inp = tw->tw_inpcb;
+		in_pcbref(inp);
+		TW_RUNLOCK(V_tw_lock);
+
+		INP_WLOCK(inp);
+		tw = intotw(inp);
+		if (in_pcbrele_wlocked(inp)) {
+			KASSERT(tw == NULL, ("%s: held last inp reference but "
+			    "tw not NULL", __func__));
+			continue;
+		}
+
+		if (tw == NULL) {
+			/* tcp_twclose() has already been called */
+			INP_WUNLOCK(inp);
+			continue;
+		}
+
+		tcp_twclose(tw, 1);
+		break;
+	}
 
 	return (tw);
 }
@@ -679,6 +699,7 @@ void
 tcp_tw_2msl_scan(void)
 {
 	struct tcptw *tw;
+	struct inpcb *inp;
 
 	for (;;) {
 		TW_RLOCK(V_tw_lock);
@@ -687,24 +708,38 @@ tcp_tw_2msl_scan(void)
 			TW_RUNLOCK(V_tw_lock);
 			break;
 		}
-		tw_pcbref(tw);
+		KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
+		    __func__));
+
+		inp = tw->tw_inpcb;
+		in_pcbref(inp);
 		TW_RUNLOCK(V_tw_lock);
 
-		/* Close timewait state */
 		if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) {
-			if (tw_pcbrele(tw)) {
+
+			INP_WLOCK(inp);
+			tw = intotw(inp);
+			if (in_pcbrele_wlocked(inp)) {
+				KASSERT(tw == NULL, ("%s: held last inp "
+				    "reference but tw not NULL", __func__));
+				INP_INFO_WUNLOCK(&V_tcbinfo);
+				continue;
+			}
+
+			if (tw == NULL) {
+				/* tcp_twclose() has already been called */
+				INP_WUNLOCK(inp);
 				INP_INFO_WUNLOCK(&V_tcbinfo);
 				continue;
 			}
 
-			KASSERT(tw->tw_inpcb != NULL,
-			    ("%s: tw->tw_inpcb == NULL", __func__));
-			INP_WLOCK(tw->tw_inpcb);
 			tcp_twclose(tw, 0);
 			INP_INFO_WUNLOCK(&V_tcbinfo);
 		} else {
-			/* INP_INFO lock is busy; continue later. */
-			tw_pcbrele(tw);
+			/* INP_INFO lock is busy, continue later. */
+			INP_WLOCK(inp);
+			if (!in_pcbrele_wlocked(inp))
+				INP_WUNLOCK(inp);
 			break;
 		}
 	}
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index 22a160c..908afa2 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -183,6 +183,21 @@ tcp_detach(struct socket *so, struct inpcb *inp)
 		 * present until timewait ends.
 		 *
 		 * XXXRW: Would it be cleaner to free the tcptw here?
+		 *
+		 * Astute question indeed, from twtcp perspective there are
+		 * three cases to consider:
+		 *
+		 * #1 tcp_detach is called at tcptw creation time by
+		 *  tcp_twstart, then do not discard the newly created tcptw
+		 *  and leave inpcb present until timewait ends
+		 * #2 tcp_detach is called at timewait end (or reuse) by
+		 *  tcp_twclose, then the tcptw has already been discarded
+		 *  and inpcb is freed here
+		 * #3 tcp_detach is called() after timewait ends (or reuse)
+		 *  (e.g. by soclose), then tcptw has already been discarded
+		 *  and inpcb is freed here
+		 *
+		 *  In all three cases the tcptw should not be freed here.
 		 */
 		if (inp->inp_flags & INP_DROPPED) {
 			KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && "
diff --git a/sys/netinet/tcp_var.h b/sys/netinet/tcp_var.h
index c2298fc..93e1b62 100644
--- a/sys/netinet/tcp_var.h
+++ b/sys/netinet/tcp_var.h
@@ -349,7 +349,6 @@ struct tcptw {
 	u_int		t_starttime;
 	int		tw_time;
 	TAILQ_ENTRY(tcptw) tw_2msl;
-	u_int		tw_refcount;	/* refcount */
 };
 
 #define	intotcpcb(ip)	((struct tcpcb *)(ip)->inp_ppcb)

--------------090402000607020404040507
Content-Type: text/plain; charset=UTF-8;
 name="tcp-scale-pcbinfo.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="tcp-scale-pcbinfo.patch"

From f4c11c6b678195515bbab8bb2825fa5222ed3a58 Mon Sep 17 00:00:00 2001
From: Julien Charbon
Date: Fri, 28 Mar 2014 15:36:52 +0100
Subject: [PATCH] [tcp-scale] Introduce the INP_LIST global mutex for
 protecting pcbinfo global structures. Then use INP_INFO_RLOCK in critical
 paths to increase TCP processing parallelism, and use INP_INFO_WLOCK in full
 INPs iteration loops.
Julien's review: - Fix INP_INFO_WLOCK assertions - Fixing comments - Rebased on svn path=3D/head/; revision=3D272099 --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c | 28 +++++----- sys/dev/cxgb/ulp/tom/cxgb_listen.c | 14 ++--- sys/dev/cxgbe/tom/t4_connect.c | 4 +- sys/dev/cxgbe/tom/t4_cpl_io.c | 20 +++---- sys/dev/cxgbe/tom/t4_listen.c | 10 ++-- sys/netinet/in_pcb.c | 43 ++++++++++++--- sys/netinet/in_pcb.h | 73 ++++++++++++++++++------- sys/netinet/tcp_input.c | 108 ++++++++++++++++++-------------= ------ sys/netinet/tcp_subr.c | 40 +++++++------- sys/netinet/tcp_syncache.c | 4 +- sys/netinet/tcp_timer.c | 40 +++++++------- sys/netinet/tcp_timewait.c | 24 ++++----- sys/netinet/tcp_usrreq.c | 40 +++++++------- sys/netinet/toecore.c | 6 +-- sys/netinet6/in6_pcb.c | 4 +- 15 files changed, 261 insertions(+), 197 deletions(-) diff --git a/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c b/sys/dev/cxgb/ulp/tom/cx= gb_cpl_io.c index a86bf72..f28c83d 100644 --- a/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c +++ b/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c @@ -639,7 +639,7 @@ t3_send_fin(struct toedev *tod, struct tcpcb *tp) unsigned int tid =3D toep->tp_tid; #endif =20 - INP_INFO_WLOCK_ASSERT(&V_tcbinfo); + INP_INFO_RLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); =20 CTR4(KTR_CXGB, "%s: tid %d, toep %p, flags %x", __func__, tid, toep, @@ -925,12 +925,12 @@ do_act_open_rpl(struct sge_qset *qs, struct rsp_des= c *r, struct mbuf *m) =20 rc =3D act_open_rpl_status_to_errno(s); if (rc !=3D EAGAIN) - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); toe_connect_failed(tod, inp, rc); toepcb_release(toep); /* unlocks inp */ if (rc !=3D EAGAIN) - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 m_freem(m); return (0); @@ -1061,7 +1061,7 @@ send_reset(struct toepcb *toep) struct adapter *sc =3D tod->tod_softc; struct mbuf *m; =20 - INP_INFO_WLOCK_ASSERT(&V_tcbinfo); + INP_INFO_RLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); =20 CTR4(KTR_CXGB, "%s: tid %d, toep %p (%x)", __func__, 
tid, toep, @@ -1172,12 +1172,12 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *= r, struct mbuf *m) SOCKBUF_UNLOCK(so_rcv); INP_WUNLOCK(inp); =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); tp =3D tcp_drop(tp, ECONNRESET); if (tp) INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 m_freem(m); return (0); @@ -1222,7 +1222,7 @@ do_peer_close(struct sge_qset *qs, struct rsp_desc = *r, struct mbuf *m) struct tcpcb *tp; struct socket *so; =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); tp =3D intotcpcb(inp); =20 @@ -1250,7 +1250,7 @@ do_peer_close(struct sge_qset *qs, struct rsp_desc = *r, struct mbuf *m) case TCPS_FIN_WAIT_2: tcp_twstart(tp); INP_UNLOCK_ASSERT(inp); /* safe, we have a ref on the inp */ - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 INP_WLOCK(inp); toepcb_release(toep); /* no more CPLs expected */ @@ -1264,7 +1264,7 @@ do_peer_close(struct sge_qset *qs, struct rsp_desc = *r, struct mbuf *m) =20 done: INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 m_freem(m); return (0); @@ -1285,7 +1285,7 @@ do_close_con_rpl(struct sge_qset *qs, struct rsp_de= sc *r, struct mbuf *m) struct tcpcb *tp; struct socket *so; =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); tp =3D intotcpcb(inp); =20 @@ -1303,7 +1303,7 @@ do_close_con_rpl(struct sge_qset *qs, struct rsp_de= sc *r, struct mbuf *m) tcp_twstart(tp); release: INP_UNLOCK_ASSERT(inp); /* safe, we have a ref on the inp */ - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 INP_WLOCK(inp); toepcb_release(toep); /* no more CPLs expected */ @@ -1328,7 +1328,7 @@ do_close_con_rpl(struct sge_qset *qs, struct rsp_de= sc *r, struct mbuf *m) =20 done: INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 m_freem(m); return (0); @@ -1489,7 +1489,7 @@ do_abort_req(struct sge_qset *qs, 
struct rsp_desc *= r, struct mbuf *m) return (do_abort_req_synqe(qs, r, m)); =20 inp =3D toep->tp_inp; - INP_INFO_WLOCK(&V_tcbinfo); /* for tcp_close */ + INP_INFO_RLOCK(&V_tcbinfo); /* for tcp_close */ INP_WLOCK(inp); =20 tp =3D intotcpcb(inp); @@ -1523,7 +1523,7 @@ do_abort_req(struct sge_qset *qs, struct rsp_desc *= r, struct mbuf *m) INP_WLOCK(inp); /* re-acquire */ toepcb_release(toep); /* no more CPLs expected */ } - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 send_abort_rpl(tod, tid, qset); m_freem(m); diff --git a/sys/dev/cxgb/ulp/tom/cxgb_listen.c b/sys/dev/cxgb/ulp/tom/cx= gb_listen.c index 94a219b..631899d 100644 --- a/sys/dev/cxgb/ulp/tom/cxgb_listen.c +++ b/sys/dev/cxgb/ulp/tom/cxgb_listen.c @@ -554,11 +554,11 @@ do_pass_accept_req(struct sge_qset *qs, struct rsp_= desc *r, struct mbuf *m) REJECT_PASS_ACCEPT(); /* no l2te, or ifp mismatch */ } =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); =20 /* Don't offload if the 4-tuple is already in use */ if (toe_4tuple_check(&inc, &th, ifp) !=3D 0) { - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); REJECT_PASS_ACCEPT(); } =20 @@ -571,7 +571,7 @@ do_pass_accept_req(struct sge_qset *qs, struct rsp_de= sc *r, struct mbuf *m) * resources tied to this listen context. 
*/ INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); REJECT_PASS_ACCEPT(); } so =3D inp->inp_socket; @@ -713,7 +713,7 @@ do_pass_establish(struct sge_qset *qs, struct rsp_des= c *r, struct mbuf *m) KASSERT(qs->idx =3D=3D synqe->qset, ("%s qset mismatch %d %d", __func__, qs->idx, synqe->qset)); =20 - INP_INFO_WLOCK(&V_tcbinfo); /* for syncache_expand */ + INP_INFO_RLOCK(&V_tcbinfo); /* for syncache_expand */ INP_WLOCK(inp); =20 if (__predict_false(inp->inp_flags & INP_DROPPED)) { @@ -727,7 +727,7 @@ do_pass_establish(struct sge_qset *qs, struct rsp_des= c *r, struct mbuf *m) ("%s: listen socket dropped but tid %u not aborted.", __func__, tid)); INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); m_freem(m); return (0); } @@ -743,7 +743,7 @@ do_pass_establish(struct sge_qset *qs, struct rsp_des= c *r, struct mbuf *m) reset: t3_send_reset_synqe(tod, synqe); INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); m_freem(m); return (0); } @@ -775,7 +775,7 @@ do_pass_establish(struct sge_qset *qs, struct rsp_des= c *r, struct mbuf *m) inp =3D release_lctx(td, lctx); if (inp) INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); release_synqe(synqe); =20 m_freem(m); diff --git a/sys/dev/cxgbe/tom/t4_connect.c b/sys/dev/cxgbe/tom/t4_connec= t.c index 9973fa5..718f62a 100644 --- a/sys/dev/cxgbe/tom/t4_connect.c +++ b/sys/dev/cxgbe/tom/t4_connect.c @@ -208,12 +208,12 @@ do_act_open_rpl(struct sge_iq *iq, const struct rss= _header *rss, =20 rc =3D act_open_rpl_status_to_errno(status); if (rc !=3D EAGAIN) - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); toe_connect_failed(tod, inp, rc); final_cpl_received(toep); /* unlocks inp */ if (rc !=3D EAGAIN) - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 return (0); } diff --git a/sys/dev/cxgbe/tom/t4_cpl_io.c b/sys/dev/cxgbe/tom/t4_cpl_io.= c index f18e0c7..f0e9b0a 100644 
--- a/sys/dev/cxgbe/tom/t4_cpl_io.c +++ b/sys/dev/cxgbe/tom/t4_cpl_io.c @@ -1059,7 +1059,7 @@ do_peer_close(struct sge_iq *iq, const struct rss_h= eader *rss, struct mbuf *m) =20 KASSERT(toep->tid =3D=3D tid, ("%s: toep tid mismatch", __func__)); =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); tp =3D intotcpcb(inp); =20 @@ -1113,7 +1113,7 @@ do_peer_close(struct sge_iq *iq, const struct rss_h= eader *rss, struct mbuf *m) case TCPS_FIN_WAIT_2: tcp_twstart(tp); INP_UNLOCK_ASSERT(inp); /* safe, we have a ref on the inp */ - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 INP_WLOCK(inp); final_cpl_received(toep); @@ -1125,7 +1125,7 @@ do_peer_close(struct sge_iq *iq, const struct rss_h= eader *rss, struct mbuf *m) } done: INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); return (0); } =20 @@ -1152,7 +1152,7 @@ do_close_con_rpl(struct sge_iq *iq, const struct rs= s_header *rss, KASSERT(m =3D=3D NULL, ("%s: wasn't expecting payload", __func__)); KASSERT(toep->tid =3D=3D tid, ("%s: toep tid mismatch", __func__)); =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); tp =3D intotcpcb(inp); =20 @@ -1170,7 +1170,7 @@ do_close_con_rpl(struct sge_iq *iq, const struct rs= s_header *rss, tcp_twstart(tp); release: INP_UNLOCK_ASSERT(inp); /* safe, we have a ref on the inp */ - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 INP_WLOCK(inp); final_cpl_received(toep); /* no more CPLs expected */ @@ -1194,7 +1194,7 @@ do_close_con_rpl(struct sge_iq *iq, const struct rs= s_header *rss, } done: INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); return (0); } =20 @@ -1353,7 +1353,7 @@ do_abort_req(struct sge_iq *iq, const struct rss_he= ader *rss, struct mbuf *m) } =20 inp =3D toep->inp; - INP_INFO_WLOCK(&V_tcbinfo); /* for tcp_close */ + INP_INFO_RLOCK(&V_tcbinfo); /* for tcp_close */ INP_WLOCK(inp); =20 tp =3D intotcpcb(inp); @@ 
-1387,7 +1387,7 @@ do_abort_req(struct sge_iq *iq, const struct rss_he= ader *rss, struct mbuf *m) =20 final_cpl_received(toep); done: - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); send_abort_rpl(sc, ofld_txq, tid, CPL_ABORT_NO_RST); return (0); } @@ -1501,12 +1501,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_he= ader *rss, struct mbuf *m) SOCKBUF_UNLOCK(sb); INP_WUNLOCK(inp); =20 - INP_INFO_WLOCK(&V_tcbinfo); + INP_INFO_RLOCK(&V_tcbinfo); INP_WLOCK(inp); tp =3D tcp_drop(tp, ECONNRESET); if (tp) INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 return (0); } diff --git a/sys/dev/cxgbe/tom/t4_listen.c b/sys/dev/cxgbe/tom/t4_listen.= c index 4380c9e..0faac12 100644 --- a/sys/dev/cxgbe/tom/t4_listen.c +++ b/sys/dev/cxgbe/tom/t4_listen.c @@ -1311,15 +1311,15 @@ do_pass_accept_req(struct sge_iq *iq, const struc= t rss_header *rss, REJECT_PASS_ACCEPT(); rpl =3D wrtod(wr); =20 - INP_INFO_WLOCK(&V_tcbinfo); /* for 4-tuple check */ + INP_INFO_RLOCK(&V_tcbinfo); /* for 4-tuple check */ =20 /* Don't offload if the 4-tuple is already in use */ if (toe_4tuple_check(&inc, &th, ifp) !=3D 0) { - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); free(wr, M_CXGBE); REJECT_PASS_ACCEPT(); } - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); =20 inp =3D lctx->inp; /* listening socket, not owned by TOE */ INP_WLOCK(inp); @@ -1511,7 +1511,7 @@ do_pass_establish(struct sge_iq *iq, const struct r= ss_header *rss, KASSERT(synqe->flags & TPF_SYNQE, ("%s: tid %u (ctx %p) not a synqe", __func__, tid, synqe)); =20 - INP_INFO_WLOCK(&V_tcbinfo); /* for syncache_expand */ + INP_INFO_RLOCK(&V_tcbinfo); /* for syncache_expand */ INP_WLOCK(inp); =20 CTR6(KTR_CXGBE, @@ -1609,7 +1609,7 @@ do_pass_establish(struct sge_iq *iq, const struct r= ss_header *rss, inp =3D release_lctx(sc, lctx); if (inp !=3D NULL) INP_WUNLOCK(inp); - INP_INFO_WUNLOCK(&V_tcbinfo); + INP_INFO_RUNLOCK(&V_tcbinfo); release_synqe(synqe); =20 
return (0); diff --git a/sys/netinet/in_pcb.c b/sys/netinet/in_pcb.c index a0a1b30..b91c5ad 100644 --- a/sys/netinet/in_pcb.c +++ b/sys/netinet/in_pcb.c @@ -218,6 +218,7 @@ in_pcbinfo_init(struct inpcbinfo *pcbinfo, const char= *name, =20 INP_INFO_LOCK_INIT(pcbinfo, name); INP_HASH_LOCK_INIT(pcbinfo, "pcbinfohash"); /* XXXRW: argument? */ + INP_LIST_LOCK_INIT(pcbinfo, "pcbinfolist"); #ifdef VIMAGE pcbinfo->ipi_vnet =3D curvnet; #endif @@ -256,6 +257,7 @@ in_pcbinfo_destroy(struct inpcbinfo *pcbinfo) in_pcbgroup_destroy(pcbinfo); #endif uma_zdestroy(pcbinfo->ipi_zone); + INP_LIST_LOCK_DESTROY(pcbinfo); INP_HASH_LOCK_DESTROY(pcbinfo); INP_INFO_LOCK_DESTROY(pcbinfo); } @@ -270,7 +272,14 @@ in_pcballoc(struct socket *so, struct inpcbinfo *pcb= info) struct inpcb *inp; int error; =20 - INP_INFO_WLOCK_ASSERT(pcbinfo); +#ifdef INVARIANTS + if (pcbinfo =3D=3D &V_tcbinfo) { + INP_INFO_RLOCK_ASSERT(pcbinfo); + } else { + INP_INFO_WLOCK_ASSERT(pcbinfo); + } +#endif + error =3D 0; inp =3D uma_zalloc(pcbinfo->ipi_zone, M_NOWAIT); if (inp =3D=3D NULL) @@ -302,6 +311,8 @@ in_pcballoc(struct socket *so, struct inpcbinfo *pcbi= nfo) inp->inp_flags |=3D IN6P_IPV6_V6ONLY; } #endif + INP_WLOCK(inp); + INP_LIST_WLOCK(pcbinfo); LIST_INSERT_HEAD(pcbinfo->ipi_listhead, inp, inp_list); pcbinfo->ipi_count++; so->so_pcb =3D (caddr_t)inp; @@ -309,9 +320,9 @@ in_pcballoc(struct socket *so, struct inpcbinfo *pcbi= nfo) if (V_ip6_auto_flowlabel) inp->inp_flags |=3D IN6P_AUTOFLOWLABEL; #endif - INP_WLOCK(inp); inp->inp_gencnt =3D ++pcbinfo->ipi_gencnt; refcount_init(&inp->inp_refcount, 1); /* Reference from inpcbinfo */ + INP_LIST_WUNLOCK(pcbinfo); #if defined(IPSEC) || defined(MAC) out: if (error !=3D 0) { @@ -1239,7 +1250,13 @@ in_pcbfree(struct inpcb *inp) =20 KASSERT(inp->inp_socket =3D=3D NULL, ("%s: inp_socket !=3D NULL", __fun= c__)); =20 - INP_INFO_WLOCK_ASSERT(pcbinfo); +#ifdef INVARIANTS + if (pcbinfo =3D=3D &V_tcbinfo) { + INP_INFO_RLOCK_ASSERT(pcbinfo); + } else { + 
INP_INFO_WLOCK_ASSERT(pcbinfo); + } +#endif INP_WLOCK_ASSERT(inp); =20 /* XXXRW: Do as much as possible here. */ @@ -1247,8 +1264,10 @@ in_pcbfree(struct inpcb *inp) if (inp->inp_sp !=3D NULL) ipsec_delete_pcbpolicy(inp); #endif + INP_LIST_WLOCK(pcbinfo); inp->inp_gencnt =3D ++pcbinfo->ipi_gencnt; in_pcbremlists(inp); + INP_LIST_WUNLOCK(pcbinfo); #ifdef INET6 if (inp->inp_vflag & INP_IPV6PROTO) { ip6_freepcbopts(inp->in6p_outputopts); @@ -1405,7 +1424,7 @@ in_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct if= net *ifp) struct ip_moptions *imo; int i, gap; =20 - INP_INFO_RLOCK(pcbinfo); + INP_INFO_WLOCK(pcbinfo); LIST_FOREACH(inp, pcbinfo->ipi_listhead, inp_list) { INP_WLOCK(inp); imo =3D inp->inp_moptions; @@ -1435,7 +1454,7 @@ in_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct if= net *ifp) } INP_WUNLOCK(inp); } - INP_INFO_RUNLOCK(pcbinfo); + INP_INFO_WUNLOCK(pcbinfo); } =20 /* @@ -2171,8 +2190,16 @@ in_pcbremlists(struct inpcb *inp) { struct inpcbinfo *pcbinfo =3D inp->inp_pcbinfo; =20 - INP_INFO_WLOCK_ASSERT(pcbinfo); +#ifdef INVARIANTS + if (pcbinfo =3D=3D &V_tcbinfo) { + INP_INFO_RLOCK_ASSERT(pcbinfo); + } else { + INP_INFO_WLOCK_ASSERT(pcbinfo); + } +#endif + INP_WLOCK_ASSERT(inp); + INP_LIST_WLOCK_ASSERT(pcbinfo); =20 inp->inp_gencnt =3D ++pcbinfo->ipi_gencnt; if (inp->inp_flags & INP_INHASHLIST) { @@ -2317,13 +2344,13 @@ inp_apply_all(void (*func)(struct inpcb *, void *= ), void *arg) { struct inpcb *inp; =20 - INP_INFO_RLOCK(&V_tcbinfo); + INP_INFO_WLOCK(&V_tcbinfo); LIST_FOREACH(inp, V_tcbinfo.ipi_listhead, inp_list) { INP_WLOCK(inp); func(inp, arg); INP_WUNLOCK(inp); } - INP_INFO_RUNLOCK(&V_tcbinfo); + INP_INFO_WUNLOCK(&V_tcbinfo); } =20 struct socket * diff --git a/sys/netinet/in_pcb.h b/sys/netinet/in_pcb.h index 185bcfb..661d7b8 100644 --- a/sys/netinet/in_pcb.h +++ b/sys/netinet/in_pcb.h @@ -134,19 +134,20 @@ struct icmp6_filter; * and IPv6 sockets. In the case of TCP, further per-connection state i= s * hung off of inp_ppcb most of the time. 
Almost all fields of struct i= npcb * are static after creation or protected by a per-inpcb rwlock, inp_loc= k. A - * few fields also require the global pcbinfo lock for the inpcb to be h= eld, - * when modified, such as the global connection lists and hashes, as wel= l as - * binding information (which affects which hash a connection is on). T= his - * model means that connections can be looked up without holding the - * per-connection lock, which is important for performance when attempti= ng to - * find the connection for a packet given its IP and port tuple. Writin= g to - * these fields that write locks be held on both the inpcb and global lo= cks. + * few fields also require the global pcblist lock for the inpcb to be h= eld, + * when modified, such as the global connection lists. This model means= that + * connections can be looked up without holding the per-connection lock,= which + * is important for performance when attempting to find the connection f= or a + * packet given its IP and port tuple. Writing to these fields that wri= te + * locks be held on both the inpcb and global locks. 
* * Key: * (c) - Constant after initialization * (g) - Protected by the pcbgroup lock * (i) - Protected by the inpcb lock * (p) - Protected by the pcbinfo lock for the inpcb + * (l) - Protected by the pcblist lock for the inpcb + * (h) - Protected by the pcbhash lock for the inpcb * (s) - Protected by another subsystem's locks * (x) - Undefined locking * @@ -163,13 +164,13 @@ struct icmp6_filter; * The inp_vflag field is overloaded, and would otherwise ideally be (c)= =2E */ struct inpcb { - LIST_ENTRY(inpcb) inp_hash; /* (i/p) hash list */ + LIST_ENTRY(inpcb) inp_hash; /* (i/h) hash list */ LIST_ENTRY(inpcb) inp_pcbgrouphash; /* (g/i) hash list */ - LIST_ENTRY(inpcb) inp_list; /* (i/p) list for all PCBs for proto */ + LIST_ENTRY(inpcb) inp_list; /* (i/l) list for all PCBs for proto */ void *inp_ppcb; /* (i) pointer to per-protocol pcb */ struct inpcbinfo *inp_pcbinfo; /* (c) PCB list info */ struct inpcbgroup *inp_pcbgroup; /* (g/i) PCB group list */ - LIST_ENTRY(inpcb) inp_pcbgroup_wild; /* (g/i/p) group wildcard entry */= + LIST_ENTRY(inpcb) inp_pcbgroup_wild; /* (g/i/h) group wildcard entry */= struct socket *inp_socket; /* (i) back pointer to socket */ struct ucred *inp_cred; /* (c) cache of socket cred */ u_int32_t inp_flow; /* (i) IPv6 flow information */ @@ -188,7 +189,7 @@ struct inpcb { * general use */ =20 /* Local and foreign ports, local and foreign addr. */ - struct in_conninfo inp_inc; /* (i/p) list for PCB's local port */ + struct in_conninfo inp_inc; /* (i) list for PCB's local port */ =20 /* MAC and IPSEC policy information. 
*/ struct label *inp_label; /* (i) MAC label */ @@ -213,8 +214,8 @@ struct inpcb { int inp6_cksum; short inp6_hops; } inp_depend6; - LIST_ENTRY(inpcb) inp_portlist; /* (i/p) */ - struct inpcbport *inp_phd; /* (i/p) head of this list */ + LIST_ENTRY(inpcb) inp_portlist; /* (i/h) */ + struct inpcbport *inp_phd; /* (i/h) head of this list */ #define inp_zero_size offsetof(struct inpcb, inp_gencnt) inp_gen_t inp_gencnt; /* (c) generation count */ struct llentry *inp_lle; /* cached L2 information */ @@ -279,16 +280,24 @@ struct inpcbport { * Global data structure for each high-level protocol (UDP, TCP, ...) in= both * IPv4 and IPv6. Holds inpcb lists and information for managing them. * - * Each pcbinfo is protected by two locks: ipi_lock and ipi_hash_lock, - * the former covering mutable global fields (such as the global pcb lis= t), - * and the latter covering the hashed lookup tables. The lock order is:= + * Each pcbinfo is protected by three locks: ipi_lock, ipi_hash_lock and= + * ipi_list_lock: + * - ipi_lock covering the global pcb list stability during loop iterat= ion, + * - ipi_hash_lock covering the hashed lookup tables, + * - ipi_list_lock covering mutable global fields (such as the global + * pcb list) * - * ipi_lock (before) inpcb locks (before) {ipi_hash_lock, pcbgroup lo= cks} + * The lock order is: + * + * ipi_lock (before) + * inpcb locks (before) + * {ipi_hash_lock, ipi_list_lock, pcbgroup locks} * * Locking key: * * (c) Constant or nearly constant after initialisation * (g) Locked by ipi_lock + * (l) Locked by ipi_list_lock * (h) Read using either ipi_hash_lock or inpcb lock; write requires bot= h * (p) Protected by one or more pcbgroup locks * (x) Synchronisation properties poorly defined @@ -302,14 +311,14 @@ struct inpcbinfo { /* * Global list of inpcbs on the protocol. 
 	 */
-	struct inpcbhead *ipi_listhead; /* (g) */
-	u_int ipi_count; /* (g) */
+	struct inpcbhead *ipi_listhead; /* (g/l) */
+	u_int ipi_count; /* (g/l) */
 
 	/*
 	 * Generation count -- incremented each time a connection is allocated
 	 * or freed.
 	 */
-	u_quad_t ipi_gencnt; /* (g) */
+	u_quad_t ipi_gencnt; /* (g/l) */
 
 	/*
 	 * Fields associated with port lookup and allocation.
@@ -367,6 +376,11 @@ struct inpcbinfo {
 	 * general use 2
 	 */
 	void *ipi_pspare[2];
+
+	/*
+	 * Global lock protecting global inpcb list, inpcb count, etc.
+	 */
+	struct rwlock ipi_list_lock;
 };
 
 #ifdef _KERNEL
@@ -466,6 +480,25 @@ short inp_so_options(const struct inpcb *inp);
 #define INP_INFO_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_WLOCKED)
 #define INP_INFO_UNLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_UNLOCKED)
 
+#define INP_LIST_LOCK_INIT(ipi, d) \
+	rw_init_flags(&(ipi)->ipi_list_lock, (d), 0)
+#define INP_LIST_LOCK_DESTROY(ipi) rw_destroy(&(ipi)->ipi_list_lock)
+#define INP_LIST_RLOCK(ipi) rw_rlock(&(ipi)->ipi_list_lock)
+#define INP_LIST_WLOCK(ipi) rw_wlock(&(ipi)->ipi_list_lock)
+#define INP_LIST_TRY_RLOCK(ipi) rw_try_rlock(&(ipi)->ipi_list_lock)
+#define INP_LIST_TRY_WLOCK(ipi) rw_try_wlock(&(ipi)->ipi_list_lock)
+#define INP_LIST_TRY_UPGRADE(ipi) rw_try_upgrade(&(ipi)->ipi_list_lock)
+#define INP_LIST_RUNLOCK(ipi) rw_runlock(&(ipi)->ipi_list_lock)
+#define INP_LIST_WUNLOCK(ipi) rw_wunlock(&(ipi)->ipi_list_lock)
+#define INP_LIST_LOCK_ASSERT(ipi) \
+	rw_assert(&(ipi)->ipi_list_lock, RA_LOCKED)
+#define INP_LIST_RLOCK_ASSERT(ipi) \
+	rw_assert(&(ipi)->ipi_list_lock, RA_RLOCKED)
+#define INP_LIST_WLOCK_ASSERT(ipi) \
+	rw_assert(&(ipi)->ipi_list_lock, RA_WLOCKED)
+#define INP_LIST_UNLOCK_ASSERT(ipi) \
+	rw_assert(&(ipi)->ipi_list_lock, RA_UNLOCKED)
+
 #define INP_HASH_LOCK_INIT(ipi, d) \
 	rw_init_flags(&(ipi)->ipi_hash_lock, (d), 0)
 #define INP_HASH_LOCK_DESTROY(ipi) rw_destroy(&(ipi)->ipi_hash_lock)
diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c
index
e338b1e..8090303 100644
--- a/sys/netinet/tcp_input.c
+++ b/sys/netinet/tcp_input.c
@@ -571,7 +571,7 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 	char *s = NULL; /* address and port logging */
 	int ti_locked;
 #define TI_UNLOCKED 1
-#define TI_WLOCKED 2
+#define TI_RLOCKED 2
 
 #ifdef TCPDEBUG
 	/*
@@ -760,8 +760,8 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 	 * connection in TIMEWAIT and SYNs not targeting a listening socket.
 	 */
 	if ((thflags & (TH_FIN | TH_RST)) != 0) {
-		INP_INFO_WLOCK(&V_tcbinfo);
-		ti_locked = TI_WLOCKED;
+		INP_INFO_RLOCK(&V_tcbinfo);
+		ti_locked = TI_RLOCKED;
 	} else
 		ti_locked = TI_UNLOCKED;
 
@@ -783,8 +783,8 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 
 findpcb:
 #ifdef INVARIANTS
-	if (ti_locked == TI_WLOCKED) {
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED) {
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	} else {
 		INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
 	}
@@ -936,20 +936,20 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 relocked:
 	if (inp->inp_flags & INP_TIMEWAIT) {
 		if (ti_locked == TI_UNLOCKED) {
-			if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
+			if (INP_INFO_TRY_RLOCK(&V_tcbinfo) == 0) {
 				in_pcbref(inp);
 				INP_WUNLOCK(inp);
-				INP_INFO_WLOCK(&V_tcbinfo);
-				ti_locked = TI_WLOCKED;
+				INP_INFO_RLOCK(&V_tcbinfo);
+				ti_locked = TI_RLOCKED;
 				INP_WLOCK(inp);
 				if (in_pcbrele_wlocked(inp)) {
 					inp = NULL;
 					goto findpcb;
 				}
 			} else
-				ti_locked = TI_WLOCKED;
+				ti_locked = TI_RLOCKED;
 		}
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 		if (thflags & TH_SYN)
 			tcp_dooptions(&to, optp, optlen, TO_SYN);
@@ -958,7 +958,7 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 		 */
 		if (tcp_twcheck(inp, &to, th, m, tlen))
 			goto findpcb;
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		return (IPPROTO_DONE);
 	}
 	/*
@@ -989,16 +989,16 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 	 */
 #ifdef INVARIANTS
 	if ((thflags & (TH_FIN | TH_RST)) != 0)
-
		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 #endif
 	if (!((tp->t_state == TCPS_ESTABLISHED && (thflags & TH_SYN) == 0) ||
 	    (tp->t_state == TCPS_LISTEN && (thflags & TH_SYN)))) {
 		if (ti_locked == TI_UNLOCKED) {
-			if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
+			if (INP_INFO_TRY_RLOCK(&V_tcbinfo) == 0) {
 				in_pcbref(inp);
 				INP_WUNLOCK(inp);
-				INP_INFO_WLOCK(&V_tcbinfo);
-				ti_locked = TI_WLOCKED;
+				INP_INFO_RLOCK(&V_tcbinfo);
+				ti_locked = TI_RLOCKED;
 				INP_WLOCK(inp);
 				if (in_pcbrele_wlocked(inp)) {
 					inp = NULL;
@@ -1006,9 +1006,9 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 				}
 				goto relocked;
 			} else
-				ti_locked = TI_WLOCKED;
+				ti_locked = TI_RLOCKED;
 		}
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	}
 
 #ifdef MAC
@@ -1063,7 +1063,7 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 		 */
 		if ((thflags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK) {
 
-			INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+			INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 			/*
 			 * Parse the TCP options here because
 			 * syncookies need access to the reflected
@@ -1346,8 +1346,8 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 		 * Entry added to syncache and mbuf consumed.
 		 * Only the listen socket is unlocked by syncache_add().
 		 */
-		if (ti_locked == TI_WLOCKED) {
-			INP_INFO_WUNLOCK(&V_tcbinfo);
+		if (ti_locked == TI_RLOCKED) {
+			INP_INFO_RUNLOCK(&V_tcbinfo);
 			ti_locked = TI_UNLOCKED;
 		}
 		INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
@@ -1396,8 +1396,8 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 dropwithreset:
 	TCP_PROBE5(receive, NULL, tp, mtod(m, const char *), tp, th);
 
-	if (ti_locked == TI_WLOCKED) {
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED) {
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		ti_locked = TI_UNLOCKED;
 	}
 #ifdef INVARIANTS
@@ -1420,8 +1420,8 @@ tcp_input(struct mbuf **mp, int *offp, int proto)
 	if (m != NULL)
 		TCP_PROBE5(receive, NULL, tp, mtod(m, const char *), tp, th);
 
-	if (ti_locked == TI_WLOCKED) {
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED) {
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		ti_locked = TI_UNLOCKED;
 	}
 #ifdef INVARIANTS
@@ -1478,13 +1478,13 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	 */
 	if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
 	    tp->t_state != TCPS_ESTABLISHED) {
-		KASSERT(ti_locked == TI_WLOCKED, ("%s ti_locked %d for "
+		KASSERT(ti_locked == TI_RLOCKED, ("%s ti_locked %d for "
 		    "SYN/FIN/RST/!EST", __func__, ti_locked));
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	} else {
 #ifdef INVARIANTS
-		if (ti_locked == TI_WLOCKED)
-			INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		if (ti_locked == TI_RLOCKED)
+			INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 		else {
 			KASSERT(ti_locked == TI_UNLOCKED, ("%s: EST "
 			    "ti_locked: %d", __func__, ti_locked));
@@ -1652,8 +1652,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 			/*
 			 * This is a pure ack for outstanding data.
 			 */
-			if (ti_locked == TI_WLOCKED)
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+			if (ti_locked == TI_RLOCKED)
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 			ti_locked = TI_UNLOCKED;
 
 			TCPSTAT_INC(tcps_predack);
@@ -1756,8 +1756,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 			 * nothing on the reassembly queue and we have enough
 			 * buffer space to take it.
 			 */
-			if (ti_locked == TI_WLOCKED)
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+			if (ti_locked == TI_RLOCKED)
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 			ti_locked = TI_UNLOCKED;
 
 			/* Clean receiver SACK report if present */
@@ -1992,9 +1992,9 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 			tcp_state_change(tp, TCPS_SYN_RECEIVED);
 		}
 
-		KASSERT(ti_locked == TI_WLOCKED, ("%s: trimthenstep6: "
+		KASSERT(ti_locked == TI_RLOCKED, ("%s: trimthenstep6: "
 		    "ti_locked %d", __func__, ti_locked));
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 		INP_WLOCK_ASSERT(tp->t_inpcb);
 
 		/*
@@ -2067,8 +2067,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 		    SEQ_LT(th->th_seq, tp->last_ack_sent + tp->rcv_wnd)) ||
 		    (tp->rcv_wnd == 0 && tp->last_ack_sent == th->th_seq)) {
 
-			INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
-			KASSERT(ti_locked == TI_WLOCKED,
+			INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
+			KASSERT(ti_locked == TI_RLOCKED,
 			    ("%s: TH_RST ti_locked %d, th %p tp %p",
 			    __func__, ti_locked, th, tp));
 			KASSERT(tp->t_state != TCPS_SYN_SENT,
@@ -2111,9 +2111,9 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	 * Send challenge ACK for any SYN in synchronized state.
 	 */
 	if ((thflags & TH_SYN) && tp->t_state != TCPS_SYN_SENT) {
-		KASSERT(ti_locked == TI_WLOCKED,
+		KASSERT(ti_locked == TI_RLOCKED,
 		    ("tcp_do_segment: TH_SYN ti_locked %d", ti_locked));
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 		TCPSTAT_INC(tcps_badsyn);
 		if (V_tcp_insecure_syn &&
@@ -2226,9 +2226,9 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	 */
 	if ((so->so_state & SS_NOFDREF) &&
 	    tp->t_state > TCPS_CLOSE_WAIT && tlen) {
-		KASSERT(ti_locked == TI_WLOCKED, ("%s: SS_NOFDEREF && "
+		KASSERT(ti_locked == TI_RLOCKED, ("%s: SS_NOFDEREF && "
 		    "CLOSE_WAIT && tlen ti_locked %d", __func__, ti_locked));
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 		if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
 			log(LOG_DEBUG, "%s; %s: %s: Received %d bytes of data "
@@ -2729,9 +2729,9 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 		 */
 		case TCPS_CLOSING:
 			if (ourfinisacked) {
-				INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+				INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 				tcp_twstart(tp);
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 				m_freem(m);
 				return;
 			}
@@ -2745,7 +2745,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 		 */
 		case TCPS_LAST_ACK:
 			if (ourfinisacked) {
-				INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+				INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 				tp = tcp_close(tp);
 				goto drop;
 			}
@@ -2959,18 +2959,18 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 		 * standard timers.
 		 */
 		case TCPS_FIN_WAIT_2:
-			INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
-			KASSERT(ti_locked == TI_WLOCKED, ("%s: dodata "
+			INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
+			KASSERT(ti_locked == TI_RLOCKED, ("%s: dodata "
 			    "TCP_FIN_WAIT_2 ti_locked: %d", __func__,
 			    ti_locked));
 
 			tcp_twstart(tp);
-			INP_INFO_WUNLOCK(&V_tcbinfo);
+			INP_INFO_RUNLOCK(&V_tcbinfo);
 			return;
 		}
 	}
-	if (ti_locked == TI_WLOCKED)
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED)
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 	ti_locked = TI_UNLOCKED;
 
 #ifdef TCPDEBUG
@@ -3025,8 +3025,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 		tcp_trace(TA_DROP, ostate, tp, (void *)tcp_saveipgen,
 		    &tcp_savetcp, 0);
 #endif
-	if (ti_locked == TI_WLOCKED)
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED)
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 	ti_locked = TI_UNLOCKED;
 
 	tp->t_flags |= TF_ACKNOW;
@@ -3036,8 +3036,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	return;
 
 dropwithreset:
-	if (ti_locked == TI_WLOCKED)
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED)
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 	ti_locked = TI_UNLOCKED;
 
 	if (tp != NULL) {
@@ -3048,8 +3048,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	return;
 
 drop:
-	if (ti_locked == TI_WLOCKED) {
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+	if (ti_locked == TI_RLOCKED) {
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		ti_locked = TI_UNLOCKED;
 	}
 #ifdef INVARIANTS
diff --git a/sys/netinet/tcp_subr.c b/sys/netinet/tcp_subr.c
index 7adda33..ff2153e 100644
--- a/sys/netinet/tcp_subr.c
+++ b/sys/netinet/tcp_subr.c
@@ -849,11 +849,11 @@ tcp_ccalgounload(struct cc_algo *unload_algo)
 	VNET_LIST_RLOCK();
 	VNET_FOREACH(vnet_iter) {
 		CURVNET_SET(vnet_iter);
-		INP_INFO_RLOCK(&V_tcbinfo);
+		INP_INFO_WLOCK(&V_tcbinfo);
 		/*
 		 * New connections already part way through being initialised
 		 * with the CC algo we're removing will not race with this code
-		 * because the
INP_INFO_WLOCK is held during initialisation. We
+		 * because the INP_INFO_RLOCK is held during initialisation. We
 		 * therefore don't enter the loop below until the connection
 		 * list has stabilised.
 		 */
@@ -879,7 +879,7 @@ tcp_ccalgounload(struct cc_algo *unload_algo)
 			}
 			INP_WUNLOCK(inp);
 		}
-		INP_INFO_RUNLOCK(&V_tcbinfo);
+		INP_INFO_WUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 	}
 	VNET_LIST_RUNLOCK();
@@ -897,7 +897,7 @@ tcp_drop(struct tcpcb *tp, int errno)
 {
 	struct socket *so = tp->t_inpcb->inp_socket;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(tp->t_inpcb);
 
 	if (TCPS_HAVERCVDSYN(tp->t_state)) {
@@ -1033,7 +1033,7 @@ tcp_close(struct tcpcb *tp)
 	struct inpcb *inp = tp->t_inpcb;
 	struct socket *so;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 #ifdef TCP_OFFLOAD
@@ -1081,7 +1081,7 @@ tcp_drain(void)
 		 * where we're really low on mbufs, this is potentially
 		 * useful.
 		 */
-		INP_INFO_RLOCK(&V_tcbinfo);
+		INP_INFO_WLOCK(&V_tcbinfo);
 		LIST_FOREACH(inpb, V_tcbinfo.ipi_listhead, inp_list) {
 			if (inpb->inp_flags & INP_TIMEWAIT)
 				continue;
@@ -1092,7 +1092,7 @@ tcp_drain(void)
 			}
 			INP_WUNLOCK(inpb);
 		}
-		INP_INFO_RUNLOCK(&V_tcbinfo);
+		INP_INFO_WUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 	}
 	VNET_LIST_RUNLOCK_NOSLEEP();
@@ -1176,8 +1176,10 @@ tcp_pcblist(SYSCTL_HANDLER_ARGS)
 	 * OK, now we're committed to doing something.
 	 */
 	INP_INFO_RLOCK(&V_tcbinfo);
+	INP_LIST_RLOCK(&V_tcbinfo);
 	gencnt = V_tcbinfo.ipi_gencnt;
 	n = V_tcbinfo.ipi_count;
+	INP_LIST_RUNLOCK(&V_tcbinfo);
 	INP_INFO_RUNLOCK(&V_tcbinfo);
 
 	m = syncache_pcbcount();
@@ -1203,7 +1205,7 @@ tcp_pcblist(SYSCTL_HANDLER_ARGS)
 	if (inp_list == NULL)
 		return (ENOMEM);
 
-	INP_INFO_RLOCK(&V_tcbinfo);
+	INP_INFO_WLOCK(&V_tcbinfo);
 	for (inp = LIST_FIRST(V_tcbinfo.ipi_listhead), i = 0;
 	    inp != NULL && i < n; inp = LIST_NEXT(inp, inp_list)) {
 		INP_WLOCK(inp);
@@ -1228,7 +1230,7 @@ tcp_pcblist(SYSCTL_HANDLER_ARGS)
 		}
 		INP_WUNLOCK(inp);
 	}
-	INP_INFO_RUNLOCK(&V_tcbinfo);
+	INP_INFO_WUNLOCK(&V_tcbinfo);
 	n = i;
 
 	error = 0;
@@ -1266,14 +1268,14 @@ tcp_pcblist(SYSCTL_HANDLER_ARGS)
 		} else
 			INP_RUNLOCK(inp);
 	}
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	for (i = 0; i < n; i++) {
 		inp = inp_list[i];
 		INP_RLOCK(inp);
 		if (!in_pcbrele_rlocked(inp))
 			INP_RUNLOCK(inp);
 	}
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 
 	if (!error) {
 		/*
@@ -1284,9 +1286,11 @@ tcp_pcblist(SYSCTL_HANDLER_ARGS)
 		 * might be necessary to retry.
 		 */
 		INP_INFO_RLOCK(&V_tcbinfo);
+		INP_LIST_RLOCK(&V_tcbinfo);
 		xig.xig_gen = V_tcbinfo.ipi_gencnt;
 		xig.xig_sogen = so_gencnt;
 		xig.xig_count = V_tcbinfo.ipi_count + pcb_count;
+		INP_LIST_RUNLOCK(&V_tcbinfo);
 		INP_INFO_RUNLOCK(&V_tcbinfo);
 		error = SYSCTL_OUT(req, &xig, sizeof xig);
 	}
@@ -1448,7 +1452,7 @@ tcp_ctlinput(int cmd, struct sockaddr *sa, void *vip)
 			    - offsetof(struct icmp, icmp_ip));
 			th = (struct tcphdr *)((caddr_t)ip
 			    + (ip->ip_hl << 2));
-			INP_INFO_WLOCK(&V_tcbinfo);
+			INP_INFO_RLOCK(&V_tcbinfo);
 			inp = in_pcblookup(&V_tcbinfo, faddr, th->th_dport,
 			    ip->ip_src, th->th_sport, INPLOOKUP_WLOCKPCB, NULL);
 			if (inp != NULL) {
@@ -1508,7 +1512,7 @@ tcp_ctlinput(int cmd, struct sockaddr *sa, void *vip)
 				inc.inc_laddr = ip->ip_src;
 				syncache_unreach(&inc, th);
 			}
-			INP_INFO_WUNLOCK(&V_tcbinfo);
+			INP_INFO_RUNLOCK(&V_tcbinfo);
 		} else
 			in_pcbnotifyall(&V_tcbinfo, faddr, inetctlerrmap[cmd], notify);
 	}
@@ -1581,9 +1585,9 @@ tcp6_ctlinput(int cmd, struct sockaddr *sa, void *d)
 		inc.inc6_faddr = ((struct sockaddr_in6 *)sa)->sin6_addr;
 		inc.inc6_laddr = ip6cp->ip6c_src->sin6_addr;
 		inc.inc_flags |= INC_ISIPV6;
-		INP_INFO_WLOCK(&V_tcbinfo);
+		INP_INFO_RLOCK(&V_tcbinfo);
 		syncache_unreach(&inc, &th);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 	} else
 		in6_pcbnotify(&V_tcbinfo, sa, 0, (const struct sockaddr *)sa6_src,
 		    0, cmd, NULL, notify);
@@ -1716,7 +1720,7 @@ tcp_drop_syn_sent(struct inpcb *inp, int errno)
 {
 	struct tcpcb *tp;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	if ((inp->inp_flags & INP_TIMEWAIT) ||
@@ -2239,7 +2243,7 @@ sysctl_drop(SYSCTL_HANDLER_ARGS)
 	default:
 		return (EINVAL);
 	}
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	switch (addrs[0].ss_family) {
 #ifdef INET6
 	case AF_INET6:
@@ -2278,7 +2282,7 @@ sysctl_drop(SYSCTL_HANDLER_ARGS)
 		INP_WUNLOCK(inp);
 	} else
 		error = ESRCH;
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	return (error);
 }
 
diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c
index 55a5044..197c788 100644
--- a/sys/netinet/tcp_syncache.c
+++ b/sys/netinet/tcp_syncache.c
@@ -662,7 +662,7 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 	int error;
 	char *s;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 	/*
 	 * Ok, create the full blown connection, and set things up
@@ -944,7 +944,7 @@ syncache_expand(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th,
 	 * Global TCP locks are held because we manipulate the PCB lists
 	 * and create a new socket.
 	 */
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK,
 	    ("%s: can handle only ACK", __func__));
 
diff --git a/sys/netinet/tcp_timer.c b/sys/netinet/tcp_timer.c
index 1767e1e..997638c 100644
--- a/sys/netinet/tcp_timer.c
+++ b/sys/netinet/tcp_timer.c
@@ -269,7 +269,7 @@ tcp_timer_2msl(void *xtp)
 	/*
 	 * XXXRW: Does this actually happen?
 	 */
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	inp = tp->t_inpcb;
 	/*
 	 * XXXRW: While this assert is in fact correct, bugs in the tcpcb
@@ -280,7 +280,7 @@ tcp_timer_2msl(void *xtp)
 	 */
 	if (inp == NULL) {
 		tcp_timer_race++;
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
@@ -289,14 +289,14 @@ tcp_timer_2msl(void *xtp)
 	if (callout_pending(&tp->t_timers->tt_2msl) ||
 	    !callout_active(&tp->t_timers->tt_2msl)) {
 		INP_WUNLOCK(tp->t_inpcb);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_2msl);
 	if ((inp->inp_flags & INP_DROPPED) != 0) {
 		INP_WUNLOCK(inp);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
@@ -332,7 +332,7 @@ tcp_timer_2msl(void *xtp)
 #endif
 	if (tp != NULL)
 		INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	CURVNET_RESTORE();
 }
 
@@ -348,7 +348,7 @@ tcp_timer_keep(void *xtp)
 
 	ostate = tp->t_state;
 #endif
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	inp = tp->t_inpcb;
 	/*
 	 * XXXRW: While this assert is in fact correct, bugs in the tcpcb
@@ -359,7 +359,7 @@ tcp_timer_keep(void *xtp)
 	 */
 	if (inp == NULL) {
 		tcp_timer_race++;
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
@@ -367,14 +367,14 @@ tcp_timer_keep(void *xtp)
 	if (callout_pending(&tp->t_timers->tt_keep) ||
 	    !callout_active(&tp->t_timers->tt_keep)) {
 		INP_WUNLOCK(inp);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_keep);
 	if ((inp->inp_flags & INP_DROPPED) != 0) {
 		INP_WUNLOCK(inp);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
@@ -421,7 +421,7 @@ tcp_timer_keep(void *xtp)
 		    PRU_SLOWTIMO);
 #endif
 	INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	CURVNET_RESTORE();
 	return;
 
@@ -436,7 +436,7 @@ tcp_timer_keep(void *xtp)
 #endif
 	if (tp != NULL)
 		INP_WUNLOCK(tp->t_inpcb);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	CURVNET_RESTORE();
 }
 
@@ -451,7 +451,7 @@ tcp_timer_persist(void *xtp)
 
 	ostate = tp->t_state;
 #endif
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	inp = tp->t_inpcb;
 	/*
 	 * XXXRW: While this assert is in fact correct, bugs in the tcpcb
@@ -462,7 +462,7 @@ tcp_timer_persist(void *xtp)
 	 */
 	if (inp == NULL) {
 		tcp_timer_race++;
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
@@ -470,14 +470,14 @@ tcp_timer_persist(void *xtp)
 	if (callout_pending(&tp->t_timers->tt_persist) ||
 	    !callout_active(&tp->t_timers->tt_persist)) {
 		INP_WUNLOCK(inp);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_persist);
 	if ((inp->inp_flags & INP_DROPPED) != 0) {
 		INP_WUNLOCK(inp);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
@@ -522,7 +522,7 @@ tcp_timer_persist(void *xtp)
 #endif
 	if (tp != NULL)
 		INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	CURVNET_RESTORE();
 }
 
@@ -581,16 +581,16 @@ tcp_timer_rexmt(void * xtp)
 			in_pcbref(inp);
 			INP_INFO_RUNLOCK(&V_tcbinfo);
 			INP_WUNLOCK(inp);
-			INP_INFO_WLOCK(&V_tcbinfo);
+			INP_INFO_RLOCK(&V_tcbinfo);
 			INP_WLOCK(inp);
 			if (in_pcbrele_wlocked(inp)) {
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 				CURVNET_RESTORE();
 				return;
 			}
 			if (inp->inp_flags & INP_DROPPED) {
 				INP_WUNLOCK(inp);
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 				CURVNET_RESTORE();
 				return;
 			}
@@ -688,7 +688,7 @@ tcp_timer_rexmt(void * xtp)
 	if (tp != NULL)
 		INP_WUNLOCK(inp);
 	if (headlocked)
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 	CURVNET_RESTORE();
 }
 
diff --git a/sys/netinet/tcp_timewait.c b/sys/netinet/tcp_timewait.c
index 9c17655..7687058 100644
--- a/sys/netinet/tcp_timewait.c
+++ b/sys/netinet/tcp_timewait.c
@@ -202,10 +202,10 @@ tcp_tw_destroy(void)
 {
 	struct tcptw *tw;
 
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	while ((tw = TAILQ_FIRST(&V_twq_2msl)) != NULL)
 		tcp_twclose(tw, 0);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 
 	TW_LOCK_DESTROY(V_tw_lock);
 	uma_zdestroy(V_tcptw_zone);
@@ -228,7 +228,7 @@ tcp_twstart(struct tcpcb *tp)
 	int isipv6 = inp->inp_inc.inc_flags & INC_ISIPV6;
 #endif
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	if (V_nolocaltimewait) {
@@ -357,7 +357,7 @@ tcp_twcheck(struct inpcb *inp, struct tcpopt *to __unused, struct tcphdr *th,
 	int thflags;
 	tcp_seq seq;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	/*
@@ -458,7 +458,7 @@ tcp_twclose(struct tcptw *tw, int reuse)
 	inp = tw->tw_inpcb;
 	KASSERT((inp->inp_flags & INP_TIMEWAIT), ("tcp_twclose: !timewait"));
 	KASSERT(intotw(inp) == tw, ("tcp_twclose: inp_ppcb != tw"));
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */
 	INP_WLOCK_ASSERT(inp);
 
 	tcp_tw_2msl_stop(tw, reuse);
@@ -613,7 +613,7 @@ static void
 tcp_tw_2msl_reset(struct tcptw *tw, int rearm)
 {
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(tw->tw_inpcb);
 
 	TW_WLOCK(V_tw_lock);
@@ -631,7 +631,7 @@ tcp_tw_2msl_stop(struct tcptw *tw, int reuse)
 	struct inpcb *inp;
 	int released;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 	TW_WLOCK(V_tw_lock);
 	inp = tw->tw_inpcb;
@@ -658,7 +658,7 @@ tcp_tw_2msl_reuse(void)
 	struct tcptw *tw;
 	struct inpcb *inp;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 	for (;;) {
 		TW_RLOCK(V_tw_lock);
@@ -715,26 +715,26 @@ tcp_tw_2msl_scan(void)
 		in_pcbref(inp);
 		TW_RUNLOCK(V_tw_lock);
 
-		if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) {
+		if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) {
 
 			INP_WLOCK(inp);
 			tw = intotw(inp);
 			if (in_pcbrele_wlocked(inp)) {
 				KASSERT(tw == NULL, ("%s: held last inp "
 				    "reference but tw not NULL", __func__));
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 				continue;
 			}
 
 			if (tw == NULL) {
 				/* tcp_twclose() has already been called */
 				INP_WUNLOCK(inp);
-				INP_INFO_WUNLOCK(&V_tcbinfo);
+				INP_INFO_RUNLOCK(&V_tcbinfo);
 				continue;
 			}
 
 			tcp_twclose(tw, 0);
-			INP_INFO_WUNLOCK(&V_tcbinfo);
+			INP_INFO_RUNLOCK(&V_tcbinfo);
 		} else {
 			/* INP_INFO lock is busy, continue later. */
 			INP_WLOCK(inp);
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index 2820ffb..faa2d45 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -163,7 +163,7 @@ tcp_detach(struct socket *so, struct inpcb *inp)
 {
 	struct tcpcb *tp;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp"));
@@ -244,12 +244,12 @@ tcp_usr_detach(struct socket *so)
 
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_detach: inp == NULL"));
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	INP_WLOCK(inp);
 	KASSERT(inp->inp_socket != NULL,
 	    ("tcp_usr_detach: inp_socket == NULL"));
 	tcp_detach(so, inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 }
 
 #ifdef INET
@@ -603,7 +603,7 @@ tcp_usr_disconnect(struct socket *so)
 	int error = 0;
 
 	TCPDEBUG0;
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL"));
 	INP_WLOCK(inp);
@@ -619,7 +619,7 @@ tcp_usr_disconnect(struct socket *so)
 out:
 	TCPDEBUG2(PRU_DISCONNECT);
 	INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	return (error);
 }
 
@@ -734,7 +734,7 @@ tcp_usr_shutdown(struct socket *so)
 	struct tcpcb *tp = NULL;
 
 	TCPDEBUG0;
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("inp == NULL"));
 	INP_WLOCK(inp);
@@ -752,7 +752,7 @@ tcp_usr_shutdown(struct socket *so)
 out:
 	TCPDEBUG2(PRU_SHUTDOWN);
 	INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 
 	return (error);
 }
@@ -814,7 +814,7 @@ tcp_usr_send(struct socket *so, int flags, struct mbuf *m,
 	 * this call.
 	 */
 	if (flags & PRUS_EOF)
-		INP_INFO_WLOCK(&V_tcbinfo);
+		INP_INFO_RLOCK(&V_tcbinfo);
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_send: inp == NULL"));
 	INP_WLOCK(inp);
@@ -871,7 +871,7 @@ tcp_usr_send(struct socket *so, int flags, struct mbuf *m,
 			 * Close the send side of the connection after
 			 * the data is sent.
 			 */
-			INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+			INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 			socantsendmore(so);
 			tcp_usrclosed(tp);
 		}
@@ -935,7 +935,7 @@ tcp_usr_send(struct socket *so, int flags, struct mbuf *m,
 		  ((flags & PRUS_EOF) ? PRU_SEND_EOF : PRU_SEND));
 	INP_WUNLOCK(inp);
 	if (flags & PRUS_EOF)
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 	return (error);
 }
 
@@ -952,7 +952,7 @@ tcp_usr_abort(struct socket *so)
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_abort: inp == NULL"));
 
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	INP_WLOCK(inp);
 	KASSERT(inp->inp_socket != NULL,
 	    ("tcp_usr_abort: inp_socket == NULL"));
@@ -974,7 +974,7 @@ tcp_usr_abort(struct socket *so)
 		inp->inp_flags |= INP_SOCKREF;
 	}
 	INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 }
 
 /*
@@ -990,7 +990,7 @@ tcp_usr_close(struct socket *so)
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_close: inp == NULL"));
 
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	INP_WLOCK(inp);
 	KASSERT(inp->inp_socket != NULL,
 	    ("tcp_usr_close: inp_socket == NULL"));
@@ -1013,7 +1013,7 @@ tcp_usr_close(struct socket *so)
 		inp->inp_flags |= INP_SOCKREF;
 	}
 	INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 }
 
 /*
@@ -1611,10 +1611,10 @@ tcp_attach(struct socket *so)
 	}
 	so->so_rcv.sb_flags |= SB_AUTOSIZE;
 	so->so_snd.sb_flags |= SB_AUTOSIZE;
-	INP_INFO_WLOCK(&V_tcbinfo);
+	INP_INFO_RLOCK(&V_tcbinfo);
 	error = in_pcballoc(so, &V_tcbinfo);
 	if (error) {
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		return (error);
 	}
 	inp = sotoinpcb(so);
@@ -1630,12 +1630,12 @@ tcp_attach(struct socket *so)
 	if (tp == NULL) {
 		in_pcbdetach(inp);
 		in_pcbfree(inp);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
 		return (ENOBUFS);
 	}
 	tp->t_state = TCPS_CLOSED;
 	INP_WUNLOCK(inp);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_RUNLOCK(&V_tcbinfo);
 	return (0);
 }
 
@@ -1653,7 +1653,7 @@ tcp_disconnect(struct tcpcb *tp)
 	struct inpcb *inp = tp->t_inpcb;
 	struct socket *so = inp->inp_socket;
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	/*
@@ -1691,7 +1691,7 @@ static void
 tcp_usrclosed(struct tcpcb *tp)
 {
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(tp->t_inpcb);
 
 	switch (tp->t_state) {
diff --git a/sys/netinet/toecore.c b/sys/netinet/toecore.c
index 1ab6c73..5887576 100644
--- a/sys/netinet/toecore.c
+++ b/sys/netinet/toecore.c
@@ -339,7 +339,7 @@ toe_syncache_expand(struct in_conninfo *inc, struct tcpopt *to,
     struct tcphdr *th, struct socket **lsop)
 {
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+	INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 
 	return (syncache_expand(inc, to, th, lsop, NULL));
 }
@@ -370,7 +370,7 @@ toe_4tuple_check(struct in_conninfo *inc, struct tcphdr *th, struct ifnet *ifp)
 
 	if ((inp->inp_flags & INP_TIMEWAIT) && th != NULL) {
 
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* for twcheck */
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo); /* for twcheck */
 		if (!tcp_twcheck(inp, NULL, th, NULL, 0))
 			return (EADDRINUSE);
 	} else {
@@ -574,7 +574,7 @@
toe_connect_failed(struct toedev *tod, struct inpcb *inp, int err)
 		(void) tcp_output(tp);
 	} else {
 
-		INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+		INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
 		tp = tcp_drop(tp, err);
 		if (tp == NULL)
 			INP_WLOCK(inp); /* re-acquire */
diff --git a/sys/netinet6/in6_pcb.c b/sys/netinet6/in6_pcb.c
index 2be2e83..0fcf091 100644
--- a/sys/netinet6/in6_pcb.c
+++ b/sys/netinet6/in6_pcb.c
@@ -795,7 +795,7 @@ in6_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct ifnet *ifp)
 	struct ip6_moptions *im6o;
 	int i, gap;
 
-	INP_INFO_RLOCK(pcbinfo);
+	INP_INFO_WLOCK(pcbinfo);
 	LIST_FOREACH(in6p, pcbinfo->ipi_listhead, inp_list) {
 		INP_WLOCK(in6p);
 		im6o = in6p->in6p_moptions;
@@ -826,7 +826,7 @@ in6_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct ifnet *ifp)
 		}
 		INP_WUNLOCK(in6p);
 	}
-	INP_INFO_RUNLOCK(pcbinfo);
+	INP_INFO_WUNLOCK(pcbinfo);
 }
 
 /*

--------------090402000607020404040507--

--IArnHQlKn5sngVOmi6i0lvig0oDVojeVN
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJULqHKAAoJEKVlQ5Je6dhxgKMH/2WnT+RXtDWrzwmM2Hx6PA6R
QsnJyvWbSwvNpL+zLyMWReJyHWmMlfqoUD1Te8TyssjpAthqYGx45Bu3WqoZW7um
rmTMbRJjTqzttYctaz3BDGB9xrpVqAG1oVK+urVzZ4qX7yO6b+GSIErfB2coPget
EsWA9Pbw42bBT89SX91Oc/KE2Gu0gv5Npcqk1EDQ8UpGPNI+iG7XHPrgpDdmXEkM
sDXDiy1ubql/Jrzax7cmr9RubxQ/QXktb3FbKpCiZOgxsF2kpWmcsi7PtVKuG5p5
l6rGjNU5PjsguCxZCQZzFsiPf6lHuuKDjlb+r6Ug73eA6n1DAPxG9jdHKct9Do0=
=GIWQ
-----END PGP SIGNATURE-----

--IArnHQlKn5sngVOmi6i0lvig0oDVojeVN--