From owner-svn-src-head@FreeBSD.ORG Sat Jun 4 16:27:57 2011 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B2411065672; Sat, 4 Jun 2011 16:27:57 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id F3FFB8FC14; Sat, 4 Jun 2011 16:27:56 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 6F0B846B06; Sat, 4 Jun 2011 12:27:56 -0400 (EDT) Date: Sat, 4 Jun 2011 17:27:56 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kristof Provost In-Reply-To: <20110604143043.GE17764@nereid> Message-ID: References: <201105300943.p4U9htjI070096@svn.freebsd.org> <20110604143043.GE17764@nereid> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r222488 - in head/sys: contrib/pf/net netinet netinet/ipfw netinet6 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Jun 2011 16:27:57 -0000 On Sat, 4 Jun 2011, Kristof Provost wrote: > I'm seeing a panic when I start natd, which I suspect is related to this > commit: And, I believe now fixed in the just-committed r222690. My suggestion is that people running -CURRENT should generally keep INVARIANTS and WITNESS on -- they introduce some overhead, but are invaluable in tracking down problems. Thanks! Robert > > panic: Lock pcbinfohash not exclusively locked @ > /usr/src/sys/netinet/in_pcb.c:323 > > Backtrace: > panic() ... > _rw_assert() at _rw_assert+0x18d > in_pcbbind() at in_pcbbind+0xf5 > div_bind() at div_bind+0xb9 > sobind() at sobind()+0x79 > kern_bind() at kern_bind+0xde > bind() at bind()+0x41 > syscallenter() at syscallenter+0x1cb > syscall() at syscall+0x4c > Xfast_syscall() at Xfast_syscall+0xdd > > div_bind probably also needs to surround the call to in_pcbbind with > INP_HASHW(UN)LOCK(...) > > I'm currently running 222680. I've only now seen the issue, but I've also > just now activated INVARIANTS. > > Regards, > Kristof > > On 2011-05-30 09:43:55 (+0000), Robert Watson wrote: >> Author: rwatson >> Date: Mon May 30 09:43:55 2011 >> New Revision: 222488 >> URL: http://svn.freebsd.org/changeset/base/222488 >> >> Log: >> Decompose the current single inpcbinfo lock into two locks: >> >> - The existing ipi_lock continues to protect the global inpcb list and >> inpcb counter. This lock is now relegated to a small number of >> allocation and free operations, and occasional operations that walk >> all connections (including, awkwardly, certain UDP multicast receive >> operations -- something to revisit). >> >> - A new ipi_hash_lock protects the two inpcbinfo hash tables for >> looking up connections and bound sockets, manipulated using new >> INP_HASH_*() macros. This lock, combined with inpcb locks, protects >> the 4-tuple address space. >> >> Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb >> connection locks, so may be acquired while manipulating a connection on >> which a lock is already held, avoiding the need to acquire the inpcbinfo >> lock preemptively when a binding change might later be required. As a >> result, however, lookup operations necessarily go through a reference >> acquire while holding the lookup lock, later acquiring an inpcb lock -- >> if required. >> >> A new function in_pcblookup() looks up connections, and accepts flags >> indicating how to return the inpcb. Due to lock order changes, callers >> no longer need acquire locks before performing a lookup: the lookup >> routine will acquire the ipi_hash_lock as needed. In the future, it will >> also be able to use alternative lookup and locking strategies >> transparently to callers, such as pcbgroup lookup. New lookup flags are, >> supplementing the existing INPLOOKUP_WILDCARD flag: >> >> INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb >> INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb >> >> Callers must pass exactly one of these flags (for the time being). >> >> Some notes: >> >> - All protocols are updated to work within the new regime; especially, >> TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely >> eliminated, and global hash lock hold times are dramatically reduced >> compared to previous locking. >> - The TCP syncache still relies on the pcbinfo lock, something that we >> may want to revisit. >> - Support for reverting to the FreeBSD 7.x locking strategy in TCP input >> is no longer available -- hash lookup locks are now held only very >> briefly during inpcb lookup, rather than for potentially extended >> periods. However, the pcbinfo ipi_lock will still be acquired if a >> connection state might change such that a connection is added or >> removed. >> - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, >> due to maintaining their own hash tables. >> - The interface in6_pcblookup_hash_locked() is maintained, which allows >> callers to acquire hash locks and perform one or more lookups atomically >> with 4-tuple allocation: this is required only for TCPv6, as there is no >> in6_pcbconnect_setup(), which there should be. >> - UDPv6 locking remains significantly more conservative than UDPv4 >> locking, which relates to source address selection. This needs >> attention, as it likely significantly reduces parallelism in this code >> for multithreaded socket use (such as in BIND). >> - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking >> somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which >> is no longer sufficient. A second check once the inpcb lock is held >> should do the trick, keeping the general case from requiring the inpcb >> lock for every inpcb visited. >> - This work reminds us that we need to revisit locking of the v4/v6 flags, >> which may be accessed lock-free both before and after this change. >> - Right now, a single lock name is used for the pcbhash lock -- this is >> undesirable, and probably another argument is required to take care of >> this (or a char array name field in the pcbinfo?). >> >> This is not an MFC candidate for 8.x due to its impact on lookup and >> locking semantics. It's possible some of these issues could be worked >> around with compatibility wrappers, if necessary. >> >> Reviewed by: bz >> Sponsored by: Juniper Networks, Inc. >> >> Modified: >> head/sys/contrib/pf/net/pf.c >> head/sys/netinet/in_pcb.c >> head/sys/netinet/in_pcb.h >> head/sys/netinet/ip_divert.c >> head/sys/netinet/ipfw/ip_fw2.c >> head/sys/netinet/raw_ip.c >> head/sys/netinet/siftr.c >> head/sys/netinet/tcp_input.c >> head/sys/netinet/tcp_subr.c >> head/sys/netinet/tcp_syncache.c >> head/sys/netinet/tcp_timer.c >> head/sys/netinet/tcp_usrreq.c >> head/sys/netinet/udp_usrreq.c >> head/sys/netinet6/in6_pcb.c >> head/sys/netinet6/in6_pcb.h >> head/sys/netinet6/in6_src.c >> head/sys/netinet6/udp6_usrreq.c >> >> Modified: head/sys/contrib/pf/net/pf.c >> ============================================================================== >> --- head/sys/contrib/pf/net/pf.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/contrib/pf/net/pf.c Mon May 30 09:43:55 2011 (r222488) >> @@ -3034,16 +3034,14 @@ pf_socket_lookup(int direction, struct p >> #ifdef INET >> case AF_INET: >> #ifdef __FreeBSD__ >> - INP_INFO_RLOCK(pi); /* XXX LOR */ >> - inp = in_pcblookup_hash(pi, saddr->v4, sport, daddr->v4, >> - dport, 0, NULL); >> + inp = in_pcblookup(pi, saddr->v4, sport, daddr->v4, >> + dport, INPLOOKUP_RLOCKPCB, NULL); >> if (inp == NULL) { >> - inp = in_pcblookup_hash(pi, saddr->v4, sport, >> - daddr->v4, dport, INPLOOKUP_WILDCARD, NULL); >> - if(inp == NULL) { >> - INP_INFO_RUNLOCK(pi); >> + inp = in_pcblookup(pi, saddr->v4, sport, >> + daddr->v4, dport, INPLOOKUP_WILDCARD | >> + INPLOOKUP_RLOCKPCB, NULL); >> + if (inp == NULL) >> return (-1); >> - } >> } >> #else >> inp = in_pcbhashlookup(tb, saddr->v4, sport, daddr->v4, dport); >> @@ -3058,16 +3056,14 @@ pf_socket_lookup(int direction, struct p >> #ifdef INET6 >> case AF_INET6: >> #ifdef __FreeBSD__ >> - INP_INFO_RLOCK(pi); >> - inp = in6_pcblookup_hash(pi, &saddr->v6, sport, >> - &daddr->v6, dport, 0, NULL); >> + inp = in6_pcblookup(pi, &saddr->v6, sport, >> + &daddr->v6, dport, INPLOOKUP_RLOCKPCB, NULL); >> if (inp == NULL) { >> - inp = in6_pcblookup_hash(pi, &saddr->v6, sport, >> - &daddr->v6, dport, INPLOOKUP_WILDCARD, NULL); >> - if (inp == NULL) { >> - INP_INFO_RUNLOCK(pi); >> + inp = in6_pcblookup(pi, &saddr->v6, sport, >> + &daddr->v6, dport, INPLOOKUP_WILDCARD | >> + INPLOOKUP_RLOCKPCB, NULL); >> + if (inp == NULL) >> return (-1); >> - } >> } >> #else >> inp = in6_pcbhashlookup(tb, &saddr->v6, sport, &daddr->v6, >> @@ -3085,9 +3081,10 @@ pf_socket_lookup(int direction, struct p >> return (-1); >> } >> #ifdef __FreeBSD__ >> + INP_RLOCK_ASSERT(inp); >> pd->lookup.uid = inp->inp_cred->cr_uid; >> pd->lookup.gid = inp->inp_cred->cr_groups[0]; >> - INP_INFO_RUNLOCK(pi); >> + INP_RUNLOCK(inp); >> #else >> pd->lookup.uid = inp->inp_socket->so_euid; >> pd->lookup.gid = inp->inp_socket->so_egid; >> >> Modified: head/sys/netinet/in_pcb.c >> ============================================================================== >> --- head/sys/netinet/in_pcb.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/in_pcb.c Mon May 30 09:43:55 2011 (r222488) >> @@ -127,6 +127,10 @@ static VNET_DEFINE(int, ipport_tcplastco >> >> #define V_ipport_tcplastcount VNET(ipport_tcplastcount) >> >> +static struct inpcb *in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, >> + struct in_addr faddr, u_int fport_arg, >> + struct in_addr laddr, u_int lport_arg, >> + int lookupflags, struct ifnet *ifp); >> static void in_pcbremlists(struct inpcb *inp); >> >> #ifdef INET >> @@ -212,11 +216,13 @@ in_pcbinfo_init(struct inpcbinfo *pcbinf >> { >> >> INP_INFO_LOCK_INIT(pcbinfo, name); >> + INP_HASH_LOCK_INIT(pcbinfo, "pcbinfohash"); /* XXXRW: argument? */ >> #ifdef VIMAGE >> pcbinfo->ipi_vnet = curvnet; >> #endif >> pcbinfo->ipi_listhead = listhead; >> LIST_INIT(pcbinfo->ipi_listhead); >> + pcbinfo->ipi_count = 0; >> pcbinfo->ipi_hashbase = hashinit(hash_nelements, M_PCB, >> &pcbinfo->ipi_hashmask); >> pcbinfo->ipi_porthashbase = hashinit(porthash_nelements, M_PCB, >> @@ -234,10 +240,14 @@ void >> in_pcbinfo_destroy(struct inpcbinfo *pcbinfo) >> { >> >> + KASSERT(pcbinfo->ipi_count == 0, >> + ("%s: ipi_count = %u", __func__, pcbinfo->ipi_count)); >> + >> hashdestroy(pcbinfo->ipi_hashbase, M_PCB, pcbinfo->ipi_hashmask); >> hashdestroy(pcbinfo->ipi_porthashbase, M_PCB, >> pcbinfo->ipi_porthashmask); >> uma_zdestroy(pcbinfo->ipi_zone); >> + INP_HASH_LOCK_DESTROY(pcbinfo); >> INP_INFO_LOCK_DESTROY(pcbinfo); >> } >> >> @@ -309,8 +319,8 @@ in_pcbbind(struct inpcb *inp, struct soc >> { >> int anonport, error; >> >> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo); >> INP_WLOCK_ASSERT(inp); >> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo); >> >> if (inp->inp_lport != 0 || inp->inp_laddr.s_addr != INADDR_ANY) >> return (EINVAL); >> @@ -351,8 +361,8 @@ in_pcb_lport(struct inpcb *inp, struct i >> * Because no actual state changes occur here, a global write lock on >> * the pcbinfo isn't required. >> */ >> - INP_INFO_LOCK_ASSERT(pcbinfo); >> INP_LOCK_ASSERT(inp); >> + INP_HASH_LOCK_ASSERT(pcbinfo); >> >> if (inp->inp_flags & INP_HIGHPORT) { >> first = V_ipport_hifirstauto; /* sysctl */ >> @@ -473,11 +483,10 @@ in_pcbbind_setup(struct inpcb *inp, stru >> int error; >> >> /* >> - * Because no actual state changes occur here, a global write lock on >> - * the pcbinfo isn't required. >> + * No state changes, so read locks are sufficient here. >> */ >> - INP_INFO_LOCK_ASSERT(pcbinfo); >> INP_LOCK_ASSERT(inp); >> + INP_HASH_LOCK_ASSERT(pcbinfo); >> >> if (TAILQ_EMPTY(&V_in_ifaddrhead)) /* XXX broken! */ >> return (EADDRNOTAVAIL); >> @@ -618,8 +627,8 @@ in_pcbconnect(struct inpcb *inp, struct >> in_addr_t laddr, faddr; >> int anonport, error; >> >> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo); >> INP_WLOCK_ASSERT(inp); >> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo); >> >> lport = inp->inp_lport; >> laddr = inp->inp_laddr.s_addr; >> @@ -907,8 +916,8 @@ in_pcbconnect_setup(struct inpcb *inp, s >> * Because a global state change doesn't actually occur here, a read >> * lock is sufficient. >> */ >> - INP_INFO_LOCK_ASSERT(inp->inp_pcbinfo); >> INP_LOCK_ASSERT(inp); >> + INP_HASH_LOCK_ASSERT(inp->inp_pcbinfo); >> >> if (oinpp != NULL) >> *oinpp = NULL; >> @@ -983,8 +992,8 @@ in_pcbconnect_setup(struct inpcb *inp, s >> if (error) >> return (error); >> } >> - oinp = in_pcblookup_hash(inp->inp_pcbinfo, faddr, fport, laddr, lport, >> - 0, NULL); >> + oinp = in_pcblookup_hash_locked(inp->inp_pcbinfo, faddr, fport, >> + laddr, lport, 0, NULL); >> if (oinp != NULL) { >> if (oinpp != NULL) >> *oinpp = oinp; >> @@ -1007,8 +1016,8 @@ void >> in_pcbdisconnect(struct inpcb *inp) >> { >> >> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo); >> INP_WLOCK_ASSERT(inp); >> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo); >> >> inp->inp_faddr.s_addr = INADDR_ANY; >> inp->inp_fport = 0; >> @@ -1187,19 +1196,24 @@ void >> in_pcbdrop(struct inpcb *inp) >> { >> >> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo); >> INP_WLOCK_ASSERT(inp); >> >> + /* >> + * XXXRW: Possibly we should protect the setting of INP_DROPPED with >> + * the hash lock...? >> + */ >> inp->inp_flags |= INP_DROPPED; >> if (inp->inp_flags & INP_INHASHLIST) { >> struct inpcbport *phd = inp->inp_phd; >> >> + INP_HASH_WLOCK(inp->inp_pcbinfo); >> LIST_REMOVE(inp, inp_hash); >> LIST_REMOVE(inp, inp_portlist); >> if (LIST_FIRST(&phd->phd_pcblist) == NULL) { >> LIST_REMOVE(phd, phd_hash); >> free(phd, M_PCB); >> } >> + INP_HASH_WUNLOCK(inp->inp_pcbinfo); >> inp->inp_flags &= ~INP_INHASHLIST; >> } >> } >> @@ -1328,7 +1342,8 @@ in_pcbpurgeif0(struct inpcbinfo *pcbinfo >> } >> >> /* >> - * Lookup a PCB based on the local address and port. >> + * Lookup a PCB based on the local address and port. Caller must hold the >> + * hash lock. No inpcb locks or references are acquired. >> */ >> #define INP_LOOKUP_MAPPED_PCB_COST 3 >> struct inpcb * >> @@ -1346,7 +1361,7 @@ in_pcblookup_local(struct inpcbinfo *pcb >> KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0, >> ("%s: invalid lookup flags %d", __func__, lookupflags)); >> >> - INP_INFO_LOCK_ASSERT(pcbinfo); >> + INP_HASH_LOCK_ASSERT(pcbinfo); >> >> if ((lookupflags & INPLOOKUP_WILDCARD) == 0) { >> struct inpcbhead *head; >> @@ -1450,10 +1465,12 @@ in_pcblookup_local(struct inpcbinfo *pcb >> #undef INP_LOOKUP_MAPPED_PCB_COST >> >> /* >> - * Lookup PCB in hash list. >> + * Lookup PCB in hash list, using pcbinfo tables. This variation assumes >> + * that the caller has locked the hash list, and will not perform any further >> + * locking or reference operations on either the hash list or the connection. >> */ >> -struct inpcb * >> -in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr, >> +static struct inpcb * >> +in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr, >> u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags, >> struct ifnet *ifp) >> { >> @@ -1464,7 +1481,7 @@ in_pcblookup_hash(struct inpcbinfo *pcbi >> KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0, >> ("%s: invalid lookup flags %d", __func__, lookupflags)); >> >> - INP_INFO_LOCK_ASSERT(pcbinfo); >> + INP_HASH_LOCK_ASSERT(pcbinfo); >> >> /* >> * First look for an exact match. >> @@ -1574,6 +1591,56 @@ in_pcblookup_hash(struct inpcbinfo *pcbi >> >> return (NULL); >> } >> + >> +/* >> + * Lookup PCB in hash list, using pcbinfo tables. This variation locks the >> + * hash list lock, and will return the inpcb locked (i.e., requires >> + * INPLOOKUP_LOCKPCB). >> + */ >> +static struct inpcb * >> +in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr, >> + u_int fport, struct in_addr laddr, u_int lport, int lookupflags, >> + struct ifnet *ifp) >> +{ >> + struct inpcb *inp; >> + >> + INP_HASH_RLOCK(pcbinfo); >> + inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport, >> + (lookupflags & ~(INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)), ifp); >> + if (inp != NULL) { >> + in_pcbref(inp); >> + INP_HASH_RUNLOCK(pcbinfo); >> + if (lookupflags & INPLOOKUP_WLOCKPCB) { >> + INP_WLOCK(inp); >> + if (in_pcbrele_wlocked(inp)) >> + return (NULL); >> + } else if (lookupflags & INPLOOKUP_RLOCKPCB) { >> + INP_RLOCK(inp); >> + if (in_pcbrele_rlocked(inp)) >> + return (NULL); >> + } else >> + panic("%s: locking bug", __func__); >> + } else >> + INP_HASH_RUNLOCK(pcbinfo); >> + return (inp); >> +} >> + >> +/* >> + * Public inpcb lookup routines, accepting a 4-tuple. >> + */ >> +struct inpcb * >> +in_pcblookup(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport, >> + struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp) >> +{ >> + >> + KASSERT((lookupflags & ~INPLOOKUP_MASK) == 0, >> + ("%s: invalid lookup flags %d", __func__, lookupflags)); >> + KASSERT((lookupflags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)) != 0, >> + ("%s: LOCKPCB not set", __func__)); >> + >> + return (in_pcblookup_hash(pcbinfo, faddr, fport, laddr, lport, >> + lookupflags, ifp)); >> +} >> #endif /* INET */ >> >> /* >> @@ -1588,8 +1655,9 @@ in_pcbinshash(struct inpcb *inp) >> struct inpcbport *phd; >> u_int32_t hashkey_faddr; >> >> - INP_INFO_WLOCK_ASSERT(pcbinfo); >> INP_WLOCK_ASSERT(inp); >> + INP_HASH_WLOCK_ASSERT(pcbinfo); >> + >> KASSERT((inp->inp_flags & INP_INHASHLIST) == 0, >> ("in_pcbinshash: INP_INHASHLIST")); >> >> @@ -1645,8 +1713,9 @@ in_pcbrehash(struct inpcb *inp) >> struct inpcbhead *head; >> u_int32_t hashkey_faddr; >> >> - INP_INFO_WLOCK_ASSERT(pcbinfo); >> INP_WLOCK_ASSERT(inp); >> + INP_HASH_WLOCK_ASSERT(pcbinfo); >> + >> KASSERT(inp->inp_flags & INP_INHASHLIST, >> ("in_pcbrehash: !INP_INHASHLIST")); >> >> @@ -1679,12 +1748,14 @@ in_pcbremlists(struct inpcb *inp) >> if (inp->inp_flags & INP_INHASHLIST) { >> struct inpcbport *phd = inp->inp_phd; >> >> + INP_HASH_WLOCK(pcbinfo); >> LIST_REMOVE(inp, inp_hash); >> LIST_REMOVE(inp, inp_portlist); >> if (LIST_FIRST(&phd->phd_pcblist) == NULL) { >> LIST_REMOVE(phd, phd_hash); >> free(phd, M_PCB); >> } >> + INP_HASH_WUNLOCK(pcbinfo); >> inp->inp_flags &= ~INP_INHASHLIST; >> } >> LIST_REMOVE(inp, inp_list); >> >> Modified: head/sys/netinet/in_pcb.h >> ============================================================================== >> --- head/sys/netinet/in_pcb.h Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/in_pcb.h Mon May 30 09:43:55 2011 (r222488) >> @@ -268,22 +268,22 @@ struct inpcbport { >> * Global data structure for each high-level protocol (UDP, TCP, ...) in both >> * IPv4 and IPv6. Holds inpcb lists and information for managing them. >> * >> - * Each pcbinfo is protected by ipi_lock, covering mutable global fields (such >> - * as the global pcb list) and hashed lookup tables. The lock order is: >> + * Each pcbinfo is protected by two locks: ipi_lock and ipi_hash_lock, >> + * the former covering mutable global fields (such as the global pcb list), >> + * and the latter covering the hashed lookup tables. The lock order is: >> * >> - * ipi_lock (before) inpcb locks >> + * ipi_lock (before) inpcb locks (before) ipi_hash_lock >> * >> * Locking key: >> * >> * (c) Constant or nearly constant after initialisation >> * (g) Locked by ipi_lock >> - * (h) Read using either ipi_lock or inpcb lock; write requires both. >> + * (h) Read using either ipi_hash_lock or inpcb lock; write requires both. >> * (x) Synchronisation properties poorly defined >> */ >> struct inpcbinfo { >> /* >> - * Global lock protecting global inpcb list, inpcb count, hash tables, >> - * etc. >> + * Global lock protecting global inpcb list, inpcb count, etc. >> */ >> struct rwlock ipi_lock; >> >> @@ -312,17 +312,22 @@ struct inpcbinfo { >> struct uma_zone *ipi_zone; /* (c) */ >> >> /* >> + * Global lock protecting hash lookup tables. >> + */ >> + struct rwlock ipi_hash_lock; >> + >> + /* >> * Global hash of inpcbs, hashed by local and foreign addresses and >> * port numbers. >> */ >> - struct inpcbhead *ipi_hashbase; /* (g) */ >> - u_long ipi_hashmask; /* (g) */ >> + struct inpcbhead *ipi_hashbase; /* (h) */ >> + u_long ipi_hashmask; /* (h) */ >> >> /* >> * Global hash of inpcbs, hashed by only local port number. >> */ >> - struct inpcbporthead *ipi_porthashbase; /* (g) */ >> - u_long ipi_porthashmask; /* (g) */ >> + struct inpcbporthead *ipi_porthashbase; /* (h) */ >> + u_long ipi_porthashmask; /* (h) */ >> >> /* >> * Pointer to network stack instance >> @@ -406,6 +411,18 @@ void inp_4tuple_get(struct inpcb *inp, >> #define INP_INFO_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_WLOCKED) >> #define INP_INFO_UNLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_UNLOCKED) >> >> +#define INP_HASH_LOCK_INIT(ipi, d) \ >> + rw_init_flags(&(ipi)->ipi_hash_lock, (d), 0) >> +#define INP_HASH_LOCK_DESTROY(ipi) rw_destroy(&(ipi)->ipi_hash_lock) >> +#define INP_HASH_RLOCK(ipi) rw_rlock(&(ipi)->ipi_hash_lock) >> +#define INP_HASH_WLOCK(ipi) rw_wlock(&(ipi)->ipi_hash_lock) >> +#define INP_HASH_RUNLOCK(ipi) rw_runlock(&(ipi)->ipi_hash_lock) >> +#define INP_HASH_WUNLOCK(ipi) rw_wunlock(&(ipi)->ipi_hash_lock) >> +#define INP_HASH_LOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_hash_lock, \ >> + RA_LOCKED) >> +#define INP_HASH_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_hash_lock, \ >> + RA_WLOCKED) >> + >> #define INP_PCBHASH(faddr, lport, fport, mask) \ >> (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask)) >> #define INP_PCBPORTHASH(lport, mask) \ >> @@ -466,7 +483,16 @@ void inp_4tuple_get(struct inpcb *inp, >> #define INP_LLE_VALID 0x00000001 /* cached lle is valid */ >> #define INP_RT_VALID 0x00000002 /* cached rtentry is valid */ >> >> -#define INPLOOKUP_WILDCARD 1 >> +/* >> + * Flags passed to in_pcblookup*() functions. >> + */ >> +#define INPLOOKUP_WILDCARD 0x00000001 /* Allow wildcard sockets. */ >> +#define INPLOOKUP_RLOCKPCB 0x00000002 /* Return inpcb read-locked. */ >> +#define INPLOOKUP_WLOCKPCB 0x00000004 /* Return inpcb write-locked. */ >> + >> +#define INPLOOKUP_MASK (INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB | \ >> + INPLOOKUP_WLOCKPCB) >> + >> #define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb) >> #define sotoin6pcb(so) sotoinpcb(so) /* for KAME src sync over BSD*'s */ >> >> @@ -527,7 +553,7 @@ struct inpcb * >> in_pcblookup_local(struct inpcbinfo *, >> struct in_addr, u_short, int, struct ucred *); >> struct inpcb * >> - in_pcblookup_hash(struct inpcbinfo *, struct in_addr, u_int, >> + in_pcblookup(struct inpcbinfo *, struct in_addr, u_int, >> struct in_addr, u_int, int, struct ifnet *); >> void in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr, >> int, struct inpcb *(*)(struct inpcb *, int)); >> >> Modified: head/sys/netinet/ip_divert.c >> ============================================================================== >> --- head/sys/netinet/ip_divert.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/ip_divert.c Mon May 30 09:43:55 2011 (r222488) >> @@ -659,9 +659,9 @@ div_pcblist(SYSCTL_HANDLER_ARGS) >> INP_INFO_WLOCK(&V_divcbinfo); >> for (i = 0; i < n; i++) { >> inp = inp_list[i]; >> - INP_WLOCK(inp); >> - if (!in_pcbrele(inp)) >> - INP_WUNLOCK(inp); >> + INP_RLOCK(inp); >> + if (!in_pcbrele_rlocked(inp)) >> + INP_RUNLOCK(inp); >> } >> INP_INFO_WUNLOCK(&V_divcbinfo); >> >> >> Modified: head/sys/netinet/ipfw/ip_fw2.c >> ============================================================================== >> --- head/sys/netinet/ipfw/ip_fw2.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/ipfw/ip_fw2.c Mon May 30 09:43:55 2011 (r222488) >> @@ -657,7 +657,7 @@ check_uidgid(ipfw_insn_u32 *insn, int pr >> (struct bsd_ucred *)uc, ugid_lookupp, ((struct mbuf *)inp)->m_skb); >> #else /* FreeBSD */ >> struct inpcbinfo *pi; >> - int wildcard; >> + int lookupflags; >> struct inpcb *pcb; >> int match; >> >> @@ -682,30 +682,31 @@ check_uidgid(ipfw_insn_u32 *insn, int pr >> if (*ugid_lookupp == -1) >> return (0); >> if (proto == IPPROTO_TCP) { >> - wildcard = 0; >> + lookupflags = 0; >> pi = &V_tcbinfo; >> } else if (proto == IPPROTO_UDP) { >> - wildcard = INPLOOKUP_WILDCARD; >> + lookupflags = INPLOOKUP_WILDCARD; >> pi = &V_udbinfo; >> } else >> return 0; >> + lookupflags |= INPLOOKUP_RLOCKPCB; >> match = 0; >> if (*ugid_lookupp == 0) { >> - INP_INFO_RLOCK(pi); >> pcb = (oif) ? >> - in_pcblookup_hash(pi, >> + in_pcblookup(pi, >> dst_ip, htons(dst_port), >> src_ip, htons(src_port), >> - wildcard, oif) : >> - in_pcblookup_hash(pi, >> + lookupflags, oif) : >> + in_pcblookup(pi, >> src_ip, htons(src_port), >> dst_ip, htons(dst_port), >> - wildcard, NULL); >> + lookupflags, NULL); >> if (pcb != NULL) { >> + INP_RLOCK_ASSERT(pcb); >> *uc = crhold(pcb->inp_cred); >> *ugid_lookupp = 1; >> + INP_RUNLOCK(pcb); >> } >> - INP_INFO_RUNLOCK(pi); >> if (*ugid_lookupp == 0) { >> /* >> * We tried and failed, set the variable to -1 >> @@ -1827,21 +1828,32 @@ do { \ >> else >> break; >> >> + /* >> + * XXXRW: so_user_cookie should almost >> + * certainly be inp_user_cookie? >> + */ >> + >> /* For incomming packet, lookup up the >> inpcb using the src/dest ip/port tuple */ >> if (inp == NULL) { >> - INP_INFO_RLOCK(pi); >> - inp = in_pcblookup_hash(pi, >> + inp = in_pcblookup(pi, >> src_ip, htons(src_port), >> dst_ip, htons(dst_port), >> - 0, NULL); >> - INP_INFO_RUNLOCK(pi); >> - } >> - >> - if (inp && inp->inp_socket) { >> - tablearg = inp->inp_socket->so_user_cookie; >> - if (tablearg) >> - match = 1; >> + INPLOOKUP_RLOCKPCB, NULL); >> + if (inp != NULL) { >> + tablearg = >> + inp->inp_socket->so_user_cookie; >> + if (tablearg) >> + match = 1; >> + INP_RUNLOCK(inp); >> + } >> + } else { >> + if (inp->inp_socket) { >> + tablearg = >> + inp->inp_socket->so_user_cookie; >> + if (tablearg) >> + match = 1; >> + } >> } >> break; >> } >> >> Modified: head/sys/netinet/raw_ip.c >> ============================================================================== >> --- head/sys/netinet/raw_ip.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/raw_ip.c Mon May 30 09:43:55 2011 (r222488) >> @@ -226,7 +226,7 @@ rip_append(struct inpcb *last, struct ip >> { >> int policyfail = 0; >> >> - INP_RLOCK_ASSERT(last); >> + INP_LOCK_ASSERT(last); >> >> #ifdef IPSEC >> /* check AH/ESP integrity. */ >> @@ -834,16 +834,19 @@ rip_detach(struct socket *so) >> static void >> rip_dodisconnect(struct socket *so, struct inpcb *inp) >> { >> + struct inpcbinfo *pcbinfo; >> >> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo); >> - INP_WLOCK_ASSERT(inp); >> - >> + pcbinfo = inp->inp_pcbinfo; >> + INP_INFO_WLOCK(pcbinfo); >> + INP_WLOCK(inp); >> rip_delhash(inp); >> inp->inp_faddr.s_addr = INADDR_ANY; >> rip_inshash(inp); >> SOCK_LOCK(so); >> so->so_state &= ~SS_ISCONNECTED; >> SOCK_UNLOCK(so); >> + INP_WUNLOCK(inp); >> + INP_INFO_WUNLOCK(pcbinfo); >> } >> >> static void >> @@ -854,11 +857,7 @@ rip_abort(struct socket *so) >> inp = sotoinpcb(so); >> KASSERT(inp != NULL, ("rip_abort: inp == NULL")); >> >> - INP_INFO_WLOCK(&V_ripcbinfo); >> - INP_WLOCK(inp); >> rip_dodisconnect(so, inp); >> - INP_WUNLOCK(inp); >> - INP_INFO_WUNLOCK(&V_ripcbinfo); >> } >> >> static void >> @@ -869,11 +868,7 @@ rip_close(struct socket *so) >> inp = sotoinpcb(so); >> KASSERT(inp != NULL, ("rip_close: inp == NULL")); >> >> - INP_INFO_WLOCK(&V_ripcbinfo); >> - INP_WLOCK(inp); >> rip_dodisconnect(so, inp); >> - INP_WUNLOCK(inp); >> - INP_INFO_WUNLOCK(&V_ripcbinfo); >> } >> >> static int >> @@ -887,11 +882,7 @@ rip_disconnect(struct socket *so) >> inp = sotoinpcb(so); >> KASSERT(inp != NULL, ("rip_disconnect: inp == NULL")); >> >> - INP_INFO_WLOCK(&V_ripcbinfo); >> - INP_WLOCK(inp); >> rip_dodisconnect(so, inp); >> - INP_WUNLOCK(inp); >> - INP_INFO_WUNLOCK(&V_ripcbinfo); >> return (0); >> } >> >> @@ -1077,9 +1068,9 @@ rip_pcblist(SYSCTL_HANDLER_ARGS) >> INP_INFO_WLOCK(&V_ripcbinfo); >> for (i = 0; i < n; i++) { >> inp = inp_list[i]; >> - INP_WLOCK(inp); >> - if (!in_pcbrele(inp)) >> - INP_WUNLOCK(inp); >> + INP_RLOCK(inp); >> + if (!in_pcbrele_rlocked(inp)) >> + INP_RUNLOCK(inp); >> } >> INP_INFO_WUNLOCK(&V_ripcbinfo); >> >> >> Modified: head/sys/netinet/siftr.c >> ============================================================================== >> --- head/sys/netinet/siftr.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/siftr.c Mon May 30 09:43:55 2011 (r222488) >> @@ -696,17 +696,16 @@ siftr_findinpcb(int ipver, struct ip *ip >> >> /* We need the tcbinfo lock. */ >> INP_INFO_UNLOCK_ASSERT(&V_tcbinfo); >> - INP_INFO_RLOCK(&V_tcbinfo); >> >> if (dir == PFIL_IN) >> inp = (ipver == INP_IPV4 ? >> - in_pcblookup_hash(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst, >> - dport, 0, m->m_pkthdr.rcvif) >> + in_pcblookup(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst, >> + dport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif) >> : >> #ifdef SIFTR_IPV6 >> - in6_pcblookup_hash(&V_tcbinfo, >> + in6_pcblookup(&V_tcbinfo, >> &((struct ip6_hdr *)ip)->ip6_src, sport, >> - &((struct ip6_hdr *)ip)->ip6_dst, dport, 0, >> + &((struct ip6_hdr *)ip)->ip6_dst, dport, INPLOOKUP_RLOCKPCB, >> m->m_pkthdr.rcvif) >> #else >> NULL >> @@ -715,13 +714,13 @@ siftr_findinpcb(int ipver, struct ip *ip >> >> else >> inp = (ipver == INP_IPV4 ? >> - in_pcblookup_hash(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src, >> - sport, 0, m->m_pkthdr.rcvif) >> + in_pcblookup(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src, >> + sport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif) >> : >> #ifdef SIFTR_IPV6 >> - in6_pcblookup_hash(&V_tcbinfo, >> + in6_pcblookup(&V_tcbinfo, >> &((struct ip6_hdr *)ip)->ip6_dst, dport, >> - &((struct ip6_hdr *)ip)->ip6_src, sport, 0, >> + &((struct ip6_hdr *)ip)->ip6_src, sport, INPLOOKUP_RLOCKPCB, >> m->m_pkthdr.rcvif) >> #else >> NULL >> @@ -734,12 +733,7 @@ siftr_findinpcb(int ipver, struct ip *ip >> ss->nskip_in_inpcb++; >> else >> ss->nskip_out_inpcb++; >> - } else { >> - /* Acquire the inpcb lock. */ >> - INP_UNLOCK_ASSERT(inp); >> - INP_RLOCK(inp); >> } >> - INP_INFO_RUNLOCK(&V_tcbinfo); >> >> return (inp); >> } >> >> Modified: head/sys/netinet/tcp_input.c >> ============================================================================== >> --- head/sys/netinet/tcp_input.c Mon May 30 09:41:38 2011 (r222487) >> +++ head/sys/netinet/tcp_input.c Mon May 30 09:43:55 2011 (r222488) >> @@ -5,6 +5,7 @@ >> * Swinburne University of Technology, Melbourne, Australia. >> * Copyright (c) 2009-2010 Lawrence Stewart >> * Copyright (c) 2010 The FreeBSD Foundation >> + * Copyright (c) 2010-2011 Juniper Networks, Inc. >> * All rights reserved. >> * >> * Portions of this software were developed at the Centre for Advanced Internet >> @@ -16,6 +17,9 @@ >> * Internet Architectures, Swinburne University of Technology, Melbourne, >> * Australia by David Hayes under sponsorship from the FreeBSD Foundation. >> * >> + * Portions of this software were developed by Robert N. M. Watson under >> + * contract to Juniper Networks, Inc. >> + * >> * Redistribution and use in source and binary forms, with or without >> * modification, are permitted provided that the following conditions >> * are met: >> @@ -197,10 +201,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, >> &VNET_NAME(tcp_autorcvbuf_max), 0, >> "Max size of automatic receive buffer"); >> >> -int tcp_read_locking = 1; >> -SYSCTL_INT(_net_inet_tcp, OID_AUTO, read_locking, CTLFLAG_RW, >> - &tcp_read_locking, 0, "Enable read locking strategy"); >> - >> VNET_DEFINE(struct inpcbhead, tcb); >> #define tcb6 tcb /* for KAME src sync over BSD*'s */ >> VNET_DEFINE(struct inpcbinfo, tcbinfo); >> @@ -591,8 +591,7 @@ tcp_input(struct mbuf *m, int off0) >> char *s = NULL; /* address and port logging */ >> int ti_locked; >> #define TI_UNLOCKED 1 >> -#define TI_RLOCKED 2 >> -#define TI_WLOCKED 3 >> +#define TI_WLOCKED 2 >> >> #ifdef TCPDEBUG >> /* >> @@ -756,30 +755,25 @@ tcp_input(struct mbuf *m, int off0) >> drop_hdrlen = off0 + off; >> >> /* >> - * Locate pcb for segment, which requires a lock on tcbinfo. >> - * Optimisticaly acquire a global read lock rather than a write lock >> - * unless header flags necessarily imply a state change. There are >> - * two cases where we might discover later we need a write lock >> - * despite the flags: ACKs moving a connection out of the syncache, >> - * and ACKs for a connection in TIMEWAIT. >> + * Locate pcb for segment; if we're likely to add or remove a >> + * connection then first acquire pcbinfo lock. There are two cases >> + * where we might discover later we need a write lock despite the >> + * flags: ACKs moving a connection out of the syncache, and ACKs for >> + * a connection in TIMEWAIT. >> */ >> - if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 || >> - tcp_read_locking == 0) { >> + if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0) { >> INP_INFO_WLOCK(&V_tcbinfo); >> ti_locked = TI_WLOCKED; >> - } else { >> - INP_INFO_RLOCK(&V_tcbinfo); >> - ti_locked = TI_RLOCKED; >> - } >> + } else >> + ti_locked = TI_UNLOCKED; >> >> findpcb: >> #ifdef INVARIANTS >> - if (ti_locked == TI_RLOCKED) >> - INP_INFO_RLOCK_ASSERT(&V_tcbinfo); >> - else if (ti_locked == TI_WLOCKED) >> + if (ti_locked == TI_WLOCKED) { >> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >> - else >> - panic("%s: findpcb ti_locked %d\n", __func__, ti_locked); >> + } else { >> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo); >> + } >> #endif >> >> #ifdef INET >> @@ -797,20 +791,18 @@ findpcb: >> * Transparently forwarded. Pretend to be the destination. >> * already got one like this? >> */ >> - inp = in_pcblookup_hash(&V_tcbinfo, >> - ip->ip_src, th->th_sport, >> - ip->ip_dst, th->th_dport, >> - 0, m->m_pkthdr.rcvif); >> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src, th->th_sport, >> + ip->ip_dst, th->th_dport, INPLOOKUP_WLOCKPCB, >> + m->m_pkthdr.rcvif); >> if (!inp) { >> - /* It's new. Try to find the ambushing socket. */ >> - inp = in_pcblookup_hash(&V_tcbinfo, >> - ip->ip_src, th->th_sport, >> - next_hop->sin_addr, >> - next_hop->sin_port ? >> - ntohs(next_hop->sin_port) : >> - th->th_dport, >> - INPLOOKUP_WILDCARD, >> - m->m_pkthdr.rcvif); >> + /* >> + * It's new. Try to find the ambushing socket. >> + */ >> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src, >> + th->th_sport, next_hop->sin_addr, >> + next_hop->sin_port ? ntohs(next_hop->sin_port) : >> + th->th_dport, INPLOOKUP_WILDCARD | >> + INPLOOKUP_WLOCKPCB, m->m_pkthdr.rcvif); >> } >> /* Remove the tag from the packet. We don't need it anymore. */ >> m_tag_delete(m, fwd_tag); >> @@ -820,21 +812,19 @@ findpcb: >> { >> #ifdef INET6 >> if (isipv6) >> - inp = in6_pcblookup_hash(&V_tcbinfo, >> - &ip6->ip6_src, th->th_sport, >> - &ip6->ip6_dst, th->th_dport, >> - INPLOOKUP_WILDCARD, >> - m->m_pkthdr.rcvif); >> + inp = in6_pcblookup(&V_tcbinfo, &ip6->ip6_src, >> + th->th_sport, &ip6->ip6_dst, th->th_dport, >> + INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB, >> + m->m_pkthdr.rcvif); >> #endif >> #if defined(INET) && defined(INET6) >> else >> #endif >> #ifdef INET >> - inp = in_pcblookup_hash(&V_tcbinfo, >> - ip->ip_src, th->th_sport, >> - ip->ip_dst, th->th_dport, >> - INPLOOKUP_WILDCARD, >> - m->m_pkthdr.rcvif); >> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src, >> + th->th_sport, ip->ip_dst, th->th_dport, >> + INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB, >> + m->m_pkthdr.rcvif); >> #endif >> } >> >> @@ -865,7 +855,7 @@ findpcb: >> rstreason = BANDLIM_RST_CLOSEDPORT; >> goto dropwithreset; >> } >> - INP_WLOCK(inp); >> + INP_WLOCK_ASSERT(inp); >> if (!(inp->inp_flags & INP_HW_FLOWID) >> && (m->m_flags & M_FLOWID) >> && ((inp->inp_socket == NULL) >> @@ -906,28 +896,26 @@ findpcb: >> * legitimate new connection attempt the old INPCB gets removed and >> * we can try again to find a listening socket. >> * >> - * At this point, due to earlier optimism, we may hold a read lock on >> - * the inpcbinfo, rather than a write lock. If so, we need to >> - * upgrade, or if that fails, acquire a reference on the inpcb, drop >> - * all locks, acquire a global write lock, and then re-acquire the >> - * inpcb lock. We may at that point discover that another thread has >> - * tried to free the inpcb, in which case we need to loop back and >> - * try to find a new inpcb to deliver to. >> + * At this point, due to earlier optimism, we may hold only an inpcb >> + * lock, and not the inpcbinfo write lock. If so, we need to try to >> + * acquire it, or if that fails, acquire a reference on the inpcb, >> + * drop all locks, acquire a global write lock, and then re-acquire >> + * the inpcb lock. We may at that point discover that another thread >> + * has tried to free the inpcb, in which case we need to loop back >> + * and try to find a new inpcb to deliver to. >> + * >> + * XXXRW: It may be time to rethink timewait locking. >> */ >> relocked: >> if (inp->inp_flags & INP_TIMEWAIT) { >> - KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED, >> - ("%s: INP_TIMEWAIT ti_locked %d", __func__, ti_locked)); >> - >> - if (ti_locked == TI_RLOCKED) { >> - if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) { >> + if (ti_locked == TI_UNLOCKED) { >> + if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) { >> in_pcbref(inp); >> INP_WUNLOCK(inp); >> - INP_INFO_RUNLOCK(&V_tcbinfo); >> INP_INFO_WLOCK(&V_tcbinfo); >> ti_locked = TI_WLOCKED; >> INP_WLOCK(inp); >> - if (in_pcbrele(inp)) { >> + if (in_pcbrele_wlocked(inp)) { >> inp = NULL; >> goto findpcb; >> } >> @@ -975,26 +963,24 @@ relocked: >> >> /* >> * We've identified a valid inpcb, but it could be that we need an >> - * inpcbinfo write lock and have only a read lock. In this case, >> - * attempt to upgrade/relock using the same strategy as the TIMEWAIT >> - * case above. If we relock, we have to jump back to 'relocked' as >> - * the connection might now be in TIMEWAIT. >> - */ >> - if (tp->t_state != TCPS_ESTABLISHED || >> - (thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 || >> - tcp_read_locking == 0) { >> - KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED, >> - ("%s: upgrade check ti_locked %d", __func__, ti_locked)); >> - >> - if (ti_locked == TI_RLOCKED) { >> - if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) { >> + * inpcbinfo write lock but don't hold it. In this case, attempt to >> + * acquire using the same strategy as the TIMEWAIT case above. If we >> + * relock, we have to jump back to 'relocked' as the connection might >> + * now be in TIMEWAIT. >> + */ >> +#ifdef INVARIANTS >> + if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0) >> + INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >> +#endif >> + if (tp->t_state != TCPS_ESTABLISHED) { >> + if (ti_locked == TI_UNLOCKED) { >> + if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) { >> in_pcbref(inp); >> INP_WUNLOCK(inp); >> - INP_INFO_RUNLOCK(&V_tcbinfo); >> INP_INFO_WLOCK(&V_tcbinfo); >> ti_locked = TI_WLOCKED; >> INP_WLOCK(inp); >> - if (in_pcbrele(inp)) { >> + if (in_pcbrele_wlocked(inp)) { >> inp = NULL; >> goto findpcb; >> } >> @@ -1027,13 +1013,16 @@ relocked: >> /* >> * When the socket is accepting connections (the INPCB is in LISTEN >> * state) we look into the SYN cache if this is a new connection >> - * attempt or the completion of a previous one. >> + * attempt or the completion of a previous one. Because listen >> + * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock will be >> + * held in this case. >> */ >> if (so->so_options & SO_ACCEPTCONN) { >> struct in_conninfo inc; >> >> KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting but " >> "tp not listening", __func__)); >> + INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >> >> bzero(&inc, sizeof(inc)); >> #ifdef INET6 >> @@ -1371,13 +1360,17 @@ relocked: >> return; >> >> dropwithreset: >> - if (ti_locked == TI_RLOCKED) >> - INP_INFO_RUNLOCK(&V_tcbinfo); >> - else if (ti_locked == TI_WLOCKED) >> + if (ti_locked == TI_WLOCKED) { >> INP_INFO_WUNLOCK(&V_tcbinfo); >> - else >> - panic("%s: dropwithreset ti_locked %d", __func__, ti_locked); >> - ti_locked = TI_UNLOCKED; >> + ti_locked = TI_UNLOCKED; >> + } >> +#ifdef INVARIANTS >> + else { >> + KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropwithreset " >> + "ti_locked: %d", __func__, ti_locked)); >> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo); >> + } >> +#endif >> >> if (inp != NULL) { >> tcp_dropwithreset(m, th, tp, tlen, rstreason); >> @@ -1388,13 +1381,17 @@ dropwithreset: >> goto drop; >> >> dropunlock: >> - if (ti_locked == TI_RLOCKED) >> - INP_INFO_RUNLOCK(&V_tcbinfo); >> - else if (ti_locked == TI_WLOCKED) >> + if (ti_locked == TI_WLOCKED) { >> INP_INFO_WUNLOCK(&V_tcbinfo); >> - else >> - panic("%s: dropunlock ti_locked %d", __func__, ti_locked); >> - ti_locked = TI_UNLOCKED; >> + ti_locked = TI_UNLOCKED; >> + } >> +#ifdef INVARIANTS >> + else { >> + KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropunlock " >> + "ti_locked: %d", __func__, ti_locked)); >> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo); >> + } >> +#endif >> >> if (inp != NULL) >> INP_WUNLOCK(inp); >> @@ -1449,13 +1446,13 @@ tcp_do_segment(struct mbuf *m, struct tc >> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >> } else { >> #ifdef INVARIANTS >> - if (ti_locked == TI_RLOCKED) >> - INP_INFO_RLOCK_ASSERT(&V_tcbinfo); >> >> *** DIFF OUTPUT TRUNCATED AT 1000 LINES *** >> _______________________________________________ >> svn-src-head@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/svn-src-head >> To unsubscribe, send any mail to "svn-src-head-unsubscribe@freebsd.org" >> >