Date: Mon, 21 May 2018 09:54:41 -0400 From: "Jonathan T. Looney" <jtl@freebsd.org> To: Matt Macy <mmacy@freebsd.org> Cc: src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r333915 - head/sys/netinet Message-ID: <CADrOrmt5XbueWfFXNwb1Kv-O=KpNLFBi0AaSu0aeczOtXJ4Nfg@mail.gmail.com> In-Reply-To: <201805200438.w4K4c4BQ073120@repo.freebsd.org> References: <201805200438.w4K4c4BQ073120@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Summary: I'm not sure this addresses all the subtleties of INPCB destruction. I think it would benefit from wider review. I suggest it be reverted in the meantime. On Sun, May 20, 2018 at 12:38 AM, Matt Macy <mmacy@freebsd.org> wrote: > + > +/* > + * Unconditionally schedule an inpcb to be freed by decrementing its > + * reference count, which should occur only after the inpcb has been > detached > + * from its socket. If another thread holds a temporary reference > (acquired > + * using in_pcbref()) then the free is deferred until that reference is > + * released using in_pcbrele(), but the inpcb is still unlocked. Almost > all > + * work, including removal from global lists, is done in this context, > where > + * the pcbinfo lock is held. > + */ > +void > +in_pcbfree(struct inpcb *inp) > +{ > + struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; > + > + KASSERT(inp->inp_socket == NULL, ("%s: inp_socket != NULL", > __func__)); > + KASSERT((inp->inp_flags2 & INP_FREED) == 0, > + ("%s: called twice for pcb %p", __func__, inp)); > + if (inp->inp_flags2 & INP_FREED) { > INP_WUNLOCK(inp); > + return; > + } > This check no longer serves its purpose; however, I believe it points to a deeper problem: INP_FREED is not set until the deferred epoch context. So, other threads may not know that another thread has already called in_pcbfree. This changes things, such as the behavior of in_pcbrele_[rw]locked(). In the worst case, if someone calls in_pcbfree() twice, the two calls to in_pcbremlists() will result in running `LIST_REMOVE(inp, inp_list)` twice, resulting in memory corruption. I haven't undertaken to audit all the code. However, the fact that in_pcbrele_[rw]locked() will no longer immediately return 1 may change logic such that this is now possible. > + > +#ifdef INVARIANTS > + if (pcbinfo == &V_tcbinfo) { > + INP_INFO_LOCK_ASSERT(pcbinfo); > + } else { > + INP_INFO_WLOCK_ASSERT(pcbinfo); > + } > +#endif > + INP_WLOCK_ASSERT(inp); > + /* Remove first from list */ > + INP_LIST_WLOCK(pcbinfo); > + inp->inp_gencnt = ++pcbinfo->ipi_gencnt; > + in_pcbremlists(inp); > in_pcbremlists() also increments inp->inp_gencnt(); therefore, it is not necessary to increment it above. > + INP_LIST_WUNLOCK(pcbinfo); > + RO_INVALIDATE_CACHE(&inp->inp_route); > + INP_WUNLOCK(inp); > + epoch_call(net_epoch_preempt, &inp->inp_epoch_ctx, > in_pcbfree_deferred); > } > > /* > > Modified: head/sys/netinet/in_pcb.h > ============================================================ > ================== > --- head/sys/netinet/in_pcb.h Sun May 20 04:32:48 2018 (r333914) > +++ head/sys/netinet/in_pcb.h Sun May 20 04:38:04 2018 (r333915) > @@ -328,6 +328,7 @@ struct inpcb { > LIST_ENTRY(inpcb) inp_list; /* (p/l) list for all PCBs for > proto */ > /* (p[w]) for list iteration */ > /* (p[r]/l) for addition/removal */ > + struct epoch_context inp_epoch_ctx; > }; > #endif /* _KERNEL */ > > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CADrOrmt5XbueWfFXNwb1Kv-O=KpNLFBi0AaSu0aeczOtXJ4Nfg>