Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 21 May 2018 09:54:41 -0400
From:      "Jonathan T. Looney" <jtl@freebsd.org>
To:        Matt Macy <mmacy@freebsd.org>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r333915 - head/sys/netinet
Message-ID:  <CADrOrmt5XbueWfFXNwb1Kv-O=KpNLFBi0AaSu0aeczOtXJ4Nfg@mail.gmail.com>
In-Reply-To: <201805200438.w4K4c4BQ073120@repo.freebsd.org>
References:  <201805200438.w4K4c4BQ073120@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Summary: I'm not sure this addresses all the subtleties of INPCB
destruction. I think it would benefit from wider review. I suggest it be
reverted in the meantime.

On Sun, May 20, 2018 at 12:38 AM, Matt Macy <mmacy@freebsd.org> wrote:

> +
> +/*
> + * Unconditionally schedule an inpcb to be freed by decrementing its
> + * reference count, which should occur only after the inpcb has been
> detached
> + * from its socket.  If another thread holds a temporary reference
> (acquired
> + * using in_pcbref()) then the free is deferred until that reference is
> + * released using in_pcbrele(), but the inpcb is still unlocked.  Almost
> all
> + * work, including removal from global lists, is done in this context,
> where
> + * the pcbinfo lock is held.
> + */
> +void
> +in_pcbfree(struct inpcb *inp)
> +{
> +       struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
> +
> +       KASSERT(inp->inp_socket == NULL, ("%s: inp_socket != NULL",
> __func__));
> +       KASSERT((inp->inp_flags2 & INP_FREED) == 0,
> +           ("%s: called twice for pcb %p", __func__, inp));
> +       if (inp->inp_flags2 & INP_FREED) {
>                 INP_WUNLOCK(inp);
> +               return;
> +       }
>

This check no longer serves its purpose; however, I believe it points to a
deeper problem: INP_FREED is not set until the deferred epoch context. So,
other threads may not know that another thread has already called
in_pcbfree.

This changes things, such as the behavior of in_pcbrele_[rw]locked().

In the worst case, if someone calls in_pcbfree() twice, the two calls to
in_pcbremlists() will result in running `LIST_REMOVE(inp, inp_list)` twice,
resulting in memory corruption.

I haven't undertaken to audit all the code. However, the fact that
in_pcbrele_[rw]locked() will no longer immediately return 1 may change
logic such that this is now possible.


> +
> +#ifdef INVARIANTS
> +       if (pcbinfo == &V_tcbinfo) {
> +               INP_INFO_LOCK_ASSERT(pcbinfo);
> +       } else {
> +               INP_INFO_WLOCK_ASSERT(pcbinfo);
> +       }
> +#endif
> +       INP_WLOCK_ASSERT(inp);
> +       /* Remove first from list */
> +       INP_LIST_WLOCK(pcbinfo);
> +       inp->inp_gencnt = ++pcbinfo->ipi_gencnt;
> +       in_pcbremlists(inp);
>

in_pcbremlists() also increments inp->inp_gencnt(); therefore, it is not
necessary to increment it above.



> +       INP_LIST_WUNLOCK(pcbinfo);
> +       RO_INVALIDATE_CACHE(&inp->inp_route);
> +       INP_WUNLOCK(inp);
> +       epoch_call(net_epoch_preempt, &inp->inp_epoch_ctx,
> in_pcbfree_deferred);
>  }
>
>  /*
>
> Modified: head/sys/netinet/in_pcb.h
> ============================================================
> ==================
> --- head/sys/netinet/in_pcb.h   Sun May 20 04:32:48 2018        (r333914)
> +++ head/sys/netinet/in_pcb.h   Sun May 20 04:38:04 2018        (r333915)
> @@ -328,6 +328,7 @@ struct inpcb {
>         LIST_ENTRY(inpcb) inp_list;     /* (p/l) list for all PCBs for
> proto */
>                                         /* (p[w]) for list iteration */
>                                         /* (p[r]/l) for addition/removal */
> +       struct epoch_context inp_epoch_ctx;
>  };
>  #endif /* _KERNEL */
>
>
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CADrOrmt5XbueWfFXNwb1Kv-O=KpNLFBi0AaSu0aeczOtXJ4Nfg>