Date: Wed, 19 Jul 2000 09:59:22 -0400 (EDT) From: Bosko Milekic <bmilekic@dsuper.net> To: Alfred Perlstein <bright@wintelcom.net> Cc: net@FreeBSD.ORG Subject: Re: kern/19866: The mbuf subsystem refcount stuff. Message-ID: <Pine.BSF.4.21.0007190941260.40059-100000@jehovah.technokratis.com> In-Reply-To: <20000718184511.O13979@fw.wintelcom.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 18 Jul 2000, Alfred Perlstein wrote:
> moved over to -net.
[...]
> Ok here's an idea that I was tossing around:
>
> you have a freelist of these guys, one for each mbuf header,
> you can make less and deal with failure by sleeping or failing
> as you want, but for the sake of example one is available for
> each mbuf at any given time.
>
> struct mclrefcnt {
> struct mclrefcnt *mextr_next; /* XXX: use list macros :) */
> atomic_t mextr_refcnt;
> };
Okay, following our discussion yesterday, I can understand why your
suggestion is valid, especially given the brief SMP outline and
future changes that are to come.
There is something that I recently thought of, following a brief
beginning in modifying and implementing this, though: the
mclrefcnt/mextrefcnt/whatever we want to call it is on a singly linked
list to which additions and removes will be made via the head. Therefore,
one will have to splimp(), or lock multiple CPUs in the SMP case, when
tampering with this list, and this would be during certain reference
increments. It's not as bad as the linked list case, which will have to
do this during every single additional reference, because it will only
need to do it during the first reference call, which will probably come
from the allocation macro, which will be under splimp() (or SMP
equivalent multiple CPU lock) anyway. Correct?
Also, I'm curious at this point as to how some of this stuff is
done in BSD/OS, but I don't have access/can't see the source. Any
pointers/explanations/brief outlines? I'm mainly curious as to the SMP
stuff vs. mbuf pools, whether they are per CPU, etc. Alfred, now that I
think about it, I would eventually like to try the per-CPU list idea, to
see how well it will scale on an SMP machine with relatively heavy network
load, and I'll probably want to do this before I finalize the resource
freeing of mbuf-used pages outline.
> and a pointer to one in each mbuf.
>
> when you want to make a copy your code looks something like this:
>
> /* copying m into n */
>
> if (m->rp == NULL && n->rp == NULL) {
> m->rp = n->rp = mclrefcnt_alloc();
> } else if (m->rp == NULL) {
> m->rp = n->rp;
> } else if (n->rp == NULL) {
> n->rp = m->rp;
> } else {
> mclrefcnt_free(n->rp);
> n->rp = m->rp;
> }
> atomic_inc(m->rp.mextr_refcnt);
>
>
> /* freeing m */
>
> /* x must get the value I reduced it to */
> x = atomic_dec_and_fetch(m->rp.mextr_refcnt);
> if (x == 0) {
> /* do extfree callback */
> } else {
> m->rp = NULL;
> }
> /* free mbuf header */
>
> Are there problems you see with that?
Not by just glancing at it, although the copying of the mbuf code
above can be more concise, as you know you're copying say m->m_ext into
n->m_ext and incrementing the ref count, so it's not really that tough,
as far as I can see right now. To "copy m into n" you just do something
like:
n->m_ext = m->m_ext;
n->m_ext->mextrefcnt++;
...and you're on your way. As we discussed this, you don't have to worry
about something else freeing a reference and consequently freeing the
mextrefcnt node and ext_buf, because you know that here you have control
of both m and n, both of which refer to the same ext_buf, so the refcnt
cannot drop below 1, while you're going to atomically increment.
You mentionned a race condition in the free code above, yesterday,
but I don't exactly recall what. If the refcnt reaches zero above though,
you'll have to free the mextrefcnt/mclrefcnt/whatever structure too, at
least back to its respective free list. Note that if in the SMP case we
decide to split up the free lists, this is one list that will have to
remain global, as we may have to deal with mbufs from different CPU free
lists referring the same ext_buf's as some allocated from different
CPU mbuf free lists.
> --
> -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
> "I have the heart of a child; I keep it in a jar on my desk."
Cheers,
Bosko.
--
Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237
bmilekic@technokratis.com * http://www.technokratis.com/
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0007190941260.40059-100000>
