Date: Tue, 2 Jul 2002 09:03:54 -0400
From: Bosko Milekic <bmilekic@unixdaemons.com>
To: Luigi Rizzo <rizzo@icir.org>
Cc: net@FreeBSD.ORG
Subject: Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?)
Message-ID: <20020702090354.A78632@unixdaemons.com>
In-Reply-To: <20020702080423.A69693@unixdaemons.com>; from bmilekic@unixdaemons.com on Tue, Jul 02, 2002 at 08:04:23AM -0400
References: <20020702001050.B2250@iguana.icir.org> <20020702080423.A69693@unixdaemons.com>
On Tue, Jul 02, 2002 at 08:04:23AM -0400, Bosko Milekic wrote:
>
> There are several problems with your "simple" code that make it simpler
> than what's in -CURRENT and in -STABLE, and that therefore yield
> seemingly (but misleadingly) better times.
>
> 1) sleeping when the system is exhausted of mbufs and clusters is
> now broken, because code that frees to this "combined" list will not
> generate wakeups; even if you build this functionality in, you would
> have to code the wakeup for a "special" case where one obtains both a
> cluster and an mbuf; it'll get complicated.
>
> 2) no per-CPU caches: you bring contention (in -CURRENT) from per-CPU
> back to global, which you probably don't care about now, but which will
> matter when Giant goes away completely.
>
> 3) you violate the write/read producer/consumer model. In -CURRENT, we
> try not to have the freeing code write to any part of the mbuf,
> thus hopefully avoiding cache invalidation from occurring during a free
> that happens from a thread that doesn't usually write to the mbuf at
> all.
Forgot, also, in -CURRENT:
4) it breaks the framework in mb_alloc that groups pages of mbufs and
clusters together, allowing them to be freed back in the future, if/when
we decide to implement lazy freeing; the same breakage would occur if we
moved mbuf allocations to UMA, because grouping mbufs and
clusters on a separate list violates that scheme.
> I think that because of (1) and (2), especially, and because the kernel
> in -CURRENT is the way it is right now, you're getting better
> performance numbers. Try exhausting the system and see how far you
> get. I suspect you'll eventually just hang (because you'll allocate a
> bunch of stuff violating the real interface and then won't be able to
> wake up and Do The Right Thing).
>
> While, as I said before, I agree that we should have an interface that
> allocates both an mbuf and a cluster, I don't think that this is the
> right approach. I believe that the correct approach would be to couple
> the mbuf and cluster allocation under a single per-CPU lock/unlock
> operation, which isn't too difficult to do.
>
> On Tue, Jul 02, 2002 at 12:10:50AM -0700, Luigi Rizzo wrote:
> > [Bcc to -current as relevant there]
> >
> > As a followup to my question below, I did a simple experiment
> > on a -current box acting as a router, using two "em" cards and
> > DEVICE_POLLING (just for stability).
> >
> > The whole of the code is basically below -- the receiving
> > side tries to grab the mbuf+cluster from the free pool first,
> > and falls back to the standard method on failure; the sending
> > side tries to attach the freed buffer to the free pool
> > when possible.
> >
> > There is a simplification here with respect to what could
> > go into an m_freem()/mgethdr_cluster() pair, because the driver
> > is already holding a lock (Giant, in this case), so no
> > further locking is needed. Still, to come to the data points:
> >
> >                              CURRENT     STABLE (*)
> >
> > fastfwd=1, pool_max=0        276 kpps    365 kpps
> > fastfwd=1, pool_max=50       355 kpps    383 kpps
> >
> > fastfwd=0, pool_max=0        195 kpps    142 kpps
> > fastfwd=0, pool_max=50       227 kpps    146 kpps
> >
> > (*) The version of -STABLE that I am using for comparison has some
> > proprietary optimizations which make it a bit faster than normal.
> > However, it still uses the old ipfw code, which is invoked when
> > fastfwd=0 and is significantly slower than the new one.
> >
> > Now this really seems to call for adding this interface into the
> > mbuf subsystem. I believe we have to find a name for the allocator
> > (the deallocator might well go into m_freem(), depending on how we
> > implement the locking) and decide whether it makes sense to lock
> > mgethdr_cluster() under per-cpu locks or under Giant, or even to let
> > the caller make sure that it holds the proper lock before
> > invoking the procedure (as I expect the "producers" and "consumers"
> > of these pairs to be located in the network stack, chances are that
> > they already hold a lock on Giant).
> >
> >
> > cheers
> > luigi
> >
> >
> > The code:
> >
> > struct mbuf *em_pool;
> >
> > static int em_pool_max = 50;
> > SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW,
> > &em_pool_max,0,"max size of mbuf pool");
> > static int em_pool_now;
> > SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD,
> > &em_pool_now,0,"Current size of mbuf pool");
> >
> > ... in em_get_buf() ...
> >
> >         if (em_pool) {
> >                 mp = em_pool;
> >                 em_pool = mp->m_nextpkt;
> >                 em_pool_now--;
> >                 goto have_it;
> >         }
> >
> > ... in em_clean_transmit_interrupts() ...
> >
> >         if ((m = tx_buffer->m_head)) {
> >                 if (em_pool_now < em_pool_max &&
> >                     m->m_next == NULL &&
> >                     (m->m_flags & M_EXT) &&
> >                     M_WRITABLE(m)) {
> >                         m->m_nextpkt = em_pool;
> >                         em_pool = m;
> >                         em_pool_now++;
> >                 } else
> >                         m_freem(m);
> >                 tx_buffer->m_head = NULL;
> >         }
>
> Regards,
> --
> Bosko Milekic
> bmilekic@unixdaemons.com
> bmilekic@FreeBSD.org
>
>
--
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
