Date: Tue, 2 Jul 2002 08:04:23 -0400 From: Bosko Milekic <bmilekic@unixdaemons.com> To: Luigi Rizzo <rizzo@icir.org> Cc: net@FreeBSD.ORG Subject: Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?) Message-ID: <20020702080423.A69693@unixdaemons.com> In-Reply-To: <20020702001050.B2250@iguana.icir.org>; from rizzo@icir.org on Tue, Jul 02, 2002 at 12:10:50AM -0700 References: <20020702001050.B2250@iguana.icir.org>
next in thread | previous in thread | raw e-mail | index | archive | help
There are several problems with your "simple" code that make it simpler
than what's in -CURRENT and in -STABLE and therefore yield [obviously]
seemingly better times.
1) sleeping in the case of a system exhausted of mbufs and clusters is
now broken because code that frees to this "combined" list will not
generate wake ups; even if you build this functionality in, you would
have to code the wakeup for a "special" case where one obtains both a
cluster and an mbuf; it'll get complicated.
2) no per-CPU caches, you bring contention (in -CURRENT) from per-CPU
back to global, which you probably don't care about now, but which will
matter when Giant goes completely.
3) you violate write/read producer/consumer model. In -CURRENT, we try
to not have the code doing the freeing write to any part of the mbuf,
thus hopefully avoiding cache invalidation from occuring during a free
that happens from a thread that doesn't usually write to the mbuf at
all.
I think that because of (1) and (2), especially, and because the kernel
in -CURRENT is the way it is right now, you're getting better
performance numbers. Try exhausting the system and seeing how far you
get. I suspect you'll eventually just hang (because you'll allocate a
bunch of stuff violating the real interface and then won't be able to
wakeup and Do The Right Thing).
While, as I said before, I agree that we should have an interface that
allocates both an mbuf and a cluster, I don't think that this is the
right approach. I believe that the correct approach would be to couple
the mbuf and cluster allocation under a single per-CPU lock/unlock
operation, which isn't too difficult to do.
On Tue, Jul 02, 2002 at 12:10:50AM -0700, Luigi Rizzo wrote:
> [Bcc to -current as relevant there]
>
> As a followup to my question below, i did some simple experiment
> on a -current box acting as a router, using two "em" cards and
> DEVICE_POLLING (just for stability).
>
> The whole of the code is basically below -- the receiving
> side tries to grab the mbuf+cluster from the free pool first,
> and falls back to the standard method on failure; the sending
> side tries to attach the freed buffer to the free pool
> if it is the case.
>
> There is a simplification here with respect to what could
> go into an m_freem()/mgethdr_cluster() pair because the driver
> is already holding a lock, on Giant in this case, so you do not need
> further locking. Still, to come to the data points:
>
> CURRENT STABLE (*)
>
> fastfwd=1, pool_max=0 276kpps 365 kpps
> fastfwd=1, pool_max=50 355kpps 383 kpps
>
> fastfwd=0, pool_max=0 195 pps 142 kpps
> fastfwd=0, pool_max=50 227 kpps 146 kpps
>
> (*) This version of STABLE that I am using for comparison has some
> proprietary optimizations which make it a bit faster than normal.
> However it still uses the old ipfw code, which is invoked when
> fastfwd=0, and is significantly slower than the new one.
>
> Now this really seems to call for adding this interface into the
> mbuf subsystem. I believe we have to find a name for the allocator
> (the deallocator might well go into m_freem(), depending on how we
> implement the locking) and whether it makes sense to lock
> mgethdr_cluster() under per-cpu locks or under Giant, or even let
> the caller make sure that it holds the proper lock before trying
> to invoke the procedure (as i expect the "producers" or "consumers"
> of these pairs to be located in the network stack, chances are that
> they already hold a lock on Giant).
>
>
> cheers
> luigi
>
>
> The code:
>
> struct mbuf *em_pool;
>
> static int em_pool_max = 50;
> SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW,
> &em_pool_max,0,"max size of mbuf pool");
> static int em_pool_now;
> SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD,
> &em_pool_now,0,"Current size of mbuf pool");
>
> ... in em_get_buf() ..
>
> if (em_pool) {
> mp = em_pool;
> em_pool = mp->m_nextpkt;
> em_pool_now--;
> goto have_it;
> }
>
> ... in em_clean_transmit_interrupts() ...
> if ((m = tx_buffer->m_head)) {
> if (em_pool_now < em_pool_max &&
> m->m_next == NULL &&
> m->m_flags & M_EXT &&
> M_WRITABLE(m) ) {
> m->m_nextpkt = em_pool;
> em_pool = m;
> em_pool_now++;
> } else
> m_freem(m);
> tx_buffer->m_head = NULL;
> }
Regards,
--
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020702080423.A69693>
