FreeBSD Mail Archives

Date:      Tue, 2 Jul 2002 08:04:23 -0400
From:      Bosko Milekic <bmilekic@unixdaemons.com>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        net@FreeBSD.ORG
Subject:   Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?)
Message-ID:  <20020702080423.A69693@unixdaemons.com>
In-Reply-To: <20020702001050.B2250@iguana.icir.org>; from rizzo@icir.org on Tue, Jul 02, 2002 at 12:10:50AM -0700
References:  <20020702001050.B2250@iguana.icir.org>

 There are several problems with your "simple" code that make it simpler
 than what's in -CURRENT and in -STABLE and therefore yield [obviously]
 seemingly better times.

 1) sleeping in the case of a system exhausted of mbufs and clusters is
 now broken because code that frees to this "combined" list will not
 generate wake ups;  even if you build this functionality in, you would
 have to code the wakeup for a "special" case where one obtains both a
 cluster and an mbuf; it'll get complicated.

 2) no per-CPU caches, you bring contention (in -CURRENT) from per-CPU
 back to global, which you probably don't care about now, but which will
 matter when Giant goes completely.

 3) you violate write/read producer/consumer model.  In -CURRENT, we try
 to not have the code doing the freeing write to any part of the mbuf,
 thus hopefully avoiding cache invalidation from occuring during a free
 that happens from a thread that doesn't usually write to the mbuf at
 all.

 I think that because of (1) and (2), especially, and because the kernel
 in -CURRENT is the way it is right now, you're getting better
 performance numbers.  Try exhausting the system and seeing how far you
 get.  I suspect you'll eventually just hang (because you'll allocate a
 bunch of stuff violating the real interface and then won't be able to
 wakeup and Do The Right Thing).

 While, as I said before, I agree that we should have an interface that
 allocates both an mbuf and a cluster, I don't think that this is the
 right approach.  I believe that the correct approach would be to couple
 the mbuf and cluster allocation under a single per-CPU lock/unlock
 operation, which isn't too difficult to do.

On Tue, Jul 02, 2002 at 12:10:50AM -0700, Luigi Rizzo wrote:
> [Bcc to -current as relevant there]
> 
> As a followup to my question below, i did some simple experiment
> on a -current box acting as a router, using two "em" cards and
> DEVICE_POLLING (just for stability).
> 
> The whole of the code is basically below -- the receiving
> side tries to grab the mbuf+cluster from the free pool first,
> and falls back to the standard method on failure; the sending
> side tries to attach the freed buffer to the free pool
> if it is the case.
> 
> There is a simplification here with respect to what could
> go into an m_freem()/mgethdr_cluster() pair because the driver
> is already holding a lock, on Giant in this case, so you do not need
> further locking. Still, to come to the data points:
> 
> 				CURRENT		STABLE (*)
> 
> fastfwd=1, pool_max=0		276kpps		365 kpps
> fastfwd=1, pool_max=50		355kpps		383 kpps
> 
> fastfwd=0, pool_max=0		195 pps		142 kpps
> fastfwd=0, pool_max=50		227 kpps	146 kpps
> 
> (*) This version of STABLE that I am using for comparison has some
> proprietary optimizations which make it a bit faster than normal.
> However it still uses the old ipfw code, which is invoked when
> fastfwd=0, and is significantly slower than the new one.
> 
> Now this really seems to call for adding this interface into the
> mbuf subsystem. I believe we have to find a name for the allocator
> (the deallocator might well go into m_freem(), depending on how we
> implement the locking) and whether it makes sense to lock
> mgethdr_cluster() under per-cpu locks or under Giant, or even let
> the caller make sure that it holds the proper lock before trying
> to invoke the procedure (as i expect the "producers" or "consumers"
> of these pairs to be located in the network stack, chances are that
> they already hold a lock on Giant).
> 
> 
> 	cheers
> 	luigi
> 
> 
> The code:
> 
> 	struct mbuf *em_pool;
> 
> 	static int      em_pool_max = 50;
> 	SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW,
> 		&em_pool_max,0,"max size of mbuf pool");
> 	static int em_pool_now;
> 	SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD,
> 		&em_pool_now,0,"Current size of mbuf pool");
> 
> 	... in em_get_buf() ..
> 
> 		if (em_pool) {
>                         mp = em_pool;
>                         em_pool = mp->m_nextpkt;
>                         em_pool_now--;
>                         goto have_it;
>                 }
> 
> 	... in em_clean_transmit_interrupts() ...
> 		if ((m = tx_buffer->m_head)) {
> 			if (em_pool_now < em_pool_max &&
> 			    m->m_next == NULL &&
> 			    m->m_flags & M_EXT &&
> 			    M_WRITABLE(m) ) {
> 				m->m_nextpkt = em_pool;
> 				em_pool = m;
> 				em_pool_now++;
> 			} else
> 				m_freem(m);
> 			tx_buffer->m_head = NULL;
> 		}

Regards,
-- 
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020702080423.A69693>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation