Date:      Tue, 2 Jul 2002 09:03:54 -0400
From:      Bosko Milekic <bmilekic@unixdaemons.com>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        net@FreeBSD.ORG
Subject:   Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?)
Message-ID:  <20020702090354.A78632@unixdaemons.com>
In-Reply-To: <20020702080423.A69693@unixdaemons.com>; from bmilekic@unixdaemons.com on Tue, Jul 02, 2002 at 08:04:23AM -0400
References:  <20020702001050.B2250@iguana.icir.org> <20020702080423.A69693@unixdaemons.com>

On Tue, Jul 02, 2002 at 08:04:23AM -0400, Bosko Milekic wrote:
> 
>  There are several problems with your "simple" code that make it simpler
>  than what's in -CURRENT and in -STABLE, and that therefore, unsurprisingly,
>  yield seemingly better times.
> 
>  1) sleeping in the case of a system exhausted of mbufs and clusters is
>  now broken, because code that frees to this "combined" list will not
>  generate wakeups; even if you build this functionality in, you would
>  have to code the wakeup for a "special" case where one obtains both a
>  cluster and an mbuf; it will get complicated.
> 
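
   A minimal sketch of the wakeup obligation (names made up, locking
   elided): any path that frees into a list that allocators may sleep on
   has to check for sleepers and call wakeup(9), or exhausted allocators
   hang forever.

	struct mb_list_sketch {
		struct mbuf	*head;		/* free mbufs, via m_nextpkt */
		int		 mbl_wanted;	/* set by sleeping allocators */
	};

	static void
	mb_free_sketch(struct mb_list_sketch *l, struct mbuf *m)
	{
		m->m_nextpkt = l->head;
		l->head = m;
		if (l->mbl_wanted) {	/* a "combined" list needs this too */
			l->mbl_wanted = 0;
			wakeup(l);	/* wake threads msleep()ing on l */
		}
	}
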
>  2) no per-CPU caches: you bring contention (in -CURRENT) from per-CPU
>  back to global, which you probably don't care about now, but which will
>  matter when Giant goes away completely.
> 
>  3) you violate the write/read producer/consumer model.  In -CURRENT, we
>  try not to have the code doing the freeing write to any part of the
>  mbuf, thus hopefully preventing cache invalidation from occurring during
>  a free performed by a thread that doesn't usually write to the mbuf at
>  all.
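
   To illustrate point 3 (structure invented for the sketch): a free
   that fills a slot in a bucket's pointer array stores nothing through
   the mbuf pointer itself, so the freeing CPU never dirties the cache
   lines the allocating CPU reads; by contrast, linking through
   m_nextpkt, as a driver-local pool does, writes into the mbuf.

	struct mb_bucket_sketch {
		int	 nfree;		/* slots filled */
		void	*slots[32];	/* pointers to free mbufs */
	};

	static void
	mb_bucket_free_sketch(struct mb_bucket_sketch *b, struct mbuf *m)
	{
		/* Write to the bucket only; no store through 'm'. */
		b->slots[b->nfree++] = m;
	}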

   Forgot, also, in -CURRENT:

   4) it breaks the framework in mb_alloc that groups pages of mbufs and
   clusters together, allowing them to be freed in the future if/when we
   decide to implement lazy freeing; the same problem would arise if we
   moved mbuf allocations to UMA, because grouping mbufs and clusters
   together on a separate list will violate that scheme.
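
   Roughly, the bookkeeping point 4 refers to looks like this (structure
   and field names invented for the sketch): each page remembers how many
   of its objects are free, so a fully free page can eventually be given
   back; mbufs threaded onto a side list are invisible to that accounting.

	struct mb_page_sketch {
		vm_offset_t	base;	/* backing page */
		int		nobjs;	/* objects carved from it */
		int		nfree;	/* of those, currently free */
	};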

>  I think that because of (1) and (2), especially, and because the kernel
>  in -CURRENT is the way it is right now, you're getting better
>  performance numbers.  Try exhausting the system and seeing how far you
>  get.  I suspect you'll eventually just hang (because you'll allocate a
>  bunch of stuff violating the real interface and then won't be able to
>  wakeup and Do The Right Thing).
> 
>  While, as I said before, I agree that we should have an interface that
>  allocates both an mbuf and a cluster, I don't think that this is the
>  right approach.  I believe that the correct approach would be to couple
>  the mbuf and cluster allocation under a single per-CPU lock/unlock
>  operation, which isn't too difficult to do.
>  
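
   Something along these lines, say (every name below is hypothetical,
   and error handling and the sleep path are omitted): take the per-CPU
   container's lock once and carve both objects out under it, so a pair
   costs one lock/unlock round trip instead of two.

	struct mbuf *
	m_gethdr_cluster_sketch(void)
	{
		struct mb_container *cnt;	/* hypothetical per-CPU container */
		struct mbuf *m;

		cnt = mb_get_percpu_container();	/* hypothetical */
		mtx_lock(&cnt->mc_lock);
		m = mb_grab_mbuf(cnt);			/* hypothetical */
		if (m != NULL && mb_grab_cluster(cnt, m) == NULL) {
			mb_put_mbuf(cnt, m);	/* no cluster; put mbuf back */
			m = NULL;
		}
		mtx_unlock(&cnt->mc_lock);
		return (m);
	}
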
> On Tue, Jul 02, 2002 at 12:10:50AM -0700, Luigi Rizzo wrote:
> > [Bcc to -current as relevant there]
> > 
> > As a followup to my question below, I did a simple experiment
> > on a -current box acting as a router, using two "em" cards and
> > DEVICE_POLLING (just for stability).
> > 
> > Essentially all of the code is below -- the receiving side
> > tries to grab the mbuf+cluster pair from the free pool first,
> > and falls back to the standard method on failure; the sending
> > side returns the freed buffer to the free pool when it
> > qualifies.
> > 
> > There is one simplification here relative to what could go
> > into an m_freem()/mgethdr_cluster() pair: the driver already
> > holds a lock (on Giant, in this case), so no further locking
> > is needed.  Still, to come to the data points:
> > 
> > 				CURRENT		STABLE (*)
> > 
> > fastfwd=1, pool_max=0		276 kpps	365 kpps
> > fastfwd=1, pool_max=50		355 kpps	383 kpps
> > 
> > fastfwd=0, pool_max=0		195 kpps	142 kpps
> > fastfwd=0, pool_max=50		227 kpps	146 kpps
> > 
> > (*) The version of STABLE I am using for comparison has some
> > proprietary optimizations which make it a bit faster than normal.
> > However, it still uses the old ipfw code (invoked when fastfwd=0),
> > which is significantly slower than the new one.
> > 
> > Now this really seems to call for adding this interface to the
> > mbuf subsystem.  We need to pick a name for the allocator (the
> > deallocator might well go into m_freem(), depending on how we
> > implement the locking), and to decide whether to lock
> > mgethdr_cluster() under per-CPU locks or under Giant, or even to
> > let the caller ensure it holds the proper lock before invoking
> > the procedure (since I expect the "producers" and "consumers" of
> > these pairs to live in the network stack, chances are they
> > already hold Giant).  A sketch of one possible shape for this
> > interface follows the code below.
> > 
> > 
> > 	cheers
> > 	luigi
> > 
> > 
> > The code:
> > 
> > 	/* Pool of free mbuf+cluster pairs, linked through m_nextpkt. */
> > 	struct mbuf *em_pool;
> > 
> > 	static int	em_pool_max = 50;
> > 	SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW,
> > 		&em_pool_max, 0, "max size of mbuf pool");
> > 	static int	em_pool_now;
> > 	SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD,
> > 		&em_pool_now, 0, "current size of mbuf pool");
> > 
> > 	... in em_get_buf() ...
> > 
> > 		/* Fast path: pop a ready mbuf+cluster off the pool. */
> > 		if (em_pool != NULL) {
> > 			mp = em_pool;
> > 			em_pool = mp->m_nextpkt;
> > 			em_pool_now--;
> > 			goto have_it;
> > 		}
> > 
> > 	... in em_clean_transmit_interrupts() ...
> > 
> > 		if ((m = tx_buffer->m_head) != NULL) {
> > 			/*
> > 			 * Recycle the buffer only if it is a single mbuf
> > 			 * with a writable cluster attached; otherwise
> > 			 * free it the usual way.
> > 			 */
> > 			if (em_pool_now < em_pool_max &&
> > 			    m->m_next == NULL &&
> > 			    (m->m_flags & M_EXT) &&
> > 			    M_WRITABLE(m)) {
> > 				m->m_nextpkt = em_pool;
> > 				em_pool = m;
> > 				em_pool_now++;
> > 			} else
> > 				m_freem(m);
> > 			tx_buffer->m_head = NULL;
> > 		}
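
   Pulled out of the driver, the interface being discussed might take
   roughly this shape (the names, the caller-holds-Giant assumption,
   and the fallback path are placeholders, not a settled design):

	static struct mbuf	*mcl_pool;	/* pairs linked via m_nextpkt */
	static int		 mcl_pool_now, mcl_pool_max = 50;

	/* Caller is assumed to hold the appropriate lock (Giant, here). */
	struct mbuf *
	mgethdr_cluster(void)
	{
		struct mbuf *m;

		if ((m = mcl_pool) != NULL) {	/* fast path: recycle a pair */
			mcl_pool = m->m_nextpkt;
			mcl_pool_now--;
			return (m);
		}
		MGETHDR(m, M_DONTWAIT, MT_DATA);	/* slow path: classic route */
		if (m != NULL) {
			MCLGET(m, M_DONTWAIT);
			if ((m->m_flags & M_EXT) == 0) {
				m_freem(m);
				m = NULL;
			}
		}
		return (m);
	}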
>  
> Regards,
> -- 
> Bosko Milekic
> bmilekic@unixdaemons.com
> bmilekic@FreeBSD.org
> 

-- 
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org

