Date: Tue, 2 Jul 2002 09:03:54 -0400 From: Bosko Milekic <bmilekic@unixdaemons.com> To: Luigi Rizzo <rizzo@icir.org> Cc: net@FreeBSD.ORG Subject: Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?) Message-ID: <20020702090354.A78632@unixdaemons.com> In-Reply-To: <20020702080423.A69693@unixdaemons.com>; from bmilekic@unixdaemons.com on Tue, Jul 02, 2002 at 08:04:23AM -0400 References: <20020702001050.B2250@iguana.icir.org> <20020702080423.A69693@unixdaemons.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jul 02, 2002 at 08:04:23AM -0400, Bosko Milekic wrote: > > There are several problems with your "simple" code that make it simpler > than what's in -CURRENT and in -STABLE and therefore yield [obviously] > seemingly better times. > > 1) sleeping in the case of a system exhausted of mbufs and clusters is > now broken because code that frees to this "combined" list will not > generate wake ups; even if you build this functionality in, you would > have to code the wakeup for a "special" case where one obtains both a > cluster and an mbuf; it'll get complicated. > > 2) no per-CPU caches, you bring contention (in -CURRENT) from per-CPU > back to global, which you probably don't care about now, but which will > matter when Giant goes completely. > > 3) you violate write/read producer/consumer model. In -CURRENT, we try > to not have the code doing the freeing write to any part of the mbuf, > thus hopefully avoiding cache invalidation from occuring during a free > that happens from a thread that doesn't usually write to the mbuf at > all. Forgot, also, in -CURRENT: 4) breaks framework in mb_alloc that groups pages of mbufs and clusters together allowing them to be freed in the future, if/when we decide to implement lazy freeing; it would do the same thing if we mvoed mbuf allocations to UMA, because grouping together mbufs and clusters on a separate list will violate that scheme. > I think that because of (1) and (2), especially, and because the kernel > in -CURRENT is the way it is right now, you're getting better > performance numbers. Try exhausting the system and seeing how far you > get. I suspect you'll eventually just hang (because you'll allocate a > bunch of stuff violating the real interface and then won't be able to > wakeup and Do The Right Thing). > > While, as I said before, I agree that we should have an interface that > allocates both an mbuf and a cluster, I don't think that this is the > right approach. I believe that the correct approach would be to couple > the mbuf and cluster allocation under a single per-CPU lock/unlock > operation, which isn't too difficult to do. > > On Tue, Jul 02, 2002 at 12:10:50AM -0700, Luigi Rizzo wrote: > > [Bcc to -current as relevant there] > > > > As a followup to my question below, i did some simple experiment > > on a -current box acting as a router, using two "em" cards and > > DEVICE_POLLING (just for stability). > > > > The whole of the code is basically below -- the receiving > > side tries to grab the mbuf+cluster from the free pool first, > > and falls back to the standard method on failure; the sending > > side tries to attach the freed buffer to the free pool > > if it is the case. > > > > There is a simplification here with respect to what could > > go into an m_freem()/mgethdr_cluster() pair because the driver > > is already holding a lock, on Giant in this case, so you do not need > > further locking. Still, to come to the data points: > > > > CURRENT STABLE (*) > > > > fastfwd=1, pool_max=0 276kpps 365 kpps > > fastfwd=1, pool_max=50 355kpps 383 kpps > > > > fastfwd=0, pool_max=0 195 pps 142 kpps > > fastfwd=0, pool_max=50 227 kpps 146 kpps > > > > (*) This version of STABLE that I am using for comparison has some > > proprietary optimizations which make it a bit faster than normal. > > However it still uses the old ipfw code, which is invoked when > > fastfwd=0, and is significantly slower than the new one. > > > > Now this really seems to call for adding this interface into the > > mbuf subsystem. I believe we have to find a name for the allocator > > (the deallocator might well go into m_freem(), depending on how we > > implement the locking) and whether it makes sense to lock > > mgethdr_cluster() under per-cpu locks or under Giant, or even let > > the caller make sure that it holds the proper lock before trying > > to invoke the procedure (as i expect the "producers" or "consumers" > > of these pairs to be located in the network stack, chances are that > > they already hold a lock on Giant). > > > > > > cheers > > luigi > > > > > > The code: > > > > struct mbuf *em_pool; > > > > static int em_pool_max = 50; > > SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW, > > &em_pool_max,0,"max size of mbuf pool"); > > static int em_pool_now; > > SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD, > > &em_pool_now,0,"Current size of mbuf pool"); > > > > ... in em_get_buf() .. > > > > if (em_pool) { > > mp = em_pool; > > em_pool = mp->m_nextpkt; > > em_pool_now--; > > goto have_it; > > } > > > > ... in em_clean_transmit_interrupts() ... > > if ((m = tx_buffer->m_head)) { > > if (em_pool_now < em_pool_max && > > m->m_next == NULL && > > m->m_flags & M_EXT && > > M_WRITABLE(m) ) { > > m->m_nextpkt = em_pool; > > em_pool = m; > > em_pool_now++; > > } else > > m_freem(m); > > tx_buffer->m_head = NULL; > > } > > Regards, > -- > Bosko Milekic > bmilekic@unixdaemons.com > bmilekic@FreeBSD.org > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-net" in the body of the message > -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020702090354.A78632>