From owner-freebsd-net Mon Oct 30 14:36:32 2000
Delivered-To: freebsd-net@freebsd.org
Received: from mass.osd.bsdi.com (adsl-63-202-178-14.dsl.snfc21.pacbell.net [63.202.178.14])
	by hub.freebsd.org (Postfix) with ESMTP id 1CBEE37B479
	for ; Mon, 30 Oct 2000 14:36:24 -0800 (PST)
Received: from mass.osd.bsdi.com (localhost [127.0.0.1])
	by mass.osd.bsdi.com (8.11.0/8.11.1) with ESMTP id e9UMeWF18172;
	Mon, 30 Oct 2000 14:40:33 -0800 (PST)
	(envelope-from msmith@mass.osd.bsdi.com)
Message-Id: <200010302240.e9UMeWF18172@mass.osd.bsdi.com>
X-Mailer: exmh version 2.1.1 10/15/1999
To: Bosko Milekic
Cc: freebsd-net@freebsd.org
Subject: Re: MP: per-CPU mbuf allocation lists
In-reply-to: Your message of "Mon, 30 Oct 2000 13:20:52 EST."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 30 Oct 2000 14:40:32 -0800
From: Mike Smith
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I recently wrote an initial "scratch pad" design for per-CPU mbuf
> lists (in the MP case). The design consists simply of introducing
> these "fast" lists for each CPU and populating them with mbufs on
> bootup. Allocations from these lists would not need to be protected
> with a mutex, as each CPU has its own. The general mmbfree list
> remains, and remains protected with a mutex, in case the per-CPU
> list is empty.

Have you by any chance done any profiling to determine whether
contention for the free mbuf list is actually a performance issue, or
is this just one of those "hey, this would be cool" design decisions?

> - "Fast" list; a per-CPU mbuf list. They contain "w" (for
>   "watermark") number of mbufs, typically... more on this below.
>
> - The general (already existing) mmbfree list; mutex-protected,
>   global list, in case the fast list is empty for the given CPU.
>
> - Allocations; all done from "fast" lists. All are very fast, in the
>   general case. If no mbufs are available, the general mmbfree list's
>   lock is acquired, and an mbuf is taken from there.
Do you handle the case where an interrupt handler running on the same
CPU is run while you are manipulating the "fast" list, i.e. do you
still lock the "fast" list?

> If no mbuf is available, even from the general list, we let go of
> the lock and allocate a page from mb_map and drop the mbufs onto our
> fast list, from which we grab the one we need.

Starvation of the general list should result in the general list being
populated, not the fast list. In this case, the general list will
remain depleted until mbufs are freed, which will blow mb_map out
faster than is otherwise desirable.

> - Freeing; first, if someone is sleeping, we grab the mmbfree global
>   list mutex and drop the mbuf there, and then issue a wakeup. If
>   nobody is sleeping, then we proceed as follows:
>   (a) if our fast list does not have over "w" mbufs, put the mbuf on
>       our fast list, and then we're done
>   (b) since our fast list already has "w" mbufs, acquire the mmbfree
>       mutex and drop the mbuf there.

This is a half-hearted "donation" algorithm. It might make sense to
waste some cycles in the "sleeping" case to lock other CPUs' "fast"
lists and steal mbufs from them...

> Things to note:
>
> - note that if we're out of mbufs on our fast list, and the general
>   mmbfree list has none available either, and mb_map is starved,
>   even though there may be free mbufs on other CPUs' fast lists, we
>   will return ENOBUFS. This behavior will usually be an indication
>   of a wrongly chosen watermark ("w"), and we will have to consider
>   how to inform our users on how to properly select a watermark. I
>   already have some ideas for alternate situations/ways of handling
>   this, but will leave this investigation for later.

See previous comment.

> - "w" is a tunable watermark. No fast list will ever contain more
>   than "w" mbufs. This presents a small problem.
> Consider a situation where we initially set w = 500; consider we
> have two CPUs; consider CPU1's fast list eventually gets 450 mbufs,
> and CPU2's fast list gets 345. Consider then that we decide to set
> w = 200; even though all subsequent freeing will be done to the
> mmbfree list, unless we eventually go under the 200 mark for our
> free list, we will likely end up sitting with > 200 mbufs on each
> CPU's fast list. The idea I presently have is to have a kproc
> "garbage collect" > w mbufs on the CPUs' fast lists and put them
> back onto the mmbfree general list, if it detects that "w" has been
> lowered.

The watermark-lowering operation is likely to be very infrequent. As
such, it would hardly hurt for it to scan each of the "fast" lists and
steal excess mbufs back into the global pool.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also. But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view. [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message