Date: Thu, 14 Jun 2001 01:03:13 -0700 From: Terry Lambert <tlambert2@mindspring.com> To: Bosko Milekic <bmilekic@technokratis.com> Cc: freebsd-current@FreeBSD.ORG, freebsd-alpha@FreeBSD.ORG Subject: Re: New SMP Mbuf Allocator (PATCH and TESTING request) Message-ID: <3B286FC1.5C112A5C@mindspring.com> References: <20010613010744.A4083@technokratis.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Bosko Milekic wrote: > I plan to commit the new bits within the next week. > However, as is usually the case with commits of this magnitude, > I'd like a few more tests to be run by a few more people. > I've been testing the allocator myself (in several different > ways, mainly resource exhaustion simulations) for the past > couple of weeks and have been able to catch and fix a couple > of bugs. > Also, jake, jlemon, and Silby (Mike Silbersack) have > provided me with some reviews, all of which have been integrated > into the latest version of the patch (below). A general comment, and then a comment on the patch: I would like to see this code be optional, until such time as benchmarks have proven it to be better than the existing code, or have proben it to be worse. Diking out the existing code, and forcing it to be diked out by making unnecessary function name changes seems to me to be a problem. Overall, I like the idea of moving towards a Dynix allocator type model to support many processors (not just 4) well, but I'm leery of many of the changes, particularly in light of the claim that "It has been established that we do not benefit from different locks for different objects, so we use the same lock, regardless of object type". On to the specific comments... I have addressed them in patch order, so that they can be read side-by-side: -- The "npg" calculation has always been incorrect for large memory systems with non-sparse physical maps, and now it is much worse. I commented on this before in -arch. Consider the case of a machine with 4G of physical memory. -- Is the policy two line or 4 line license, these days? -- The MBALLOC_CPU stuff should be more dynamic; it appears that the real number of CPUs is known at the time the allocations take place. -- I think you should spin, and only block if you don't get the spin lock after several attempts. The Solaris tradeoff in this regard is 10 attempts. Direct blocking on the cv seems a little wasteful. -- I'm not sure why you mutex the per-CPU containers? Aren't they, by definition, per-CPU, and therefore immune from contention? -- You realize that there is a dependency on MAXFILES, right? Specifically, there is an mbuf allocated per open socket for the tcptmpl (even though it only needs 60 bytes). --- If you fix the tcptmpl allocation, a couple of additional issues arise: 1) Since the allocations are in page-units, you will end up with a lot of waste. 2) Allocation in page units will make it hard to give 60 byte allocations back to the system. 3) The allocator is not really general enough for this (or for things like TIME_WAIT zombie structs, etc.). 4) Yeah, this sort of also implies that mbuf cluster headers come from a seperate pool, instead of being 256 bytes as well... -- Why do you lock around things in mb_init()? It's never called twice, let alone reentrantly... the code will be done before an AP is allowed to run. Alpha has this same restriction, from what I can see from the code. -- Dropping the lock in mb_pop_cont() seems wrong; I guess you are avoiding sleeping with the lock held? I don't think you need to worry (per-CPU, again), unless you later attempt lazy repopulation; it seems to me that the lock that should be held is a global lock, to prevent reentrancy on the global memory pool, and the malloc() code does that. Perhaps I'm missing something here. -- The ability to specify a "how" that implies an unwillingness to block implies interrupt allocation; yet you can't support this, since you do not preallocate a kernel map (ala the zone allocator). The zone allocator is actually very misleading, in that some of it's code never has the values it implies that it might have; in particular, there are functions which are passed parameters which can't really be variant, as implied by the use of per zone variables as arguments. -- Ah. I see why all the locks: mb_alloc_wait attempts to be a primitive reclaimer. You would do much better with a per-CPU high-watermark reclaim at free(), I think, instead of doing this. Really, you are in a starvation condition because of your load, and not because someone is "hogging already free mbufs", I think. This is probably premature optimization. -- In the starvation case during a free, I don't think you really care, except perhaps to adjust the high/low watermarks. An interesting issue here is that the average TCP window size will end up being 16k, which comes out to 4 1-page buckets (2, on Alpha), where you end up with multiple buckets and many mbufs (64) per, so freeing isn't really an option, if you are sending lots of data (serving content). It would be different for a desktop system, since the receive interrupt might come to any idle CPU. -- I like the cleanup via the use of local macros. -- I think the m_getclr -> m_get_clrd change is gratuitous. -- I like the "only" comment claifications. -- I dislike the bloating of one line comments into three lines, with the first and last blank. -- You could resolve the statistics problems by keeping them on a per-CPU basis, and then agregating them under a general system lock, when necessary (when they were requested). -- Most of the header definition changes for function declarations are gratuitous. -- Don't really care about netstat stuff... -- -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3B286FC1.5C112A5C>