Date: Wed, 1 Mar 2006 16:57:10 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Rohit Jalan <rohitj@purpe.com>
Cc: hackers@freebsd.org
Subject: Re: UMA zone allocator memory fragmentation questions
Message-ID: <20060301165231.Y40707@fledge.watson.org>
In-Reply-To: <20060301061750.GA4664@desk01.n2.purpe.com>
References: <20060227104341.GA6671@desk01.n2.purpe.com> <20060228215910.S2248@fledge.watson.org> <20060301061750.GA4664@desk01.n2.purpe.com>
On Wed, 1 Mar 2006, Rohit Jalan wrote:

> My problem is that I need to enforce a single memory limit on the total
> number of pages used by multiple zones.
>
> The limit changes dynamically based on the number of pages being used by
> other non-zone allocations and also on the amount of available swap and
> memory.
>
> I've tried to do the same in various ways with the stock kernel but I was
> unsuccessful due to reasons detailed below. In the end I had to patch the
> UMA subsystem to achieve my goal.
>
> Is there a better method of doing the same, something that would not
> involve patching the kernel? Please advise.

Currently, UMA supports limits on allocation by keg, so if two zones don't
share the same keg, they won't share the same limit. Supporting limits
shared across kegs would require a change as things stand.

On the general topic of how to implement this -- I'm not sure what the best
approach is. Your approach gives quite a bit of flexibility. I wonder,
though, if it would be better to add an explicit accounting feature rather
than a more flexible callback feature? I.e., have a notion of a UMA
accounting group which can be shared by one or more kegs to impose shared
limits on multiple kegs? Something similar to this might also be useful in
the mbuf allocator, where we currently have quite a few kegs and zones
floating around, making a common limit quite difficult to implement.

Robert N M Watson

> ----------------------------------------------------------------------
> TMPFS uses multiple UMA zones to store filesystem metadata. These zones
> are allocated on a per-mount basis for reasons described in the
> documentation. Because of fragmentation that can occur in a zone due to
> dynamic allocations and frees, the actual memory in use can be more than
> the sum of the contained item sizes. This makes it difficult to track
> and limit the space being used by a filesystem.
> Even though the zone API provides scope for custom item constructors and
> destructors, the necessary information (nr. of pages used) is stored
> inside a keg structure which is itself a part of the opaque uma_zone_t
> object. One could include <vm/uma_int.h> and access the keg information
> in the custom constructor, but it would require messy code to calculate
> the change delta, because one would have to track the older value to see
> how many pages have been added or subtracted.
>
> The zone API also provides custom page allocation and free hooks. These
> are ideal for my purpose as they allow me to control page allocations
> and frees effectively. But the callback interface is lacking: it does
> not allow one to specify an argument (as the ctor and dtor callbacks
> do), making it difficult to update custom information from within the
> uma_free callback, because it is passed neither the zone pointer nor an
> argument.
>
> Presently I have patched my private sources to modify the UMA API to
> support passing an argument to the page allocation and free callbacks.
> Unlike the constructor and destructor callback argument, which is
> specified on each call, the argument to uma_alloc or uma_free is
> specified when setting the callback via uma_zone_set_allocf() or
> uma_zone_set_freef(). This argument is stored in the keg and passed to
> the callback whenever it is called.
>
> The scheme implemented by my patch imposes the overhead of passing an
> extra argument to the uma_alloc and uma_free callbacks. The uma_keg
> structure size is also increased by (2 * sizeof(void *)).
>
> My patch changes the present custom alloc and free callback routines
> (e.g., page_alloc, page_free, etc.) to accept an extra argument, which
> is ignored.
>
> The static page_alloc and page_free routines are made global and are
> renamed to uma_page_alloc and uma_page_free respectively, so that they
> may be called from other custom allocators, as is the case with my code.
> ----------------------------------------------------------------------
>
> Patches:
> http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-1.dif
> http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-2.dif
>
> Regards,
>
> rohit --
>
> On Tue, Feb 28, 2006 at 10:04:41PM +0000, Robert Watson wrote:
>> On Mon, 27 Feb 2006, Rohit Jalan wrote:
>>
>>> Is there an upper limit on the amount of fragmentation / wastage that
>>> can occur in a UMA zone?
>>>
>>> Is there a method to know the total number of pages used by a UMA zone
>>> at some instance of time?
>>
>> Hey there Rohit,
>>
>> UMA allocates pages retrieved from VM as "slabs". Its behavior depends a
>> bit on how large the allocated object is, as it's a question of packing
>> objects into page-sized slabs for small objects, or packing objects into
>> sets of pages making up a slab for larger objects. You can
>> programmatically access information on UMA using libmemstat(3), which
>> allows you to do things like query the current object cache size, total
>> lifetime allocations for the zone, allocation failure count, sizes of
>> per-cpu caches, etc. You may want to take a glance at the source code
>> for vmstat -z and netstat -m for examples of it in use. You'll notice,
>> for example, that netstat -m reports on both the mbufs in active use and
>> also the memory allocated to mbufs in the per-cpu + zone caches, since
>> that memory is also (for the time being) committed to the mbuf
>> allocator. The mbuf code is a little hard to follow because there are
>> actually two zones that allocate mbufs, the mbuf zone and the packet
>> secondary zone, so let me know if you have any questions.
>>
>> If you want to dig down a bit more, uma_int.h includes the keg and zone
>> definitions, and you can extract information like the page maximum, the
>> number of items per page or pages per item, etc. If there's useful
>> information that you need but isn't currently exposed by libmemstat, we
>> can add it easily enough.
>> You might also be interested in some of the tools at
>>
>> http://www.watson.org/~robert/freebsd/libmemstat/
>>
>> These include memtop, which is basically an activity monitor for kernel
>> memory types. As an FYI, kernel malloc is wrapped around UMA, so if you
>> view both malloc and UMA stats at once, there is double-counting.
>>
>> Robert N M Watson
>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"