Date: Wed, 1 Mar 2006 11:47:50 +0530
From: Rohit Jalan <rohitj@purpe.com>
To: Robert Watson <rwatson@freebsd.org>
Cc: hackers@freebsd.org
Subject: Re: UMA zone allocator memory fragmentation questions
Message-ID: <20060301061750.GA4664@desk01.n2.purpe.com>
In-Reply-To: <20060228215910.S2248@fledge.watson.org>
References: <20060227104341.GA6671@desk01.n2.purpe.com> <20060228215910.S2248@fledge.watson.org>
Hi Robert,

My problem is that I need to enforce a single memory limit on the total number of pages used by multiple zones. The limit changes dynamically, based on the number of pages used by other, non-zone allocations and on the amount of available memory and swap. I have tried to do this in various ways with the stock kernel, but was unsuccessful for the reasons detailed below. In the end I had to patch the UMA subsystem to achieve my goal.

Is there a better method of doing the same, one that would not involve patching the kernel? Please advise.

----------------------------------------------------------------------

TMPFS uses multiple UMA zones to store filesystem metadata. These zones are allocated on a per-mount basis, for reasons described in the documentation.

Because of the fragmentation that dynamic allocations and frees can cause within a zone, the actual memory in use can exceed the sum of the contained item sizes. This makes it difficult to track and limit the space used by a filesystem.

Although the zone API provides hooks for custom item constructors and destructors, the necessary information (the number of pages in use) is stored inside a keg structure, which is itself part of the opaque uma_zone_t object. One could include <vm/uma_int.h> and access the keg information from a custom constructor, but calculating the change delta would require messy code, because one would have to remember the previous value to see how many pages had been added or removed.

The zone API also provides custom page allocation and free hooks. These are ideal for my purpose, as they let me control page allocation and frees directly. But the callback interface is lacking: it does not allow one to specify an argument (as the constructor and destructor hooks do), which makes it difficult to update custom information from within the uma_free callback, since that callback is passed neither the zone pointer nor an argument.
Presently, I have patched my private sources to extend the UMA API so that an argument can be passed to the page allocation and free callbacks. Unlike the constructor and destructor callback argument, which is specified on each call, the argument to uma_alloc or uma_free is specified once, when the callback is set via uma_zone_set_allocf() or uma_zone_set_freef(). The argument is stored in the keg and passed to the callback whenever it is invoked.

The scheme implemented by my patch adds the overhead of passing one extra argument to the uma_alloc and uma_free callbacks, and it grows the uma_keg structure by (2 * sizeof(void *)).

My patch changes the existing custom alloc and free callback routines (e.g., page_alloc, page_free, etc.) to accept an extra argument, which they ignore. The static page_alloc and page_free routines are made global and renamed to uma_page_alloc and uma_page_free respectively, so that they may be called from other custom allocators, as is the case with my code.

----------------------------------------------------------------------

Patches:

http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-1.dif
http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-2.dif

Regards,

rohit

--

On Tue, Feb 28, 2006 at 10:04:41PM +0000, Robert Watson wrote:
> On Mon, 27 Feb 2006, Rohit Jalan wrote:
>
> >Is there an upper limit on the amount of fragmentation / wastage that can
> >occur in a UMA zone?
> >
> >Is there a method to know the total number of pages used by a UMA zone at
> >some instance of time?
>
> Hey there Rohit,
>
> UMA allocates pages retrieved from VM as "slabs". Its behavior depends a
> bit on how large the allocated object is, as it's a question of packing
> objects into page-sized slabs for small objects, or packing objects into
> sets of pages making up a slab for larger objects.
> You can programmatically access information on UMA using libmemstat(3),
> which allows you to do things like query the current object cache size,
> total lifetime allocations for the zone, allocation failure count, sizes
> of per-cpu caches, etc. You may want to take a glance at the source code
> for vmstat -z and netstat -m for examples of it in use. You'll notice, for
> example, that netstat -m reports on both the mbufs in active use, and also
> the memory allocated to mbufs in the per-cpu + zone caches, since that
> memory is also (for the time being) committed to the mbuf allocator. The
> mbuf code is a little hard to follow because there are actually two zones
> that allocate mbufs, the mbuf zone and the packet secondary zone, so let
> me know if you have any questions.
>
> If you want to dig down a bit more, uma_int.h includes the keg and zone
> definitions, and you can extract information like the page maximum, the
> number of items per page or pages per item, etc. If there's useful
> information that you need but isn't currently exposed by libmemstat, we
> can add it easily enough. You might also be interested in some of the
> tools at
>
> http://www.watson.org/~robert/freebsd/libmemstat/
>
> including memtop, which is basically an activity monitor for kernel
> memory types. As an FYI, kernel malloc is wrapped around UMA, so if you
> view both malloc and UMA stats at once, there is double-counting.
>
> Robert N M Watson