Date:      Wed, 1 Mar 2006 11:47:50 +0530
From:      Rohit Jalan <rohitj@purpe.com>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        hackers@freebsd.org
Subject:   Re: UMA zone allocator memory fragmentation questions
Message-ID:  <20060301061750.GA4664@desk01.n2.purpe.com>
In-Reply-To: <20060228215910.S2248@fledge.watson.org>
References:  <20060227104341.GA6671@desk01.n2.purpe.com> <20060228215910.S2248@fledge.watson.org>

Hi Robert,

My problem is that I need to enforce a single memory limit
on the total number of pages used by multiple zones.

The limit changes dynamically based on the number of pages 
being used by other non-zone allocations and also on the amount 
of available swap and memory.

I tried to do this in various ways with the stock kernel but was
unsuccessful, for the reasons detailed below. In the end I had to
patch the UMA subsystem to achieve my goal.

Is there a better way to do this, one that does not involve patching
the kernel? Please advise.

----------------------------------------------------------------------
TMPFS uses multiple UMA zones to store filesystem metadata.
These zones are allocated on a per-mount basis for reasons described in
the documentation. Because of the fragmentation that dynamic allocations
and frees can cause within a zone, the memory actually in use can exceed
the sum of the contained item sizes. This makes it difficult to track
and limit the space used by a filesystem.

Even though the zone API provides scope for custom item constructors
and destructors, the necessary information (the number of pages in use)
is stored inside a keg structure, which is itself part of the opaque
uma_zone_t object. One could include <vm/uma_int.h> and access the keg
information in a custom constructor, but calculating the change delta
would require messy code: one would have to remember the previous value
to see how many pages had been added or removed.

The zone API also provides custom page allocation and free hooks.
These are ideal for my purpose, as they let me control page allocation
and frees effectively. But the callback interface is lacking: it does
not allow one to specify an argument (as the constructor and destructor
interfaces do), which makes it difficult to update custom state from
within the uma_free callback, since it is passed neither the zone
pointer nor an argument.

Presently I have patched my private sources to modify the UMA API to
support passing an argument to the page allocation and free callbacks.
Unlike the constructor and destructor argument, which is specified on
each call, the argument to uma_alloc or uma_free is specified when
setting the callback via uma_zone_set_allocf() or uma_zone_set_freef().
This argument is stored in the keg and passed to the callback whenever
it is invoked.

The scheme implemented by my patch imposes an overhead of 
passing an extra argument to the uma_alloc and uma_free callbacks.
The uma_keg structure size is also increased by (2 * sizeof(void*)).

My patch changes the present custom alloc and free callback routines
(e.g., page_alloc, page_free, etc.) to accept an extra argument,
which they ignore.

The static page_alloc and page_free routines are made global and
renamed to uma_page_alloc and uma_page_free respectively, so that
they can be called from other custom allocators, as my own code does.

----------------------------------------------------------------------

Patches:
	 http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-1.dif
	 http://download.purpe.com/files/TMPFS_FreeBSD_7-uma-2.dif

Regards, 

rohit --



On Tue, Feb 28, 2006 at 10:04:41PM +0000, Robert Watson wrote:
> On Mon, 27 Feb 2006, Rohit Jalan wrote:
> 
> >Is there an upper limit on the amount of fragmentation / wastage that can 
> >occur in a UMA zone?
> >
> >Is there a method to know the total number of pages used by a UMA zone at 
> >some instance of time?
> 
> Hey there Rohit,
> 
> UMA allocates pages retrieved from VM as "slabs".  Its behavior depends a 
> bit on how large the allocated object is, as it's a question of packing 
> objects into page-sized slabs for small objects, or packing objects into 
> sets of pages making up a slab for larger objects.  You can 
> programmatically access information on UMA using libmemstat(3), which 
> allows you to do things like query the current object cache size, total 
> lifetime allocations for the zone, allocation failure count, sizes of 
> per-cpu caches, etc.  You may want to take a glance at the source code for 
> vmstat -z and netstat -m for examples of it in use.  You'll notice, for 
> example, that netstat -m reports on both the mbufs in active use, and also 
> the memory allocated to mbufs in the percpu + zone caches, since that 
> memory is also (for the time being) committed to the mbuf allocator.  The 
> mbuf code is a little hard to follow because there are actually two zones 
> that allocate mbufs, the mbuf zone and the packet secondary zone, so let me 
> know if you have any questions.
> 
> If you want to dig down a bit more, uma_int.h includes the keg and zone 
> definitions, and you can extract information like the page maximum, the 
> number of items per page or pages per item, etc.  If there's useful 
> information that you need but isn't currently exposed by libmemstat, we can 
> add it easily enough.  You might also be interested in some of the tools at
> 
>     http://www.watson.org/~robert/freebsd/libmemstat/
> 
> Including memtop, which is basically an activity monitor for kernel memory 
> types.  As an FYI, kernel malloc is wrapped around UMA, so if you view both 
> malloc and UMA stats at once, there is double-counting.
> 
> Robert N M Watson
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"


