Date:      Sat, 18 Sep 2010 22:27:42 -1000 (HST)
From:      Jeff Roberson <jroberson@jroberson.net>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        Andre Oppermann <andre@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Robert Watson <rwatson@freebsd.org>, freebsd-hackers@freebsd.org
Subject:   Re: zfs + uma
Message-ID:  <alpine.BSF.2.00.1009182225050.23448@desktop>
In-Reply-To: <4C95C804.1010701@freebsd.org>
References:  <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <alpine.BSF.2.00.1009181221560.86826@fledge.watson.org> <alpine.BSF.2.00.1009181135430.23448@desktop> <4C95C804.1010701@freebsd.org>

On Sun, 19 Sep 2010, Andriy Gapon wrote:

> on 19/09/2010 01:16 Jeff Roberson said the following:
>> Not specifically in reaction to Robert's comment but I would like to add my
>> thoughts to this notion of resource balancing in buckets.  I really prefer not
>> to do any specific per-zone tuning except in extreme cases. This is because
>> quite often the decisions we make don't apply to some class of machines or
>> workloads.  I would instead prefer to keep the algorithm adaptable.
>
> Agree.
>
>> I like the idea of weighting the bucket decisions by the size of the item.
>> Obviously this has some flaws with compound objects but in the general case it
>> is good.  We should consider increasing the cost of bucket expansion based on
>> the size of the item.  Right now buckets are expanded fairly readily.
>>
>> We could also consider decreasing the default bucket size for a zone based on vm
>> pressure and use.  Right now there is no downward pressure on bucket size, only
>> upward based on trips to the slab layer.
>>
>> Additionally we could make a last ditch flush mechanism that runs on each cpu in
>> turn and flushes some or all of the buckets in per-cpu caches. Presently that is
>> not done due to synchronization issues.  It can't be done from a central place.
>> It could be done with a callout mechanism or a for loop that binds to each core
>> in succession.
>
> I like all three of the above approaches.
> The last one is a bit hard to implement; the first two seem easier.

All the last one requires is a loop calling sched_bind() on each available 
cpu.
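
Roughly something like this untested sketch (flush_pcpu_buckets() is just a
placeholder I made up for whatever would actually drain the current cpu's
buckets back to the zone):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/pcpu.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <sys/smp.h>

/* Placeholder for the routine that would drain the calling cpu's buckets. */
static void
flush_pcpu_buckets(void)
{
}

static void
drain_all_pcpu_caches(void)
{
	int cpu;

	CPU_FOREACH(cpu) {
		/* Migrate to this cpu so its cache is touched locally. */
		thread_lock(curthread);
		sched_bind(curthread, cpu);
		thread_unlock(curthread);

		flush_pcpu_buckets();
	}

	/* Drop the binding once every cpu has been visited. */
	thread_lock(curthread);
	sched_unbind(curthread);
	thread_unlock(curthread);
}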

>
>> I believe the combination of these approaches would significantly solve the
>> problem and should be relatively little new code.  It should also preserve the
>> adaptable nature of the system without penalizing resource heavy systems.  I
>> would be happy to review patches from anyone who wishes to undertake it.
>
> FWIW, the approach of simply limiting maximum bucket size based on item size
> seems to work rather well too, as my testing with zfs+uma shows.
> I will also try to add code to completely bypass the per-cpu cache for "really
> huge" items.

I don't like this because even with very large buffers you can still have 
high enough turnover to require per-cpu caching.  Kip specifically added 
UMA support to address this issue in zfs.  If you have allocations which 
don't require per-cpu caching and are very large, why even use UMA?
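
For what it's worth, one way to picture the item-size weighting I mentioned
above is to cap each bucket by a byte budget instead of a fixed item count,
so large-item zones end up with small buckets automatically.  The numbers
and names here are made up purely for illustration:

#include <stddef.h>

/* Assumed knobs, not real UMA tunables. */
#define BUCKET_MAX_BYTES	(32 * 1024)	/* per-bucket byte budget */
#define BUCKET_MAX_ITEMS	128
#define BUCKET_MIN_ITEMS	1

/*
 * Pick a bucket item count so that a full bucket never caches more than
 * BUCKET_MAX_BYTES worth of items.
 */
static int
bucket_size_for_item(size_t item_size)
{
	size_t n;

	n = BUCKET_MAX_BYTES / item_size;
	if (n > BUCKET_MAX_ITEMS)
		n = BUCKET_MAX_ITEMS;
	if (n < BUCKET_MIN_ITEMS)
		n = BUCKET_MIN_ITEMS;
	return ((int)n);
}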

One thing that would be nice if we are frequently doing page-sized 
allocations is to eliminate the requirement for a slab header for each 
page.  The header seems unnecessary for any zone where the number of items 
per slab is 1, but it would require careful modification to support 
properly.
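
As a very rough illustration of the idea (a toy userland sketch of my own,
nothing like the real vm_page plumbing): when a slab holds exactly one
item, ownership can be recovered from the backing page itself rather than
from a separately allocated slab header.

#include <stdint.h>

#define PAGE_SIZE	4096
#define PAGE_SLOTS	1024	/* toy table size for the sketch */

/*
 * Toy stand-in for per-page metadata: one owner slot per page frame, so a
 * page-sized item needs no extra header allocation of its own.
 */
static void *page_owner[PAGE_SLOTS];

static void
page_set_owner(void *item, void *zone)
{
	page_owner[((uintptr_t)item / PAGE_SIZE) % PAGE_SLOTS] = zone;
}

static void *
page_get_owner(void *item)
{
	return (page_owner[((uintptr_t)item / PAGE_SIZE) % PAGE_SLOTS]);
}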

Thanks,
Jeff

>
> -- 
> Andriy Gapon
>


