Date: Sat, 18 Sep 2010 22:27:42 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
To: Andriy Gapon <avg@freebsd.org>
Cc: Andre Oppermann <andre@freebsd.org>, Jeff Roberson <jeff@freebsd.org>,
    Robert Watson <rwatson@freebsd.org>, freebsd-hackers@freebsd.org
Subject: Re: zfs + uma
Message-ID: <alpine.BSF.2.00.1009182225050.23448@desktop>
In-Reply-To: <4C95C804.1010701@freebsd.org>
References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org>
    <alpine.BSF.2.00.1009181221560.86826@fledge.watson.org>
    <alpine.BSF.2.00.1009181135430.23448@desktop>
    <4C95C804.1010701@freebsd.org>
On Sun, 19 Sep 2010, Andriy Gapon wrote:

> on 19/09/2010 01:16 Jeff Roberson said the following:
>> Not specifically in reaction to Robert's comment, but I would like to
>> add my thoughts to this notion of resource balancing in buckets. I
>> really prefer not to do any specific per-zone tuning except in extreme
>> cases, because quite often the decisions we make don't apply to some
>> class of machines or workloads. I would instead prefer to keep the
>> algorithm adaptable.
>
> Agree.
>
>> I like the idea of weighting the bucket decisions by the size of the
>> item. Obviously this has some flaws with compound objects, but in the
>> general case it is good. We should consider increasing the cost of
>> bucket expansion based on the size of the item. Right now buckets are
>> expanded fairly readily.
>>
>> We could also consider decreasing the default bucket size for a zone
>> based on vm pressure and use. Right now there is no downward pressure
>> on bucket size, only upward pressure based on trips to the slab layer.
>>
>> Additionally, we could add a last-ditch flush mechanism that runs on
>> each cpu in turn and flushes some or all of the buckets in the per-cpu
>> caches. Presently that is not done due to synchronization issues; it
>> can't be done from a central place. It could be done with a callout
>> mechanism or a for loop that binds to each core in succession.
>
> I like all three of the above approaches.
> The last one is a bit hard to implement, the first two seem easier.

All the last one requires is a loop calling sched_bind() on each
available cpu.

>> I believe the combination of these approaches would significantly
>> solve the problem and should be relatively little new code. It should
>> also preserve the adaptable nature of the system without penalizing
>> resource-heavy systems. I would be happy to review patches from anyone
>> who wishes to undertake it.
> FWIW, the approach of simply limiting maximum bucket size based on
> item size seems to work rather well too, as my testing with zfs+uma
> shows.
> I will also try to add code to completely bypass the per-cpu cache for
> "really huge" items.

I don't like this, because even with very large buffers you can still
have high enough turnover to require per-cpu caching. Kip specifically
added UMA support to zfs to address this issue. If you have allocations
which don't require per-cpu caching and are very large, why even use
UMA?

One thing that would be nice, if we are frequently using page-size
allocations, is to eliminate the requirement for a slab header for each
page. It seems unnecessary for any zone where the items-per-slab count
is 1, but it would require careful modification to support properly.

Thanks,
Jeff

> --
> Andriy Gapon