From: Andriy Gapon <avg@freebsd.org>
Date: Sun, 19 Sep 2010 11:21:24 +0300
To: Jeff Roberson
Cc: Andre Oppermann, Jeff Roberson, Robert Watson, freebsd-hackers@freebsd.org
Subject: Re: zfs + uma
Message-ID: <4C95C804.1010701@freebsd.org>

On 19/09/2010 01:16 Jeff Roberson said the following:
> Not specifically in reaction to Robert's comment, but I would like to add
> my thoughts on this notion of resource balancing in buckets.  I really
> prefer not to do any zone-specific tuning except in extreme cases, because
> quite often the decisions we make don't apply to some class of machines or
> workloads.  I would instead prefer to keep the algorithm adaptable.

Agreed.

> I like the idea of weighting the bucket decisions by the size of the item.
> Obviously this has some flaws with compound objects, but in the general
> case it is good.  We should consider increasing the cost of bucket
> expansion based on the size of the item; right now buckets are expanded
> fairly readily.
>
> We could also consider decreasing the default bucket size for a zone based
> on VM pressure and use.  Right now there is no downward pressure on bucket
> size, only upward pressure based on trips to the slab layer.
>
> Additionally, we could add a last-ditch flush mechanism that runs on each
> CPU in turn and flushes some or all of the buckets in the per-CPU caches.
> Presently that is not done because of synchronization issues: it can't be
> done from a central place.  It could be done with a callout mechanism or
> with a loop that binds to each core in succession.

I like all three of the above approaches.  The last one is a bit hard to
implement; the first two seem easier.

> I believe the combination of these approaches would largely solve the
> problem and should require relatively little new code.  It should also
> preserve the adaptable nature of the system without penalizing
> resource-heavy systems.  I would be happy to review patches from anyone
> who wishes to undertake it.

FWIW, the approach of simply limiting the maximum bucket size based on item
size seems to work rather well too, as my testing with ZFS + UMA shows.
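To make that concrete, the mapping I have in mind looks roughly like the
sketch below.  This is illustrative only: the helper name and the thresholds
are invented for this message, not taken from the code I actually tested,
and sensible cut-offs would have to come from measurement.

#include <stddef.h>

/*
 * Illustrative helper only: map a zone's item size to an upper bound on
 * the number of entries allowed in a per-CPU bucket, so that zones with
 * large items cannot pin much memory in the per-CPU caches.  The name
 * and the thresholds are made up for this sketch.
 */
int
max_bucket_entries(size_t item_size)
{

	if (item_size <= 256)
		return (128);	/* small items: large buckets are cheap */
	if (item_size <= 4096)
		return (32);	/* up to roughly a page */
	if (item_size <= 65536)
		return (4);	/* multi-page items: cache only a few */
	return (0);		/* "really huge" items: no per-CPU caching */
}

Once the limit reaches zero, the per-CPU cache is effectively disabled for
that zone, which leads into the next idea.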
I will also try to add code to bypass the per-CPU cache entirely for "really
huge" items (a rough sketch of the idea follows below).

-- 
Andriy Gapon
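A standalone sketch of that bypass idea, for discussion only; the structure,
the threshold, and both helper functions below are invented stand-ins rather
than real uma(9) internals:

#include <stddef.h>

/*
 * Standalone sketch of bypassing the per-CPU cache for "really huge"
 * items.  Nothing here is real uma(9) code: the structure, the threshold
 * and both helper functions are stand-ins for the internal fast path
 * (per-CPU bucket) and slow path (slab layer).
 */
#define	HUGE_ITEM_THRESHOLD	(64 * 1024)	/* placeholder cut-off */

struct toy_zone {
	size_t	item_size;			/* size of one item in this zone */
};

void	*toy_bucket_alloc(struct toy_zone *);	/* stand-in: per-CPU bucket path */
void	*toy_slab_alloc(struct toy_zone *);	/* stand-in: slab/keg slow path */

void *
toy_zalloc(struct toy_zone *zone)
{

	/*
	 * Really huge items never touch the per-CPU bucket; caching even
	 * a few of them on every CPU could pin many megabytes.
	 */
	if (zone->item_size >= HUGE_ITEM_THRESHOLD)
		return (toy_slab_alloc(zone));
	return (toy_bucket_alloc(zone));
}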