From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 08:26:37 2010
Date: Sat, 18 Sep 2010 22:27:42 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
To: Andriy Gapon
Cc: Andre Oppermann, Jeff Roberson, Robert Watson, freebsd-hackers@freebsd.org
Subject: Re: zfs + uma
In-Reply-To: <4C95C804.1010701@freebsd.org>
References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org>
 <4C95C804.1010701@freebsd.org>

On Sun, 19 Sep 2010, Andriy Gapon wrote:

> on 19/09/2010 01:16 Jeff Roberson said the following:
>> Not specifically in reaction to Robert's comment, but I would like to add
>> my thoughts to this notion of resource balancing in buckets.  I really
>> prefer not to do any specific per-zone tuning except in extreme cases,
>> because quite often the decisions we make don't apply to some class of
>> machines or workloads.  I would instead prefer to keep the algorithm
>> adaptable.
>
> Agree.
>
>> I like the idea of weighting the bucket decisions by the size of the item.
>> Obviously this has some flaws with compound objects, but in the general
>> case it is good.  We should consider increasing the cost of bucket
>> expansion based on the size of the item.  Right now buckets are expanded
>> fairly readily.
>>
>> We could also consider decreasing the default bucket size for a zone based
>> on vm pressure and use.  Right now there is no downward pressure on bucket
>> size, only upward pressure based on trips to the slab layer.
>>
>> Additionally, we could make a last-ditch flush mechanism that runs on each
>> cpu in turn and flushes some or all of the buckets in the per-cpu caches.
>> Presently that is not done due to synchronization issues; it can't be done
>> from a central place.  It could be done with a callout mechanism or with a
>> for loop that binds to each core in succession.
>
> I like all three of the approaches above.
> The last one is a bit hard to implement; the first two seem easier.

All the last one requires is a loop calling sched_bind() on each available
cpu.
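For illustration only, a minimal sketch of that loop, assuming a hypothetical
uma_cache_drain_cpu() helper that empties the local buckets once the thread
is running on the target CPU (this is not existing UMA code):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <sys/smp.h>
#include <vm/uma.h>

static void
uma_reclaim_pcpu(uma_zone_t zone)
{
	int cpu;

	CPU_FOREACH(cpu) {
		/* Migrate this thread to the target CPU. */
		thread_lock(curthread);
		sched_bind(curthread, cpu);
		thread_unlock(curthread);

		/*
		 * Now executing on 'cpu', so its cache can be drained
		 * without cross-CPU synchronization.
		 */
		uma_cache_drain_cpu(zone, cpu);	/* hypothetical helper */
	}
	thread_lock(curthread);
	sched_unbind(curthread);
	thread_unlock(curthread);
}
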
>
>> I believe the combination of these approaches would significantly solve the
>> problem and should require relatively little new code.  It should also
>> preserve the adaptable nature of the system without penalizing
>> resource-heavy systems.  I would be happy to review patches from anyone who
>> wishes to undertake it.
>
> FWIW, the approach of simply limiting the maximum bucket size based on item
> size seems to work rather well too, as my testing with zfs + uma shows.
> I will also try to add code to completely bypass the per-cpu cache for
> "really huge" items.

I don't like this, because even with very large buffers you can still have
high enough turnover to require per-cpu caching.  Kip specifically added UMA
support in zfs to address this issue.  If you have allocations which don't
require per-cpu caching and are very large, why even use UMA?

One thing that would be nice, if we are frequently making page-size
allocations, is to eliminate the requirement for a slab header for each page.
It seems unnecessary for any zone where the number of items per slab is 1,
but it would require careful modification to support properly.

Thanks,
Jeff

>
> --
> Andriy Gapon
>
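For illustration only, a sketch of the size-based bucket cap discussed above;
the 64 KB per-bucket memory target, the BUCKET_MAX bound, and the
bucket_size_for_item() name are placeholder choices, not what UMA actually
implements:

#include <sys/param.h>

#define	BUCKET_MAX	128	/* assumed upper bound on entries per bucket */

/*
 * Pick a maximum bucket entry count for a zone so that one full per-cpu
 * bucket holds roughly 64 KB regardless of item size.  Assumes
 * item_size > 0.
 */
static int
bucket_size_for_item(size_t item_size)
{
	size_t entries;

	entries = (64 * 1024) / item_size;
	if (entries < 1)
		entries = 1;
	if (entries > BUCKET_MAX)
		entries = BUCKET_MAX;
	return ((int)entries);
}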