From: "Robert N. M. Watson" <rwatson@freebsd.org>
Date: Sun, 19 Sep 2010 12:41:16 +0100
To: Andriy Gapon
Cc: Andre Oppermann, Jeff Roberson, freebsd-hackers@freebsd.org
Subject: Re: zfs + uma

On 19 Sep 2010, at 09:21, Andriy Gapon wrote:

>> I believe the combination of these approaches would significantly solve the
>> problem and should be relatively little new code. It should also preserve the
>> adaptable nature of the system without penalizing resource-heavy systems. I
>> would be happy to review patches from anyone who wishes to undertake it.
>
> FWIW, the approach of simply limiting the maximum bucket size based on item
> size seems to work rather well too, as my testing with zfs+uma shows.
> I will also try to add code to completely bypass the per-CPU cache for
> "really huge" items.

This is basically what malloc(9) does already: for small items, it allocates from a series of fixed-size buckets (which could probably use tuning), but maintains its own stats with respect to the types it maps into those buckets. This is why there's double-counting between vmstat -z and vmstat -m: the former shows the buckets used to allocate the latter.

For large items, malloc(9) goes through UMA, but it's basically a pass-through to VM, which provides pages directly. This means that for small malloc types you get per-CPU caches, and for large malloc types you don't.

malloc(9) doesn't require fixed-size allocations, but it also can't provide the ctor/dtor partial tear-down caching, nor different effective working sets of memory for different types.

UMA should really only be used directly for memory types where tight packing, per-CPU caching, and possibly partial tear-down have benefits. mbufs are a great example, because we allocate tons and tons of them continuously in operation. Using UMA directly for more stable types allocated in smaller quantities makes very little sense, since we waste lots of memory overhead allocating buckets that won't be used, etc.

Robert
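
To make the small/large split above concrete, here is a minimal C sketch of the routing being described. It is an illustration only: SMALL_MAXSIZE, size2zone(), large_page_alloc() and charge_malloc_type() are hypothetical stand-ins, not the identifiers actually used in sys/kern/kern_malloc.c, which differs in detail.

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    /*
     * Hypothetical sketch of malloc(9) routing: small requests are
     * rounded up into fixed-size UMA bucket zones (and so get per-CPU
     * caching); large requests bypass the caches and take pages
     * straight from VM.
     */
    void *
    sketch_malloc(size_t size, struct malloc_type *mtp, int flags)
    {
            void *va;

            if (size <= SMALL_MAXSIZE) {
                    /* Round up to the nearest fixed-size bucket zone. */
                    uma_zone_t zone = size2zone(size);

                    va = uma_zalloc(zone, flags);
            } else {
                    /* Pass-through to VM: whole pages, no per-CPU cache. */
                    va = large_page_alloc(round_page(size), flags);
            }

            /*
             * Accounting is charged to the malloc type rather than the
             * bucket, which is the double-counting mentioned above:
             * vmstat -z shows the bucket zones used to satisfy the
             * types that vmstat -m reports.
             */
            if (va != NULL)
                    charge_malloc_type(mtp, size);
            return (va);
    }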
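
And a sketch of the kind of type for which direct UMA use pays off, per the last paragraph: a small, fixed-size, frequently-allocated object. The "frob" type is invented for illustration, but the uma_zcreate()/uma_zalloc() calls are the real UMA KPI. The init/fini pair is one way to get the partial tear-down caching mentioned above: init and fini run only when items move between the zone and VM, not on every allocation, so the mutex stays initialized while items sit in the per-CPU caches.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <vm/uma.h>

    /* A hypothetical hot-path object, mbuf-like in its allocation pattern. */
    struct frob {
            struct mtx      f_lock;
            int             f_refcount;
    };

    static uma_zone_t frob_zone;

    /* Runs once when an item is imported into the zone from VM. */
    static int
    frob_init(void *mem, int size, int flags)
    {
            struct frob *f = mem;

            mtx_init(&f->f_lock, "frob", NULL, MTX_DEF);
            f->f_refcount = 0;
            return (0);
    }

    /* Runs only when the item is released back to VM. */
    static void
    frob_fini(void *mem, int size)
    {
            struct frob *f = mem;

            mtx_destroy(&f->f_lock);
    }

    static void
    frob_zone_setup(void)
    {
            frob_zone = uma_zcreate("frob", sizeof(struct frob),
                NULL, NULL,             /* no per-allocation ctor/dtor */
                frob_init, frob_fini,   /* once per trip to/from VM */
                UMA_ALIGN_PTR, 0);
    }

Allocation is then uma_zalloc(frob_zone, M_WAITOK) and uma_zfree(frob_zone, f): items are packed tightly into slabs and cached per CPU, which is exactly the behaviour that is wasted overhead for stable, rarely-allocated types.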