From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 16:16:09 2010
Date: Tue, 21 Sep 2010 11:16:07 -0500
From: Alan Cox
Reply-To: alc@freebsd.org
To: Jeff Roberson
Cc: Robert Watson, Jeff Roberson, Andre Oppermann, Andriy Gapon,
    freebsd-hackers@freebsd.org
Subject: Re: zfs + uma

On Tue, Sep 21, 2010 at 1:39 AM, Jeff Roberson wrote:

> On Tue, 21 Sep 2010, Andriy Gapon wrote:
>
>> on 19/09/2010 11:42 Andriy Gapon said the following:
>>> on 19/09/2010 11:27 Jeff Roberson said the following:
>>>> I don't like this because even with very large buffers you can still
>>>> have high enough turnover to require per-cpu caching. Kip specifically
>>>> added UMA support to address this issue in zfs. If you have allocations
>>>> which don't require per-cpu caching and are very large, why even use
>>>> UMA?
>>>
>>> Good point.
>>> Right now I am running with a 4 items/bucket limit for items larger
>>> than 32KB.
>>
>> But I also have two counter-points, actually :)
>> 1. Uniformity. E.g. you can handle all ZFS I/O buffers via the same
>> mechanism regardless of buffer size.
>> 2. (Open)Solaris has done this for a while and it seems to suit them
>> well. Not saying that they are perfect, or the best, or an example to
>> follow, but still that means quite a bit (for me).
>
> I'm afraid there is not enough context here for me to know what 'the same
> mechanism' is or what Solaris does. Can you elaborate?
> I prefer not to take the weight of specific examples too heavily when
> considering the allocator, as it must handle many cases and many types of
> systems. I believe there are cases where you want large allocations to be
> handled by per-cpu caches, regardless of whether ZFS is one such case. If
> ZFS does not need them, then it should simply allocate directly from the
> VM. However, I don't want to introduce some maximum constraint unless it
> can be shown that adequate behavior is not produced by some more adaptable
> algorithm.

Actually, I think that there is a middle ground between "per-cpu caches"
and "directly from the VM" that we are missing. When I've looked at the
default configuration of ZFS (without the extra UMA zones enabled), there
is an incredible amount of churn on the kmem map caused by the
implementation of uma_large_malloc() and uma_large_free() going directly
to the kmem map. Not only are the obvious things happening, like
allocating and freeing kernel virtual addresses and underlying physical
pages on every call, but system-wide TLB shootdowns and sometimes
superpage demotions are occurring as well.

I have some trouble believing that the large allocations being performed
by ZFS really need per-CPU caching, but I can certainly believe that they
could benefit from not going directly to the kmem map on every
uma_large_malloc() and uma_large_free(). In other words, I think it would
make a lot of sense to have a thin layer between UMA and the kmem map that
caches allocated but unused ranges of pages.

Regards,
Alan
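
[A minimal sketch of the kind of "thin layer" described above, written as
ordinary userland C so it stands alone. It is not FreeBSD code: malloc()
and free() stand in for the kmem map, and the names, the fixed cache size,
and the absence of locking and per-size buckets are illustrative
assumptions only.]

    #include <stdlib.h>

    #define CACHE_SLOTS 32          /* max number of idle ranges to keep */

    struct cached_range {
            void   *addr;
            size_t  size;
    };

    static struct cached_range range_cache[CACHE_SLOTS];
    static int range_cache_cnt;

    static void *
    large_alloc(size_t size)
    {
            int i;

            /* Reuse a cached range of the same size if one is available. */
            for (i = 0; i < range_cache_cnt; i++) {
                    if (range_cache[i].size == size) {
                            void *p = range_cache[i].addr;
                            range_cache[i] = range_cache[--range_cache_cnt];
                            return (p);
                    }
            }
            /* Cache miss: fall back to the backing allocator
               (the kmem map analogue). */
            return (malloc(size));
    }

    static void
    large_free(void *addr, size_t size)
    {
            /* Park the range for reuse instead of releasing it at once. */
            if (range_cache_cnt < CACHE_SLOTS) {
                    range_cache[range_cache_cnt].addr = addr;
                    range_cache[range_cache_cnt].size = size;
                    range_cache_cnt++;
                    return;
            }
            /* Cache full: actually give the range back. */
            free(addr);
    }

    int
    main(void)
    {
            void *p = large_alloc(128 * 1024);
            large_free(p, 128 * 1024);          /* parked in the cache... */
            void *q = large_alloc(128 * 1024);  /* ...handed back here,
                                                   no new allocation */
            large_free(q, 128 * 1024);
            return (0);
    }

[In the kernel, the same idea would presumably hook in where
uma_large_malloc() and uma_large_free() currently call into the kmem map,
with some cap on how much idle KVA the layer is allowed to hold so that
unused ranges are eventually returned.]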