From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 16:16:09 2010
Date: Tue, 21 Sep 2010 11:16:07 -0500
From: Alan Cox
Reply-To: alc@freebsd.org
To: Jeff Roberson
Cc: Robert Watson, Jeff Roberson, Andre Oppermann, Andriy Gapon,
    freebsd-hackers@freebsd.org
Subject: Re: zfs + uma

On Tue, Sep 21, 2010 at 1:39 AM, Jeff Roberson wrote:

> On Tue, 21 Sep 2010, Andriy Gapon wrote:
>
>> on 19/09/2010 11:42 Andriy Gapon said the following:
>>> on 19/09/2010 11:27 Jeff Roberson said the following:
>>>> I don't like this because even with very large buffers you can still
>>>> have high enough turnover to require per-cpu caching. Kip specifically
>>>> added UMA support to address this issue in zfs. If you have allocations
>>>> which don't require per-cpu caching and are very large, why even use
>>>> UMA?
>>>
>>> Good point.
>>> Right now I am running with a 4 items/bucket limit for items larger
>>> than 32KB.
>>
>> But I also have two counter-points, actually :)
>> 1. Uniformity. E.g. you can handle all ZFS I/O buffers via the same
>> mechanism regardless of buffer size.
>> 2. (Open)Solaris has done this for a while and it seems to suit them
>> well. Not saying that they are perfect, or the best, or an example to
>> follow, but still that means quite a bit (for me).
>
> I'm afraid there is not enough context here for me to know what 'the same
> mechanism' is or what Solaris does. Can you elaborate?
> I prefer not to take the weight of specific examples too heavily when
> considering the allocator, as it must handle many cases and many types of
> systems. I believe there are cases where you want large allocations to be
> handled by per-cpu caches, regardless of whether ZFS is one such case. If
> ZFS does not need them, then it should simply allocate directly from the
> VM. However, I don't want to introduce some maximum constraint unless it
> can be shown that adequate behavior is not produced by some more adaptable
> algorithm.

Actually, I think that there is a middle ground between "per-cpu caches"
and "directly from the VM" that we are missing. When I've looked at the
default configuration of ZFS (without the extra UMA zones enabled), there
is an incredible amount of churn on the kmem map caused by the
implementation of uma_large_malloc() and uma_large_free() going directly
to the kmem map. Not only are the obvious things happening, like
allocating and freeing kernel virtual addresses and underlying physical
pages on every call, but system-wide TLB shootdowns and sometimes
superpage demotions are occurring as well.

I have some trouble believing that the large allocations being performed
by ZFS really need per-CPU caching, but I can certainly believe that they
could benefit from not going directly to the kmem map on every
uma_large_malloc() and uma_large_free(). In other words, I think it would
make a lot of sense to have a thin layer between UMA and the kmem map that
caches allocated but unused ranges of pages.

Regards,
Alan
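
[A minimal sketch of the kind of "thin layer" described above, written as
ordinary userland C so it stands alone. It is not FreeBSD code: malloc()
and free() stand in for the kmem map, and the names, the fixed cache size,
and the absence of locking and per-size buckets are illustrative
assumptions only.]

    #include <stdlib.h>

    #define CACHE_SLOTS 32          /* max number of idle ranges to keep */

    struct cached_range {
            void   *addr;
            size_t  size;
    };

    static struct cached_range range_cache[CACHE_SLOTS];
    static int range_cache_cnt;

    static void *
    large_alloc(size_t size)
    {
            int i;

            /* Reuse a cached range of the same size if one is available. */
            for (i = 0; i < range_cache_cnt; i++) {
                    if (range_cache[i].size == size) {
                            void *p = range_cache[i].addr;
                            range_cache[i] = range_cache[--range_cache_cnt];
                            return (p);
                    }
            }
            /* Cache miss: fall back to the backing allocator
               (the kmem map analogue). */
            return (malloc(size));
    }

    static void
    large_free(void *addr, size_t size)
    {
            /* Park the range for reuse instead of releasing it at once. */
            if (range_cache_cnt < CACHE_SLOTS) {
                    range_cache[range_cache_cnt].addr = addr;
                    range_cache[range_cache_cnt].size = size;
                    range_cache_cnt++;
                    return;
            }
            /* Cache full: actually give the range back. */
            free(addr);
    }

    int
    main(void)
    {
            void *p = large_alloc(128 * 1024);
            large_free(p, 128 * 1024);          /* parked in the cache... */
            void *q = large_alloc(128 * 1024);  /* ...handed back here,
                                                   no new allocation */
            large_free(q, 128 * 1024);
            return (0);
    }

[In the kernel, the same idea would presumably hook in where
uma_large_malloc() and uma_large_free() currently call into the kmem map,
with some cap on how much idle KVA the layer is allowed to hold so that
unused ranges are eventually returned.]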