From owner-freebsd-net@FreeBSD.ORG  Thu Jan 30 08:47:08 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 75D9F310
 for <freebsd-net@FreeBSD.org>; Thu, 30 Jan 2014 08:47:08 +0000 (UTC)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id C57F8127F
 for <freebsd-net@FreeBSD.org>; Thu, 30 Jan 2014 08:47:07 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA10574
 for <freebsd-net@freebsd.org>; Thu, 30 Jan 2014 10:46:59 +0200 (EET)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1W8nH0-0007cE-TK
 for freebsd-net@freebsd.org; Thu, 30 Jan 2014 10:46:58 +0200
Message-ID: <52EA114C.40908@FreeBSD.org>
Date: Thu, 30 Jan 2014 10:46:04 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: FreeBSD Net <freebsd-net@FreeBSD.org>
Subject: Re: Big physically contiguous mbuf clusters
References: <21225.20047.947384.390241@khavrinen.csail.mit.edu>
 <CAJ-VmomC5Ge3JwfUsgMrJ_rGqiYxfxR4wWzn5A-KAu7HBsueMw@mail.gmail.com>
 <20140129222714.GK93141@funkthat.com>
In-Reply-To: <20140129222714.GK93141@funkthat.com>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
X-List-Received-Date: Thu, 30 Jan 2014 08:47:08 -0000

on 30/01/2014 00:27 John-Mark Gurney said the following:
> Adrian Chadd wrote this message on Wed, Jan 29, 2014 at 14:21 -0800:
>> On 29 January 2014 10:54, Garrett Wollman <wollman@csail.mit.edu> wrote:
>>> Resolved: that mbuf clusters longer than one page ought not be
>>> supported.  There is too much physical-memory fragmentation for them
>>> to be of use on a moderately active server.  9k mbufs are especially
>>> bad, since in the fragmented case they waste 3k per allocation.
>>
>> I've been wondering whether it'd be feasible to teach the physical
>> memory allocator about >page sized allocations and to create zones of
>> slightly more physically contiguous memory.
>>
>> For servers with lots of memory we could then keep these around and
>> only dip into them for temporary allocations (eg not VM pages that may
>> be held for some unknown amount of time.)
>>
>> Question is - can we enforce that kind of behaviour?
> 
> It shouldn't be too hard to do...  Since everything pretty much goes
> through uma we can adopt a scheme similar to what Solaris does (read
> Magazines and Vmem: Extending the Slab Allocator to Many CPUs and
> Arbitrary Resources)...  Instead of dealing w/ page size allocations,
> everything is larger, say 16KB, and broken down from there...
> 

FWIW, judging from the OpenSolaris / illumos code, this is not how it is
currently implemented in Solaris.
They try to find a slab size that minimizes the waste, subject to a cap on the
maximum slab size, of course.  This is also done for sub-page items.
E.g. if the item size is 3KB, FreeBSD uma would use 4KB slabs and waste about
1KB in each slab.  The illumos kmem cache code, on the other hand, would pick a
12KB slab size, which four 3KB items fill with essentially no waste.
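
To make the 3KB example concrete, here is a rough sketch of the idea of
searching for the slab size with the least waste.  It is not the illumos or
uma code: pick_slab_size() and its parameters are names I made up, it only
considers whole-page slab sizes, and it ignores any per-slab bookkeeping
overhead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from illumos or FreeBSD: choose a slab size
 * that is a whole number of pages, at most max_pages, and leaves the smallest
 * fraction of the slab unused when packed with item_size-byte items.
 * Per-slab bookkeeping overhead is ignored.  Returns 0 if no item fits.
 */
size_t
pick_slab_size(size_t item_size, size_t page_size, size_t max_pages)
{
	size_t best_size = 0, best_waste = 0;

	assert(item_size > 0 && page_size > 0);
	for (size_t pages = 1; pages <= max_pages; pages++) {
		size_t slab = pages * page_size;
		size_t waste = slab % item_size;

		if (slab < item_size)
			continue;
		/* Compare waste/slab fractions by cross-multiplying. */
		if (best_size == 0 || waste * best_size < best_waste * slab) {
			best_size = slab;
			best_waste = waste;
		}
	}
	return (best_size);
}
```

With item_size = 3072 and page_size = 4096, a cap of one page gives the 4KB
slab with 1KB wasted, while a cap of three or more pages gives the 12KB slab.
Ties go to the smaller slab, so an 8KB slab (2KB wasted, the same fraction as
4KB) is never preferred over 4KB.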

-- 
Andriy Gapon