From: Andriy Gapon <avg@FreeBSD.org>
Date: Tue, 30 Sep 2014 14:18:02 +0300
To: freebsd-hackers@FreeBSD.org
Cc: Gleb Smirnoff
Subject: uk_slabsize, uk_ppera, uk_ipers, uk_pages
Message-ID: <542A916A.2060703@FreeBSD.org>

I have a hard time understanding how to use uk_slabsize, uk_ppera,
uk_ipers and uk_pages to derive other useful characteristics of UMA
kegs.  This is despite the good descriptions of the fields and the
multiple examples of their usage in the code.  Unfortunately, I find
those examples to be at least inconsistent and possibly contradictory.

The first problem is quite obvious: uk_slabsize has too narrow a type.
For example, ZFS creates many zones with item sizes larger than 64KB,
so uk_slabsize obviously overflows.  I am not sure how that affects
any further calculations, but probably not in a good way.  On the
other hand, there may be no harm at all, because as far as I can see
uk_slabsize is actually used only within keg_small_init().  It is set
but not used in keg_large_init() and keg_cachespread_init(), and it
does not seem to be used after initialization.  So maybe this field
could simply be converted to a local variable in keg_small_init()?

Now the second problem.  Even the names uk_ppera (pages per
allocation) and uk_ipers (items per slab) leave some room for
ambiguity: what is the relation between the allocation and the slab?
It seems that some code assumes that the slab takes up the whole
allocation (that is, one slab per allocation), other code places
multiple slabs into a single allocation, and yet other code looks
inconsistent in this respect.

For instance:

static void
keg_drain(uma_keg_t keg)
{
...
                LIST_REMOVE(slab, us_link);
                keg->uk_pages -= keg->uk_ppera;
                keg->uk_free -= keg->uk_ipers;

A slab is freed.  There is no question about uk_free, but it is clear
that the code assumes that the slab takes up a whole allocation.
keg_alloc_slab() is symmetric with these stats.
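In other words, if the slab and the allocation really do map 1:1, then
a keg's item capacity follows directly from its page counters.  Purely
as an illustration (keg_capacity() is my own name here, nothing like
it exists in the tree), such a helper would look roughly like this:

static inline int
keg_capacity(uma_keg_t keg)
{
        /*
         * Under the one-slab-per-allocation assumption there are
         * uk_pages / uk_ppera slabs, each holding uk_ipers items.
         */
        return ((keg->uk_pages / keg->uk_ppera) * keg->uk_ipers);
}

The code below seems to want this kind of quantity (with uk_maxpages
in place of uk_pages when dealing with limits), but computes it in
different ways.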
Take uma_zone_set_max():

int
uma_zone_set_max(uma_zone_t zone, int nitems)
{
        uma_keg_t keg;

        keg = zone_first_keg(zone);
        if (keg == NULL)
                return (0);
        KEG_LOCK(keg);
        keg->uk_maxpages = (nitems / keg->uk_ipers) * keg->uk_ppera;
        if (keg->uk_maxpages * keg->uk_ipers < nitems)
                keg->uk_maxpages += keg->uk_ppera;
        nitems = keg->uk_maxpages * keg->uk_ipers;
        KEG_UNLOCK(keg);

        return (nitems);
}

The uk_maxpages calculation seems to assume that the allocation and
the slab are the same: we first calculate the number of slabs needed
to hold nitems and then multiply that number by the number of pages
per allocation.  But when we calculate the nitems to be returned, we
simply multiply uk_maxpages by uk_ipers, as if the slab size were one
page regardless of uk_ppera.

uma_zone_get_max() calculates nitems in the same way, without taking
uk_ppera into account.

In uma_print_keg(), "out" is calculated as
        (keg->uk_ipers * keg->uk_pages) - keg->uk_free
while "limit" is calculated as
        (keg->uk_maxpages / keg->uk_ppera) * keg->uk_ipers
In one case we simply multiply the number of pages by uk_ipers, but in
the other case we first divide by uk_ppera.

My personal opinion is that we should settle on the rule that the slab
and the allocation map 1:1 and fix the code that does not conform to
that.  It seems that this is how the code that allocates and frees
slabs actually works, and I do not see any good reason to support
multiple slabs per allocation.  A rough, untested sketch of what I
mean for uma_zone_set_max() is appended after my signature.

P.S.  By the way, we have some wonderful things in the UMA code that
are not used anymore (if they ever were?) and are scarcely documented.
Perhaps some of those could be removed to simplify the code:
- UMA_ZONE_CACHESPREAD
- uma_zsecond_add()

More generally, it looks like the support for multiple zones using the
same keg is quite useful.  On the other hand, the support for a zone
using multiple kegs is of questionable utility, and at present that
capability is not used.

-- 
Andriy Gapon
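Appended sketch of uma_zone_set_max() under the 1:1 rule.  This is
untested and only meant to illustrate the idea; howmany() is the usual
sys/param.h macro, and I may well be missing something:

int
uma_zone_set_max(uma_zone_t zone, int nitems)
{
        uma_keg_t keg;

        keg = zone_first_keg(zone);
        if (keg == NULL)
                return (0);
        KEG_LOCK(keg);
        /*
         * Round up to a whole number of slabs; under the 1:1 rule one
         * slab of uk_ipers items occupies exactly uk_ppera pages.
         */
        keg->uk_maxpages = howmany(nitems, keg->uk_ipers) * keg->uk_ppera;
        /*
         * Report the effective limit consistently: uk_maxpages /
         * uk_ppera slabs, uk_ipers items each, instead of multiplying
         * uk_maxpages by uk_ipers directly.
         */
        nitems = (keg->uk_maxpages / keg->uk_ppera) * keg->uk_ipers;
        KEG_UNLOCK(keg);

        return (nitems);
}

uma_zone_get_max() and the "limit" and "out" calculations in
uma_print_keg() would then use the same (uk_maxpages / uk_ppera) *
uk_ipers and (uk_pages / uk_ppera) * uk_ipers forms.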