From owner-freebsd-current@FreeBSD.ORG  Tue Nov  2 13:06:38 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D5E0A1065672;
	Tue,  2 Nov 2010 13:06:38 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 97B4A8FC16;
	Tue,  2 Nov 2010 13:06:38 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 2E4AA46B1A;
	Tue,  2 Nov 2010 09:06:38 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 10BAA8A01D;
	Tue,  2 Nov 2010 09:06:37 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Giovanni Trematerra <gianni@freebsd.org>
Date: Tue, 2 Nov 2010 08:49:10 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; )
References: <4C9B9B9C.6000807@freebsd.org> <201011011514.50322.jhb@freebsd.org>
	<AANLkTi=aJ9KgyK87RP42ZrM8h012-uDso+mmTfz8Lfin@mail.gmail.com>
In-Reply-To: <AANLkTi=aJ9KgyK87RP42ZrM8h012-uDso+mmTfz8Lfin@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201011020849.10288.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 02 Nov 2010 09:06:37 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham
	version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx
Cc: alc@freebsd.org, Attilio Rao <attilio@freebsd.org>,
	freebsd-current@freebsd.org, kib@freebsd.org,
	Andriy Gapon <avg@freebsd.org>
Subject: Re: panic in uma_startup for many-core amd64 system
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 02 Nov 2010 13:06:38 -0000

On Monday, November 01, 2010 6:02:19 pm Giovanni Trematerra wrote:
> On Mon, Nov 1, 2010 at 8:14 PM, John Baldwin <jhb@freebsd.org> wrote:
> > On Monday, November 01, 2010 1:09:22 pm Giovanni Trematerra wrote:
> >> On Tue, Oct 19, 2010 at 8:55 AM, Andriy Gapon <avg@freebsd.org> wrote:
> >> > on 19/10/2010 00:01 Giovanni Trematerra said the following:
> >> >>
> >> >> Your patch seems just a work around about initial slab size where the
> >> >> keg is backed.
> >> >
> >> > Well, setting aside my confusion with the terminology - yes, the patch is
> > just
> >> > that, and precisely because I only tried to solve that particular problem.
> >> >
> >> >> Having dynamic slab sizes would allow to have the keg backed on a larger
> > slab
> >> >> without going OFFPAGE.
> >> >
> >> > I agree in principle.
> >> > But without seeing code that implements that I can't guess if it would
> > really be
> >> > more efficient or more maintainable, i.e. more useful in general.
> >> > Still a very good idea.
> >> >
> >>
> >> Here the patch that was in my mind.
> >> The patch doesn't implement dynamic slab size just allow
> >> to have a multipage slab to back uma_zone objects.
> >> I'm going to work more on the topic "dynamic slab size" soon.
> >> I tested the patch on qemu with -smp 32.
> >> I'm looking for real hw to test the patch on.
> >>
> >> here some interesting output:
> >> qemulab# vmstat -z | more
> >> ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
> >>
> >> UMA Kegs:               208,      0,     149,       4,     149,   0,   0
> >> UMA Zones:             4480,      0,     149,       0,     149,   0,   0
> >> UMA Slabs:              568,      0,     836,       4,    1187,   0,   0
> >> UMA RCntSlabs:          568,      0,     202,       1,     202,   0,   0
> >> UMA Hash:               256,      0,       2,      13,       3,   0,   0
> >>
> >> qemulab# sysctl kern | grep cpu
> >> kern.ccpu: 0
> >>   <cpu count="32" mask="0xffffffff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
> >> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
> >> 28, 29, 30, 31</cpu>
> >> kern.smp.cpus: 32
> >> kern.smp.maxcpus: 32
> >>
> >> Any feedback will be appreciate.
> >>
> >> --
> >> Giovanni Trematerra
> >>
> >>
> >> ==============================================================
> >> diff -r 36572cbc6817 sys/vm/uma_core.c
> >> --- a/sys/vm/uma_core.c       Tue Oct 05 04:49:04 2010 -0400
> >> +++ b/sys/vm/uma_core.c       Mon Nov 01 11:54:38 2010 -0400
> >> @@ -930,27 +930,36 @@ startup_alloc(uma_zone_t zone, int bytes
> >>  {
> >>       uma_keg_t keg;
> >>       uma_slab_t tmps;
> >> -
> >> -     keg = zone_first_keg(zone);
> >> +     u_int16_t pages;
> >> +
> >> +     keg   = zone_first_keg(zone);
> >> +     pages = bytes / PAGE_SIZE;
> >> +
> >> +     /* Account for remainder */
> >> +     if ((pages * PAGE_SIZE) < bytes)
> >> +             pages++;
> >> +     KASSERT(pages > 0, ("startup_alloc can't reserve 0 pages\n"));
> >
> > You can use 'pages = howmany(bytes, PAGE_SIZE)' here.
> 
> Thanks for the hint.
> 
> > Also, why did you make
> > pages a uint16_t instead of an int?  An int is generally more convenient
> > unless you really need a uint16_t (and C99 spells it without an _ after the
> > leading 'u'.. FYI).
> 
> Uhm just to be coherent with field uk_ppera of struct keg, but I think
> I can just use an int.
> BTW is new code supposed to use C99 form even if the rest of the file
> use u_int* form?

Hmm, if it is matching existing code that is ok I guess.  I do use the C99
names for all new code personally.

> > Probably best to make the pages test here explicit.  Also, is there any reason
> > you can't do this as:
> >
> >        while (--pages > 0) {
> >                tmps = LIST_FIRST(&uma_boot_pages);
> >                if (tmps != NULL)
> >                        LIST_REMOVE(tmps, us_link);
> >                else if (booted == 0)
> >                        panic(...);
> >        }
> >
> 
> Well, no, even if I'll need to initialize tmps to NULL otherwise the
> compiler will
> raise a warning.
> do {} while(); might be still better than a while(){}. bytes parameter
> will never be
> zero so pages will always be at least one and KASSERT will catch some
> wired behavior.
> Anyway that looks to me more readable, thanks. I could add an "else
> break;" just in
> the few cases that "pages" is still > 0 and tmps == NULL, that could
> be useless though.

It is probably best to add the break.  You can still use a do {} while to
fix the compiler warning though, that's a legitimate reason. :)

> > One question btw, how does this work since if you need to allocate more than 1
> > page it seems that the 'tmps' values for all but the last are simply ignored
> > and leaked?
> 
> When you extract one item from the list you have tmps->us_data
> pointing to start address of the memory page. The pages are contiguous
> in decrescent
> order of address (see below) so when you extract 2 items the last one
> will point at
> the start address of 2 contiguous pages of memory, just what I need to return.

Ahhh, ok.  I would maybe add a comment to explain that it is ok to "lose"
the tmps references for this reason.

Hmm, one problem I see is that if you need to allocate 4 pages and only 3
are available, you will fail with 'tmps = NULL', but the 3 pages will be
lost.  I'm not sure how much we care about that case?  You could always do
two passes (one to see if N pages are available and fail the allocation
if not) and a second one to actually pull the pages off of the LIST.

-- 
John Baldwin