From owner-freebsd-current@FreeBSD.ORG  Mon Nov  1 19:16:15 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D69C6106566B;
	Mon,  1 Nov 2010 19:16:15 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id A33428FC14;
	Mon,  1 Nov 2010 19:16:15 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 3FB6246B65;
	Mon,  1 Nov 2010 15:16:15 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6E6FA8A009;
	Mon,  1 Nov 2010 15:16:10 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Date: Mon, 1 Nov 2010 15:14:50 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; )
References: <4C9B9B9C.6000807@freebsd.org> <4CBD40E2.7030507@freebsd.org>
	<AANLkTi=6aMwph7-mh4QUUUiiox2S+N86agZRgAVw6=QT@mail.gmail.com>
In-Reply-To: <AANLkTi=6aMwph7-mh4QUUUiiox2S+N86agZRgAVw6=QT@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201011011514.50322.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Mon, 01 Nov 2010 15:16:10 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham
	version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx
Cc: alc@freebsd.org, kib@freebsd.org, Giovanni Trematerra <gianni@freebsd.org>,
	Andriy Gapon <avg@freebsd.org>
Subject: Re: panic in uma_startup for many-core amd64 system
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 01 Nov 2010 19:16:15 -0000

On Monday, November 01, 2010 1:09:22 pm Giovanni Trematerra wrote:
> On Tue, Oct 19, 2010 at 8:55 AM, Andriy Gapon <avg@freebsd.org> wrote:
> > on 19/10/2010 00:01 Giovanni Trematerra said the following:
> >>
> >> Your patch seems just a work around about initial slab size where the
> >> keg is backed.
> >
> > Well, setting aside my confusion with the terminology - yes, the patch is 
just
> > that, and precisely because I only tried to solve that particular problem.
> >
> >> Having dynamic slab sizes would allow to have the keg backed on a larger 
slab
> >> without going OFFPAGE.
> >
> > I agree in principle.
> > But without seeing code that implements that I can't guess if it would 
really be
> > more efficient or more maintainable, i.e. more useful in general.
> > Still a very good idea.
> >
> 
> Here the patch that was in my mind.
> The patch doesn't implement dynamic slab size just allow
> to have a multipage slab to back uma_zone objects.
> I'm going to work more on the topic "dynamic slab size" soon.
> I tested the patch on qemu with -smp 32.
> I'm looking for real hw to test the patch on.
> 
> here some interesting output:
> qemulab# vmstat -z | more
> ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
> 
> UMA Kegs:               208,      0,     149,       4,     149,   0,   0
> UMA Zones:             4480,      0,     149,       0,     149,   0,   0
> UMA Slabs:              568,      0,     836,       4,    1187,   0,   0
> UMA RCntSlabs:          568,      0,     202,       1,     202,   0,   0
> UMA Hash:               256,      0,       2,      13,       3,   0,   0
> 
> qemulab# sysctl kern | grep cpu
> kern.ccpu: 0
>   <cpu count="32" mask="0xffffffff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
> 28, 29, 30, 31</cpu>
> kern.smp.cpus: 32
> kern.smp.maxcpus: 32
> 
> Any feedback will be appreciate.
> 
> --
> Giovanni Trematerra
> 
> 
> ==============================================================
> diff -r 36572cbc6817 sys/vm/uma_core.c
> --- a/sys/vm/uma_core.c	Tue Oct 05 04:49:04 2010 -0400
> +++ b/sys/vm/uma_core.c	Mon Nov 01 11:54:38 2010 -0400
> @@ -930,27 +930,36 @@ startup_alloc(uma_zone_t zone, int bytes
>  {
>  	uma_keg_t keg;
>  	uma_slab_t tmps;
> -
> -	keg = zone_first_keg(zone);
> +	u_int16_t pages;
> +
> +	keg   = zone_first_keg(zone);
> +	pages = bytes / PAGE_SIZE;
> +
> +	/* Account for remainder */
> +	if ((pages * PAGE_SIZE) < bytes)
> +		pages++;
> +	KASSERT(pages > 0, ("startup_alloc can't reserve 0 pages\n"));

You can use 'pages = howmany(bytes, PAGE_SIZE)' here.  Also, why did you make 
pages a uint16_t instead of an int?  An int is generally more convenient 
unless you really need a uint16_t (and C99 spells it without an _ after the 
leading 'u'.. FYI).

>  	/*
>  	 * Check our small startup cache to see if it has pages remaining.
>  	 */
>  	mtx_lock(&uma_boot_pages_mtx);
> -	if ((tmps = LIST_FIRST(&uma_boot_pages)) != NULL) {
> -		LIST_REMOVE(tmps, us_link);
> +	do {
> +		if ((tmps = LIST_FIRST(&uma_boot_pages)) != NULL)
> +			LIST_REMOVE(tmps, us_link);
> +	} while (--pages && tmps != NULL);
> +	if (tmps != NULL) {
>  		mtx_unlock(&uma_boot_pages_mtx);
>  		*pflag = tmps->us_flags;
>  		return (tmps->us_data);
> -	}
> +	} else if (booted == 0)
> +		panic("UMA: Increase vm.boot_pages");
>  	mtx_unlock(&uma_boot_pages_mtx);
> -	if (booted == 0)
> -		panic("UMA: Increase vm.boot_pages");

Probably best to make the pages test here explicit.  Also, is there any reason 
you can't do this as:

	while (--pages > 0) {
		tmps = LIST_FIRST(&uma_boot_pages);
		if (tmps != NULL)
			LIST_REMOVE(tmps, us_link);
		else if (booted == 0)
			panic(...);
	}

One question btw, how does this work since if you need to allocate more than 1
page it seems that the 'tmps' values for all but the last are simply ignored
and leaked?

Is there some unwritten assumption that the free pages are all virtually
contiguous (and in order), so you can just pull off a run of X and use
the address from X?

>  	/*
>  	 * Now that we've booted reset these users to their real allocator.
>  	 */
>  #ifdef UMA_MD_SMALL_ALLOC
> -	keg->uk_allocf = uma_small_alloc;
> +	keg->uk_allocf = (keg->uk_ppera > 1) ? page_alloc : uma_small_alloc;
>  #else
>  	keg->uk_allocf = page_alloc;
>  #endif
> @@ -1163,7 +1172,7 @@ keg_small_init(uma_keg_t keg)
>  static void
>  keg_large_init(uma_keg_t keg)
>  {
> -	int pages;
> +	u_int16_t pages;
> 
>  	KASSERT(keg != NULL, ("Keg is null in keg_large_init"));
>  	KASSERT((keg->uk_flags & UMA_ZFLAG_CACHEONLY) == 0,
> @@ -1177,12 +1186,15 @@ keg_large_init(uma_keg_t keg)
> 
>  	keg->uk_ppera = pages;
>  	keg->uk_ipers = 1;
> +	keg->uk_rsize = keg->uk_size;
> +
> +	/* We can't do OFFPAGE if we're internal, bail out here. */
> +	if (keg->uk_flags & UMA_ZFLAG_INTERNAL)
> +		return;
> 
>  	keg->uk_flags |= UMA_ZONE_OFFPAGE;
>  	if ((keg->uk_flags & UMA_ZONE_VTOSLAB) == 0)
>  		keg->uk_flags |= UMA_ZONE_HASH;
> -
> -	keg->uk_rsize = keg->uk_size;
>  }
> 
>  static void
> @@ -1301,7 +1313,8 @@ keg_ctor(void *mem, int size, void *udat
>  #endif
>  		if (booted == 0)
>  			keg->uk_allocf = startup_alloc;
> -	}
> +	} else if (booted == 0 && (keg->uk_flags & UMA_ZFLAG_INTERNAL))
> +		keg->uk_allocf = startup_alloc;
> 
>  	/*
>  	 * Initialize keg's lock (shared among zones).
> @@ -1330,7 +1343,7 @@ keg_ctor(void *mem, int size, void *udat
>  		if (totsize & UMA_ALIGN_PTR)
>  			totsize = (totsize & ~UMA_ALIGN_PTR) +
>  			    (UMA_ALIGN_PTR + 1);
> -		keg->uk_pgoff = UMA_SLAB_SIZE - totsize;
> +		keg->uk_pgoff = (UMA_SLAB_SIZE * keg->uk_ppera) - totsize;
> 
>  		if (keg->uk_flags & UMA_ZONE_REFCNT)
>  			totsize = keg->uk_pgoff + sizeof(struct uma_slab_refcnt)
> @@ -1346,7 +1359,7 @@ keg_ctor(void *mem, int size, void *udat
>  		 * mathematically possible for all cases, so we make
>  		 * sure here anyway.
>  		 */
> -		if (totsize > UMA_SLAB_SIZE) {
> +		if (totsize > UMA_SLAB_SIZE * keg->uk_ppera) {
>  			printf("zone %s ipers %d rsize %d size %d\n",
>  			    zone->uz_name, keg->uk_ipers, keg->uk_rsize,
>  			    keg->uk_size);
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> 

-- 
John Baldwin