From owner-freebsd-current@FreeBSD.ORG Mon Nov 1 19:16:15 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D69C6106566B; Mon, 1 Nov 2010 19:16:15 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id A33428FC14; Mon, 1 Nov 2010 19:16:15 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 3FB6246B65; Mon, 1 Nov 2010 15:16:15 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6E6FA8A009; Mon, 1 Nov 2010 15:16:10 -0400 (EDT) From: John Baldwin To: freebsd-current@freebsd.org Date: Mon, 1 Nov 2010 15:14:50 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <4C9B9B9C.6000807@freebsd.org> <4CBD40E2.7030507@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201011011514.50322.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 01 Nov 2010 15:16:10 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: alc@freebsd.org, kib@freebsd.org, Giovanni Trematerra , Andriy Gapon Subject: Re: panic in uma_startup for many-core amd64 system X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Nov 2010 19:16:15 -0000 On Monday, November 01, 2010 1:09:22 pm Giovanni Trematerra wrote: > On Tue, Oct 19, 2010 at 8:55 AM, Andriy Gapon wrote: > > on 19/10/2010 00:01 Giovanni Trematerra said the following: > >> > >> Your patch seems just a work around about initial slab size where the > >> keg is backed. > > > > Well, setting aside my confusion with the terminology - yes, the patch is just > > that, and precisely because I only tried to solve that particular problem. > > > >> Having dynamic slab sizes would allow to have the keg backed on a larger slab > >> without going OFFPAGE. > > > > I agree in principle. > > But without seeing code that implements that I can't guess if it would really be > > more efficient or more maintainable, i.e. more useful in general. > > Still a very good idea. > > > > Here the patch that was in my mind. > The patch doesn't implement dynamic slab size just allow > to have a multipage slab to back uma_zone objects. > I'm going to work more on the topic "dynamic slab size" soon. > I tested the patch on qemu with -smp 32. > I'm looking for real hw to test the patch on. > > here some interesting output: > qemulab# vmstat -z | more > ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP > > UMA Kegs: 208, 0, 149, 4, 149, 0, 0 > UMA Zones: 4480, 0, 149, 0, 149, 0, 0 > UMA Slabs: 568, 0, 836, 4, 1187, 0, 0 > UMA RCntSlabs: 568, 0, 202, 1, 202, 0, 0 > UMA Hash: 256, 0, 2, 13, 3, 0, 0 > > qemulab# sysctl kern | grep cpu > kern.ccpu: 0 > 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, > 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, > 28, 29, 30, 31 > kern.smp.cpus: 32 > kern.smp.maxcpus: 32 > > Any feedback will be appreciate. > > -- > Giovanni Trematerra > > > ============================================================== > diff -r 36572cbc6817 sys/vm/uma_core.c > --- a/sys/vm/uma_core.c Tue Oct 05 04:49:04 2010 -0400 > +++ b/sys/vm/uma_core.c Mon Nov 01 11:54:38 2010 -0400 > @@ -930,27 +930,36 @@ startup_alloc(uma_zone_t zone, int bytes > { > uma_keg_t keg; > uma_slab_t tmps; > - > - keg = zone_first_keg(zone); > + u_int16_t pages; > + > + keg = zone_first_keg(zone); > + pages = bytes / PAGE_SIZE; > + > + /* Account for remainder */ > + if ((pages * PAGE_SIZE) < bytes) > + pages++; > + KASSERT(pages > 0, ("startup_alloc can't reserve 0 pages\n")); You can use 'pages = howmany(bytes, PAGE_SIZE)' here. Also, why did you make pages a uint16_t instead of an int? An int is generally more convenient unless you really need a uint16_t (and C99 spells it without an _ after the leading 'u'.. FYI). > /* > * Check our small startup cache to see if it has pages remaining. > */ > mtx_lock(&uma_boot_pages_mtx); > - if ((tmps = LIST_FIRST(&uma_boot_pages)) != NULL) { > - LIST_REMOVE(tmps, us_link); > + do { > + if ((tmps = LIST_FIRST(&uma_boot_pages)) != NULL) > + LIST_REMOVE(tmps, us_link); > + } while (--pages && tmps != NULL); > + if (tmps != NULL) { > mtx_unlock(&uma_boot_pages_mtx); > *pflag = tmps->us_flags; > return (tmps->us_data); > - } > + } else if (booted == 0) > + panic("UMA: Increase vm.boot_pages"); > mtx_unlock(&uma_boot_pages_mtx); > - if (booted == 0) > - panic("UMA: Increase vm.boot_pages"); Probably best to make the pages test here explicit. Also, is there any reason you can't do this as: while (--pages > 0) { tmps = LIST_FIRST(&uma_boot_pages); if (tmps != NULL) LIST_REMOVE(tmps, us_link); else if (booted == 0) panic(...); } One question btw, how does this work since if you need to allocate more than 1 page it seems that the 'tmps' values for all but the last are simply ignored and leaked? Is there some unwritten assumption that the free pages are all virtually contiguous (and in order), so you can just pull off a run of X and use the address from X? > /* > * Now that we've booted reset these users to their real allocator. > */ > #ifdef UMA_MD_SMALL_ALLOC > - keg->uk_allocf = uma_small_alloc; > + keg->uk_allocf = (keg->uk_ppera > 1) ? page_alloc : uma_small_alloc; > #else > keg->uk_allocf = page_alloc; > #endif > @@ -1163,7 +1172,7 @@ keg_small_init(uma_keg_t keg) > static void > keg_large_init(uma_keg_t keg) > { > - int pages; > + u_int16_t pages; > > KASSERT(keg != NULL, ("Keg is null in keg_large_init")); > KASSERT((keg->uk_flags & UMA_ZFLAG_CACHEONLY) == 0, > @@ -1177,12 +1186,15 @@ keg_large_init(uma_keg_t keg) > > keg->uk_ppera = pages; > keg->uk_ipers = 1; > + keg->uk_rsize = keg->uk_size; > + > + /* We can't do OFFPAGE if we're internal, bail out here. */ > + if (keg->uk_flags & UMA_ZFLAG_INTERNAL) > + return; > > keg->uk_flags |= UMA_ZONE_OFFPAGE; > if ((keg->uk_flags & UMA_ZONE_VTOSLAB) == 0) > keg->uk_flags |= UMA_ZONE_HASH; > - > - keg->uk_rsize = keg->uk_size; > } > > static void > @@ -1301,7 +1313,8 @@ keg_ctor(void *mem, int size, void *udat > #endif > if (booted == 0) > keg->uk_allocf = startup_alloc; > - } > + } else if (booted == 0 && (keg->uk_flags & UMA_ZFLAG_INTERNAL)) > + keg->uk_allocf = startup_alloc; > > /* > * Initialize keg's lock (shared among zones). > @@ -1330,7 +1343,7 @@ keg_ctor(void *mem, int size, void *udat > if (totsize & UMA_ALIGN_PTR) > totsize = (totsize & ~UMA_ALIGN_PTR) + > (UMA_ALIGN_PTR + 1); > - keg->uk_pgoff = UMA_SLAB_SIZE - totsize; > + keg->uk_pgoff = (UMA_SLAB_SIZE * keg->uk_ppera) - totsize; > > if (keg->uk_flags & UMA_ZONE_REFCNT) > totsize = keg->uk_pgoff + sizeof(struct uma_slab_refcnt) > @@ -1346,7 +1359,7 @@ keg_ctor(void *mem, int size, void *udat > * mathematically possible for all cases, so we make > * sure here anyway. > */ > - if (totsize > UMA_SLAB_SIZE) { > + if (totsize > UMA_SLAB_SIZE * keg->uk_ppera) { > printf("zone %s ipers %d rsize %d size %d\n", > zone->uz_name, keg->uk_ipers, keg->uk_rsize, > keg->uk_size); > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > -- John Baldwin