From owner-freebsd-current@FreeBSD.ORG Tue Nov 2 13:06:38 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D5E0A1065672; Tue, 2 Nov 2010 13:06:38 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 97B4A8FC16; Tue, 2 Nov 2010 13:06:38 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 2E4AA46B1A; Tue, 2 Nov 2010 09:06:38 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 10BAA8A01D; Tue, 2 Nov 2010 09:06:37 -0400 (EDT) From: John Baldwin To: Giovanni Trematerra Date: Tue, 2 Nov 2010 08:49:10 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <4C9B9B9C.6000807@freebsd.org> <201011011514.50322.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201011020849.10288.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 02 Nov 2010 09:06:37 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: alc@freebsd.org, Attilio Rao , freebsd-current@freebsd.org, kib@freebsd.org, Andriy Gapon Subject: Re: panic in uma_startup for many-core amd64 system X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Nov 2010 13:06:38 -0000 On Monday, November 01, 2010 6:02:19 pm Giovanni Trematerra wrote: > On Mon, Nov 1, 2010 at 8:14 PM, John Baldwin wrote: > > On Monday, November 01, 2010 1:09:22 pm Giovanni Trematerra wrote: > >> On Tue, Oct 19, 2010 at 8:55 AM, Andriy Gapon wrote: > >> > on 19/10/2010 00:01 Giovanni Trematerra said the following: > >> >> > >> >> Your patch seems just a work around about initial slab size where the > >> >> keg is backed. > >> > > >> > Well, setting aside my confusion with the terminology - yes, the patch is > > just > >> > that, and precisely because I only tried to solve that particular problem. > >> > > >> >> Having dynamic slab sizes would allow to have the keg backed on a larger > > slab > >> >> without going OFFPAGE. > >> > > >> > I agree in principle. > >> > But without seeing code that implements that I can't guess if it would > > really be > >> > more efficient or more maintainable, i.e. more useful in general. > >> > Still a very good idea. > >> > > >> > >> Here the patch that was in my mind. > >> The patch doesn't implement dynamic slab size just allow > >> to have a multipage slab to back uma_zone objects. > >> I'm going to work more on the topic "dynamic slab size" soon. > >> I tested the patch on qemu with -smp 32. > >> I'm looking for real hw to test the patch on. > >> > >> here some interesting output: > >> qemulab# vmstat -z | more > >> ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP > >> > >> UMA Kegs: 208, 0, 149, 4, 149, 0, 0 > >> UMA Zones: 4480, 0, 149, 0, 149, 0, 0 > >> UMA Slabs: 568, 0, 836, 4, 1187, 0, 0 > >> UMA RCntSlabs: 568, 0, 202, 1, 202, 0, 0 > >> UMA Hash: 256, 0, 2, 13, 3, 0, 0 > >> > >> qemulab# sysctl kern | grep cpu > >> kern.ccpu: 0 > >> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, > >> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, > >> 28, 29, 30, 31 > >> kern.smp.cpus: 32 > >> kern.smp.maxcpus: 32 > >> > >> Any feedback will be appreciate. > >> > >> -- > >> Giovanni Trematerra > >> > >> > >> ============================================================== > >> diff -r 36572cbc6817 sys/vm/uma_core.c > >> --- a/sys/vm/uma_core.c Tue Oct 05 04:49:04 2010 -0400 > >> +++ b/sys/vm/uma_core.c Mon Nov 01 11:54:38 2010 -0400 > >> @@ -930,27 +930,36 @@ startup_alloc(uma_zone_t zone, int bytes > >> { > >> uma_keg_t keg; > >> uma_slab_t tmps; > >> - > >> - keg = zone_first_keg(zone); > >> + u_int16_t pages; > >> + > >> + keg = zone_first_keg(zone); > >> + pages = bytes / PAGE_SIZE; > >> + > >> + /* Account for remainder */ > >> + if ((pages * PAGE_SIZE) < bytes) > >> + pages++; > >> + KASSERT(pages > 0, ("startup_alloc can't reserve 0 pages\n")); > > > > You can use 'pages = howmany(bytes, PAGE_SIZE)' here. > > Thanks for the hint. > > > Also, why did you make > > pages a uint16_t instead of an int? An int is generally more convenient > > unless you really need a uint16_t (and C99 spells it without an _ after the > > leading 'u'.. FYI). > > Uhm just to be coherent with field uk_ppera of struct keg, but I think > I can just use an int. > BTW is new code supposed to use C99 form even if the rest of the file > use u_int* form? Hmm, if it is matching existing code that is ok I guess. I do use the C99 names for all new code personally. > > Probably best to make the pages test here explicit. Also, is there any reason > > you can't do this as: > > > > while (--pages > 0) { > > tmps = LIST_FIRST(&uma_boot_pages); > > if (tmps != NULL) > > LIST_REMOVE(tmps, us_link); > > else if (booted == 0) > > panic(...); > > } > > > > Well, no, even if I'll need to initialize tmps to NULL otherwise the > compiler will > raise a warning. > do {} while(); might be still better than a while(){}. bytes parameter > will never be > zero so pages will always be at least one and KASSERT will catch some > wired behavior. > Anyway that looks to me more readable, thanks. I could add an "else > break;" just in > the few cases that "pages" is still > 0 and tmps == NULL, that could > be useless though. It is probably best to add the break. You can still use a do {} while to fix the compiler warning though, that's a legitimate reason. :) > > One question btw, how does this work since if you need to allocate more than 1 > > page it seems that the 'tmps' values for all but the last are simply ignored > > and leaked? > > When you extract one item from the list you have tmps->us_data > pointing to start address of the memory page. The pages are contiguous > in decrescent > order of address (see below) so when you extract 2 items the last one > will point at > the start address of 2 contiguous pages of memory, just what I need to return. Ahhh, ok. I would maybe add a comment to explain that it is ok to "lose" the tmps references for this reason. Hmm, one problem I see is that if you need to allocate 4 pages and only 3 are available, you will fail with 'tmps = NULL', but the 3 pages will be lost. I'm not sure how much we care about that case? You could always do two passes (one to see if N pages are available and fail the allocation if not) and a second one to actually pull the pages off of the LIST. -- John Baldwin