Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 1 Nov 2010 23:02:19 +0100
From:      Giovanni Trematerra <gianni@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        alc@freebsd.org, Attilio Rao <attilio@freebsd.org>, freebsd-current@freebsd.org, kib@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: panic in uma_startup for many-core amd64 system
Message-ID:  <AANLkTi=aJ9KgyK87RP42ZrM8h012-uDso%2BmmTfz8Lfin@mail.gmail.com>
In-Reply-To: <201011011514.50322.jhb@freebsd.org>
References:  <4C9B9B9C.6000807@freebsd.org> <4CBD40E2.7030507@freebsd.org> <AANLkTi=6aMwph7-mh4QUUUiiox2S%2BN86agZRgAVw6=QT@mail.gmail.com> <201011011514.50322.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Nov 1, 2010 at 8:14 PM, John Baldwin <jhb@freebsd.org> wrote:
> On Monday, November 01, 2010 1:09:22 pm Giovanni Trematerra wrote:
>> On Tue, Oct 19, 2010 at 8:55 AM, Andriy Gapon <avg@freebsd.org> wrote:
>> > on 19/10/2010 00:01 Giovanni Trematerra said the following:
>> >>
>> >> Your patch seems just a work around about initial slab size where the
>> >> keg is backed.
>> >
>> > Well, setting aside my confusion with the terminology - yes, the patch=
 is
> just
>> > that, and precisely because I only tried to solve that particular prob=
lem.
>> >
>> >> Having dynamic slab sizes would allow to have the keg backed on a lar=
ger
> slab
>> >> without going OFFPAGE.
>> >
>> > I agree in principle.
>> > But without seeing code that implements that I can't guess if it would
> really be
>> > more efficient or more maintainable, i.e. more useful in general.
>> > Still a very good idea.
>> >
>>
>> Here the patch that was in my mind.
>> The patch doesn't implement dynamic slab size just allow
>> to have a multipage slab to back uma_zone objects.
>> I'm going to work more on the topic "dynamic slab size" soon.
>> I tested the patch on qemu with -smp 32.
>> I'm looking for real hw to test the patch on.
>>
>> here some interesting output:
>> qemulab# vmstat -z | more
>> ITEM =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 SIZE =A0LIMIT =A0 =A0 USED =A0 =
=A0 FREE =A0 =A0 =A0REQ FAIL SLEEP
>>
>> UMA Kegs: =A0 =A0 =A0 =A0 =A0 =A0 =A0 208, =A0 =A0 =A00, =A0 =A0 149, =
=A0 =A0 =A0 4, =A0 =A0 149, =A0 0, =A0 0
>> UMA Zones: =A0 =A0 =A0 =A0 =A0 =A0 4480, =A0 =A0 =A00, =A0 =A0 149, =A0 =
=A0 =A0 0, =A0 =A0 149, =A0 0, =A0 0
>> UMA Slabs: =A0 =A0 =A0 =A0 =A0 =A0 =A0568, =A0 =A0 =A00, =A0 =A0 836, =
=A0 =A0 =A0 4, =A0 =A01187, =A0 0, =A0 0
>> UMA RCntSlabs: =A0 =A0 =A0 =A0 =A0568, =A0 =A0 =A00, =A0 =A0 202, =A0 =
=A0 =A0 1, =A0 =A0 202, =A0 0, =A0 0
>> UMA Hash: =A0 =A0 =A0 =A0 =A0 =A0 =A0 256, =A0 =A0 =A00, =A0 =A0 =A0 2, =
=A0 =A0 =A013, =A0 =A0 =A0 3, =A0 0, =A0 0
>>
>> qemulab# sysctl kern | grep cpu
>> kern.ccpu: 0
>> =A0 <cpu count=3D"32" mask=3D"0xffffffff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, =
10,
>> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
>> 28, 29, 30, 31</cpu>
>> kern.smp.cpus: 32
>> kern.smp.maxcpus: 32
>>
>> Any feedback will be appreciate.
>>
>> --
>> Giovanni Trematerra
>>
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> diff -r 36572cbc6817 sys/vm/uma_core.c
>> --- a/sys/vm/uma_core.c =A0 =A0 =A0 Tue Oct 05 04:49:04 2010 -0400
>> +++ b/sys/vm/uma_core.c =A0 =A0 =A0 Mon Nov 01 11:54:38 2010 -0400
>> @@ -930,27 +930,36 @@ startup_alloc(uma_zone_t zone, int bytes
>> =A0{
>> =A0 =A0 =A0 uma_keg_t keg;
>> =A0 =A0 =A0 uma_slab_t tmps;
>> -
>> - =A0 =A0 keg =3D zone_first_keg(zone);
>> + =A0 =A0 u_int16_t pages;
>> +
>> + =A0 =A0 keg =A0 =3D zone_first_keg(zone);
>> + =A0 =A0 pages =3D bytes / PAGE_SIZE;
>> +
>> + =A0 =A0 /* Account for remainder */
>> + =A0 =A0 if ((pages * PAGE_SIZE) < bytes)
>> + =A0 =A0 =A0 =A0 =A0 =A0 pages++;
>> + =A0 =A0 KASSERT(pages > 0, ("startup_alloc can't reserve 0 pages\n"));
>
> You can use 'pages =3D howmany(bytes, PAGE_SIZE)' here.

Thanks for the hint.

> Also, why did you make
> pages a uint16_t instead of an int? =A0An int is generally more convenien=
t
> unless you really need a uint16_t (and C99 spells it without an _ after t=
he
> leading 'u'.. FYI).

Uhm just to be coherent with field uk_ppera of struct keg, but I think
I can just use an int.
BTW is new code supposed to use C99 form even if the rest of the file
use u_int* form?

>
>> =A0 =A0 =A0 /*
>> =A0 =A0 =A0 =A0* Check our small startup cache to see if it has pages re=
maining.
>> =A0 =A0 =A0 =A0*/
>> =A0 =A0 =A0 mtx_lock(&uma_boot_pages_mtx);
>> - =A0 =A0 if ((tmps =3D LIST_FIRST(&uma_boot_pages)) !=3D NULL) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 LIST_REMOVE(tmps, us_link);
>> + =A0 =A0 do {
>> + =A0 =A0 =A0 =A0 =A0 =A0 if ((tmps =3D LIST_FIRST(&uma_boot_pages)) !=
=3D NULL)
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 LIST_REMOVE(tmps, us_link);
>> + =A0 =A0 } while (--pages && tmps !=3D NULL);
>> + =A0 =A0 if (tmps !=3D NULL) {
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 mtx_unlock(&uma_boot_pages_mtx);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 *pflag =3D tmps->us_flags;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (tmps->us_data);
>> - =A0 =A0 }
>> + =A0 =A0 } else if (booted =3D=3D 0)
>> + =A0 =A0 =A0 =A0 =A0 =A0 panic("UMA: Increase vm.boot_pages");
>> =A0 =A0 =A0 mtx_unlock(&uma_boot_pages_mtx);
>> - =A0 =A0 if (booted =3D=3D 0)
>> - =A0 =A0 =A0 =A0 =A0 =A0 panic("UMA: Increase vm.boot_pages");
>
> Probably best to make the pages test here explicit. =A0Also, is there any=
 reason
> you can't do this as:
>
> =A0 =A0 =A0 =A0while (--pages > 0) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0tmps =3D LIST_FIRST(&uma_boot_pages);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (tmps !=3D NULL)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0LIST_REMOVE(tmps, us_link)=
;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0else if (booted =3D=3D 0)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0panic(...);
> =A0 =A0 =A0 =A0}
>

Well, no, even if I'll need to initialize tmps to NULL otherwise the
compiler will
raise a warning.
do {} while(); might be still better than a while(){}. bytes parameter
will never be
zero so pages will always be at least one and KASSERT will catch some
wired behavior.
Anyway that looks to me more readable, thanks. I could add an "else
break;" just in
the few cases that "pages" is still > 0 and tmps =3D=3D NULL, that could
be useless though.

> One question btw, how does this work since if you need to allocate more t=
han 1
> page it seems that the 'tmps' values for all but the last are simply igno=
red
> and leaked?

When you extract one item from the list you have tmps->us_data
pointing to start address of the memory page. The pages are contiguous
in decrescent
order of address (see below) so when you extract 2 items the last one
will point at
the start address of 2 contiguous pages of memory, just what I need to retu=
rn.

>
> Is there some unwritten assumption that the free pages are all virtually
> contiguous (and in order), so you can just pull off a run of X and use
> the address from X?
>

sys/vm/vm_page.c:351 @@ vm_page_startup(vm_offset_t vaddr)

  /*
     * Allocate memory for use when boot strapping the kernel memory
     * allocator.
     */
    new_end =3D end - (boot_pages * UMA_SLAB_SIZE);
    new_end =3D trunc_page(new_end);
    mapped =3D pmap_map(&vaddr, new_end, end,
        VM_PROT_READ | VM_PROT_WRITE);
    bzero((void *)mapped, end - new_end);
    uma_startup((void *)mapped, boot_pages);

--------

sys/vm/uma_core.c:1683 @@ uma_startup(void *bootmem, int boot_pages)

#ifdef UMA_DEBUG
    printf("Filling boot free list.\n");
#endif
    for (i =3D 0; i < boot_pages; i++) {
        slab =3D (uma_slab_t)((u_int8_t *)bootmem + (i * UMA_SLAB_SIZE));
        slab->us_data =3D (u_int8_t *)slab;
        slab->us_flags =3D UMA_SLAB_BOOT;
        LIST_INSERT_HEAD(&uma_boot_pages, slab, us_link);
    }

So if I'm not wrong I'd say that the pages are virtually contiguous.

Just for the record, attilio@ made a light review of this patch in private.

--
Giovanni Trematerra



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AANLkTi=aJ9KgyK87RP42ZrM8h012-uDso%2BmmTfz8Lfin>