Date: Mon, 29 Sep 2014 17:51:32 +0200
From: Svatopluk Kraus <onwahe@gmail.com>
To: Alan Cox <alc@rice.edu>
Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID: <CAFHCsPXE8FTBeii2wQDmkqGBZ-jJta0cQSVQBACs95ert-hVqw@mail.gmail.com>
In-Reply-To: <5428AF3B.1030906@rice.edu>
References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>
 <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>
 <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>
 <5428AF3B.1030906@rice.edu>
[-- Attachment #1 --]
On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
> On 09/27/2014 03:51, Svatopluk Kraus wrote:
>
>
> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>
>>
>>
>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>>> problem we have dealt with somehow, but now we would like to handle it
>>> better. It concerns physical pages which are allocated before the vm
>>> subsystem is initialized. While such pages can later be found in
>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that is
>>> not true for the VM_PHYSSEG_SPARSE memory model. And the ARM world
>>> uses the VM_PHYSSEG_SPARSE model.
>>>
>>> It really would be nice to utilize vm_page_array for such preallocated
>>> physical pages even when the VM_PHYSSEG_SPARSE memory model is used.
>>> Things would be much easier then. In our case, the pages in question
>>> are used for level 2 page tables. In the VM_PHYSSEG_SPARSE model, we
>>> have two sets of such pages: the first set is preallocated and the
>>> second set is allocated after the vm subsystem has been initialized.
>>> We must deal with each set differently, so the code is more complex
>>> and so is debugging.
>>>
>>> Thus we need some way to say that a part of physical memory should be
>>> included in vm_page_array, but that the pages from that region should
>>> not be put on the free lists during initialization. We think such a
>>> possibility could be useful in general. There could be a need for some
>>> physical space which:
>>>
>>> (1) is needed only during boot and can later be freed and handed over
>>> to the vm subsystem, or
>>>
>>> (2) is needed for something else, while the vm_page_array code could
>>> still be used for it without some kind of duplication.
>>>
>>> There is already some code in vm_page.c which deals with blacklisted
>>> pages. So the easiest way to deal with the presented situation is to
>>> add a callback to that part of the code which would be able to exclude
>>> either a whole phys_avail[i], phys_avail[i+1] region or single pages.
>>> As the biggest phys_avail region is used for vm subsystem allocations,
>>> some more coding would be needed there. (However, blacklisted pages
>>> are not dealt with on that part of the region.)
>>>
>>> We would like to know if there is any objection:
>>>
>>> (1) to dealing with the presented problem,
>>> (2) to dealing with the problem in the presented way.
>>>
>>> Any help is very much appreciated. Thanks
>>>
>>>
>>
>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>> phys_avail when sizing vm_page_array. On amd64, where the same problem
>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is
>> probably my preferred solution. The catch being that not all architectures
>> implement dump_avail, but my recollection is that arm does.
>>
>
> Frankly, I would prefer this too, but there is one big open question:
>
> What is dump_avail for?
>
>
>
> dump_avail[] is solving a similar problem in the minidump code, hence, the
> prefix "dump_" in its name. In other words, the minidump code couldn't use
> phys_avail[] either because it didn't describe the full range of physical
> addresses that might be included in a minidump, so dump_avail[] was created.
>
> There is already precedent for what I'm suggesting. dump_avail[] is
> already (ab)used outside of the minidump code on x86 to solve this same
> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>
>
> Using it for vm_page_array initialization and segmentation means that
> phys_avail must be a subset of it. This must be stated and be visible
> enough; maybe it should even be checked in code. I like the idea of
> thinking about dump_avail as something that describes all memory in a
> system, but that is not how dump_avail is defined in the archs now.
>
>
>
> When you say "it's not how dump_avail is defined in archs now", I'm not
> sure whether you're talking about the code or the comments. In terms of
> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any
> code that would have to change. In terms of comments, I did a grep looking
> for comments defining what dump_avail[] is, because I couldn't remember
> any. I found one ... on arm. So, I don't think it's an onerous task to
> change the definition of dump_avail[]. :-)
>
> Already, as things stand today with dump_avail[] being used outside of the
> minidump code, one could reasonably argue that it should be renamed to
> something like phys_exists[].
>
>
>
> I will experiment with it on Monday then. However, it's not only about
> how memory segments are created in vm_phys.c, but also about how the
> vm_page_array size is computed in vm_page.c.
>
>
>
> Yes, and there is also a place in vm_reserv.c that needs to change. I've
> attached the patch that I developed and tested a long time ago. It still
> applies cleanly and runs ok on amd64.
>
>
I took your patch and added some changes to vm_page.c to make things -
IMHO - more consistent across the dense and sparse cases. It runs OK on
arm (odroid-xu). The new patch is attached.
I've investigated dump_avail and phys_avail in the other archs. In mips,
dump_avail is equal to phys_avail, so it should run with no difference
there. However, sys/mips/atheros/ar71xx_machdep.c should probably be
fixed. There is no dump_avail definition in sparc64 and powerpc. There,
dump_avail could be defined the same way as in mips (see the sketch after
the list below), so it should run with no problem too. The involved files
are:
sys/powerpc/aim/mmu_oea.c
sys/powerpc/aim/mmu_oea64.c
sys/powerpc/booke/pmap.c
sys/sparc64/sparc64/pmap.c
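Roughly, the definition could mirror what mips does. This is only a
minimal sketch, assuming that the platform code has already filled in
phys_avail[] at that point, that nothing has been carved out of it yet,
and that dump_avail[] is a zero-terminated static array like phys_avail[]:

        int i;

        /*
         * Take dump_avail[] as a verbatim copy of phys_avail[], made
         * before any early allocations shrink phys_avail[].  The
         * terminating zero pair comes from the static initialization
         * of dump_avail[].
         */
        for (i = 0; phys_avail[i + 1] != 0; i += 2) {
                dump_avail[i] = phys_avail[i];
                dump_avail[i + 1] = phys_avail[i + 1];
        }

That would keep the two arrays equal on those archs, which is all the vm
changes above need.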
There are also some files where I can imagine that phys_avail could be
replaced by dump_avail as a matter of purity:
sys/arm/arm/busdma_machdep.c
sys/mips/mips/busdma_machdep.c
sys/arm/arm/busdma_machdep-v6.c
sys/sparc64/sparc64/mem.c
sys/mips/mips/minidump_machdep.c
I agree that dump_avail should be renamed if this change happens.
I'm prepared to work on a full patch.
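As for checking in code that phys_avail[] is a subset of dump_avail[]
(mentioned above), something roughly like the following could do. It is
only a minimal sketch, not part of the attached patch, and the helper
name vm_page_check_avail() is just a placeholder:

static void
vm_page_check_avail(void)
{
        int i, j;

        /* Every phys_avail[] range must lie inside some dump_avail[] range. */
        for (i = 0; phys_avail[i + 1] != 0; i += 2) {
                for (j = 0; dump_avail[j + 1] != 0; j += 2)
                        if (phys_avail[i] >= dump_avail[j] &&
                            phys_avail[i + 1] <= dump_avail[j + 1])
                                break;
                if (dump_avail[j + 1] == 0)
                        panic("phys_avail[%d-%d] not covered by dump_avail",
                            i, i + 1);
        }
}

It could be called early in vm_page_startup(), after both arrays have been
rounded to page boundaries.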
Svata
[-- Attachment #2 --]
Index: sys/vm/vm_page.c
===================================================================
--- sys/vm/vm_page.c (revision 272281)
+++ sys/vm/vm_page.c (working copy)
@@ -301,31 +301,38 @@
biggestone = 0;
vaddr = round_page(vaddr);
- for (i = 0; phys_avail[i + 1]; i += 2) {
- phys_avail[i] = round_page(phys_avail[i]);
- phys_avail[i + 1] = trunc_page(phys_avail[i + 1]);
+ for (i = 0; dump_avail[i + 1]; i += 2) {
+ dump_avail[i] = round_page(dump_avail[i]);
+ dump_avail[i + 1] = trunc_page(dump_avail[i + 1]);
}
- low_water = phys_avail[0];
- high_water = phys_avail[1];
+ low_water = dump_avail[0];
+ high_water = dump_avail[1];
+ for (i = 0; dump_avail[i + 1]; i += 2) {
+ if (dump_avail[i] < low_water)
+ low_water = dump_avail[i];
+ if (dump_avail[i + 1] > high_water)
+ high_water = dump_avail[i + 1];
+ }
+
+#ifdef XEN
+ low_water = 0;
+#endif
+
for (i = 0; phys_avail[i + 1]; i += 2) {
- vm_paddr_t size = phys_avail[i + 1] - phys_avail[i];
+ vm_paddr_t size;
+ phys_avail[i] = round_page(phys_avail[i]);
+ phys_avail[i + 1] = trunc_page(phys_avail[i + 1]);
+
+ size = phys_avail[i + 1] - phys_avail[i];
if (size > biggestsize) {
biggestone = i;
biggestsize = size;
}
- if (phys_avail[i] < low_water)
- low_water = phys_avail[i];
- if (phys_avail[i + 1] > high_water)
- high_water = phys_avail[i + 1];
}
-#ifdef XEN
- low_water = 0;
-#endif
-
end = phys_avail[biggestone+1];
/*
@@ -393,8 +400,8 @@
first_page = low_water / PAGE_SIZE;
#ifdef VM_PHYSSEG_SPARSE
page_range = 0;
- for (i = 0; phys_avail[i + 1] != 0; i += 2)
- page_range += atop(phys_avail[i + 1] - phys_avail[i]);
+ for (i = 0; dump_avail[i + 1] != 0; i += 2)
+ page_range += atop(dump_avail[i + 1] - dump_avail[i]);
#elif defined(VM_PHYSSEG_DENSE)
page_range = high_water / PAGE_SIZE - first_page;
#else
Index: sys/vm/vm_phys.c
===================================================================
--- sys/vm/vm_phys.c (revision 272281)
+++ sys/vm/vm_phys.c (working copy)
@@ -365,17 +365,17 @@
struct vm_freelist *fl;
int dom, flind, i, oind, pind;
- for (i = 0; phys_avail[i + 1] != 0; i += 2) {
+ for (i = 0; dump_avail[i + 1] != 0; i += 2) {
#ifdef VM_FREELIST_ISADMA
- if (phys_avail[i] < 16777216) {
- if (phys_avail[i + 1] > 16777216) {
- vm_phys_create_seg(phys_avail[i], 16777216,
+ if (dump_avail[i] < 16777216) {
+ if (dump_avail[i + 1] > 16777216) {
+ vm_phys_create_seg(dump_avail[i], 16777216,
VM_FREELIST_ISADMA);
- vm_phys_create_seg(16777216, phys_avail[i + 1],
+ vm_phys_create_seg(16777216, dump_avail[i + 1],
VM_FREELIST_DEFAULT);
} else {
- vm_phys_create_seg(phys_avail[i],
- phys_avail[i + 1], VM_FREELIST_ISADMA);
+ vm_phys_create_seg(dump_avail[i],
+ dump_avail[i + 1], VM_FREELIST_ISADMA);
}
if (VM_FREELIST_ISADMA >= vm_nfreelists)
vm_nfreelists = VM_FREELIST_ISADMA + 1;
@@ -382,21 +382,21 @@
} else
#endif
#ifdef VM_FREELIST_HIGHMEM
- if (phys_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
- if (phys_avail[i] < VM_HIGHMEM_ADDRESS) {
- vm_phys_create_seg(phys_avail[i],
+ if (dump_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
+ if (dump_avail[i] < VM_HIGHMEM_ADDRESS) {
+ vm_phys_create_seg(dump_avail[i],
VM_HIGHMEM_ADDRESS, VM_FREELIST_DEFAULT);
vm_phys_create_seg(VM_HIGHMEM_ADDRESS,
- phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+ dump_avail[i + 1], VM_FREELIST_HIGHMEM);
} else {
- vm_phys_create_seg(phys_avail[i],
- phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+ vm_phys_create_seg(dump_avail[i],
+ dump_avail[i + 1], VM_FREELIST_HIGHMEM);
}
if (VM_FREELIST_HIGHMEM >= vm_nfreelists)
vm_nfreelists = VM_FREELIST_HIGHMEM + 1;
} else
#endif
- vm_phys_create_seg(phys_avail[i], phys_avail[i + 1],
+ vm_phys_create_seg(dump_avail[i], dump_avail[i + 1],
VM_FREELIST_DEFAULT);
}
for (dom = 0; dom < vm_ndomains; dom++) {
Index: sys/vm/vm_reserv.c
===================================================================
--- sys/vm/vm_reserv.c (revision 272281)
+++ sys/vm/vm_reserv.c (working copy)
@@ -824,9 +824,9 @@
* Initialize the reservation array. Specifically, initialize the
* "pages" field for every element that has an underlying superpage.
*/
- for (i = 0; phys_avail[i + 1] != 0; i += 2) {
- paddr = roundup2(phys_avail[i], VM_LEVEL_0_SIZE);
- while (paddr + VM_LEVEL_0_SIZE <= phys_avail[i + 1]) {
+ for (i = 0; dump_avail[i + 1] != 0; i += 2) {
+ paddr = roundup2(dump_avail[i], VM_LEVEL_0_SIZE);
+ while (paddr + VM_LEVEL_0_SIZE <= dump_avail[i + 1]) {
vm_reserv_array[paddr >> VM_LEVEL_0_SHIFT].pages =
PHYS_TO_VM_PAGE(paddr);
paddr += VM_LEVEL_0_SIZE;
