Date:      Mon, 27 Oct 2014 14:22:52 +0100
From:      Svatopluk Kraus <onwahe@gmail.com>
To:        Alan Cox <alc@rice.edu>
Cc:        alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID:  <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
In-Reply-To: <544DED4C.3010501@rice.edu>
References:  <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com> <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com> <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com> <5428AF3B.1030906@rice.edu> <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com> <54497DC1.5070506@rice.edu> <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com> <544DED4C.3010501@rice.edu>


[-- Attachment #1 --]
On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:

>   On 10/24/2014 06:33, Svatopluk Kraus wrote:
>
>
> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>
>>  On 10/08/2014 10:38, Svatopluk Kraus wrote:
>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>> >
>> >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>> >>
>> >>
>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com>
>> wrote:
>> >>
>> >>>
>> >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>> >>>> problem we've dealt with somehow, but now we would like to do it
>> >>>> better. It's about
>> >>>> physical pages which are allocated before vm subsystem is
>> initialized.
>> >>>> While later on these pages could be found in vm_page_array when
>> >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for
>> >>>> VM_PHYSSEG_SPARSE
>> >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model.
>> >>>>
>> >>>> It really would be nice to utilize vm_page_array for such
>> >>>> preallocated physical pages even when the VM_PHYSSEG_SPARSE memory
>> >>>> model is used. Things could be much easier then. In our case, it's
>> >>>> about pages which are used for level 2 page tables. In the
>> >>>> VM_PHYSSEG_SPARSE model, we have two sets of
>> >>>> pages. The first ones are preallocated and the second ones are
>> >>>> allocated after the vm subsystem was initialized. We must deal with
>> >>>> each set differently, so the code is more complex and so is the
>> >>>> debugging.
>> >>>>
>> >>>> Thus we need some way to say that some part of physical memory
>> >>>> should be included in vm_page_array, but the pages from that region
>> >>>> should not be put on the free list during initialization. We think
>> >>>> that such a possibility could be utilized in general. There could be
>> >>>> a need for some physical space which:
>> >>>>
>> >>>> (1) is needed only during boot and later on can be freed and given
>> >>>> to the vm subsystem,
>> >>>>
>> >>>> (2) is needed for something else, and the vm_page_array code could
>> >>>> be used without some kind of duplication of it.
>> >>>>
>> >>>> There is already some code which deals with blacklisted pages in the
>> >>>> vm_page.c file. So the easiest way to deal with the presented
>> >>>> situation is to add some callback to this part of the code which
>> >>>> will be able to exclude either a whole phys_avail[i], phys_avail[i+1]
>> >>>> region or single pages. As the biggest phys_avail region is used for
>> >>>> vm subsystem allocations, there would be some more coding. (However,
>> >>>> blacklisted pages are not dealt with on that part of the region.)
>> >>>>
>> >>>> We would like to know if there is any objection:
>> >>>>
>> >>>> (1) to dealing with the presented problem,
>> >>>> (2) to dealing with it in the presented way.
>> >>>>
>> >>>> Any help is much appreciated. Thanks.
>> >>>>
>> >>>>
>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead
>> >>> of phys_avail when sizing vm_page_array.  On amd64, where the same
>> >>> problem exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now,
>> >>> this is probably my preferred solution.  The catch being that not all
>> >>> architectures implement dump_avail, but my recollection is that arm
>> >>> does.
>> >>>
>> >> Frankly, I would prefer this too, but there is one big open question:
>> >>
>> >> What is dump_avail for?
>> >>
>> >>
>> >>
>> >> dump_avail[] is solving a similar problem in the minidump code,
>> >> hence the prefix "dump_" in its name.  In other words, the minidump
>> >> code couldn't use phys_avail[] either, because it didn't describe the
>> >> full range of physical addresses that might be included in a minidump,
>> >> so dump_avail[] was created.
>> >>
>> >> There is already precedent for what I'm suggesting.  dump_avail[] is
>> >> already (ab)used outside of the minidump code on x86 to solve this same
>> >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>> >>
>> >>
>> >> Using it for vm_page_array initialization and segmentation means that
>> >> phys_avail must be a subset of it. And this must be stated and made
>> >> visible enough; maybe it should even be checked in code. I like the
>> >> idea of thinking about dump_avail as something that describes all
>> >> memory in a system, but that's not how dump_avail is defined in the
>> >> archs now.
>> >>
>> >>
>> >>
>> >> When you say "it's not how dump_avail is defined in archs now", I'm
>> >> not sure whether you're talking about the code or the comments.  In
>> >> terms of code, dump_avail[] is a superset of phys_avail[], and I'm not
>> >> aware of any code that would have to change.  In terms of comments, I
>> >> did a grep looking for comments defining what dump_avail[] is, because
>> >> I couldn't remember any.  I found one ... on arm.  So, I don't think
>> >> it's an onerous task changing the definition of dump_avail[].  :-)
>> >>
>> >> Already, as things stand today with dump_avail[] being used outside
>> >> of the minidump code, one could reasonably argue that it should be
>> >> renamed to something like phys_exists[].
>> >>
>> >>
>> >>
>> >> I will experiment with it on Monday then. However, it's not only about
>> >> how memory segments are created in vm_phys.c, but also about how the
>> >> vm_page_array size is computed in vm_page.c.
>> >>
>> >>
>> >>
>> >> Yes, and there is also a place in vm_reserv.c that needs to change.
>> >> I've attached the patch that I developed and tested a long time ago.
>> >> It still applies cleanly and runs ok on amd64.
>> >>
>> >>
>> >>
>> >
>> >
>> > Well, I've created and tested a minimalistic patch which - I hope - is
>> > committable. It runs ok on a pandaboard (arm-v6) and solves the
>> > presented problem. I would really appreciate it if this could be
>> > committed. Thanks.
>>
>>
>> Sorry for the slow reply.  I've just been swamped with work lately.  I
>> finally had some time to look at this in the last day or so.
>>
>> The first thing that I propose to do is commit the attached patch.  This
>> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
>> consults phys_avail[] to determine the end of memory.  Instead, it calls
>> a new function provided by vm_phys.c to obtain the same information from
>> vm_phys_segs[].
>>
>> With this change, the new variable phys_managed in your patch wouldn't
>> need to be a global.  It could be a local variable in vm_page_startup()
>> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>>
>> More generally, the long-term vision that I have is that we would stop
>> using phys_avail[] after vm_page_startup() had completed.  It would only
>> be used during initialization.  After that we would use vm_phys_segs[]
>> and functions provided by vm_phys.c.
>>
>
> I understand. The patch and the long-term vision are fine with me. I just
> was not so bold as to pass phys_managed as a parameter to vm_phys_init()
> and vm_reserv_init(). However, I certainly was thinking about it. While
> reading the comment above vm_phys_get_end(): do we care whether the last
> usable address is 0xFFFFFFFF?
>
>
>
> To date, this hasn't been a problem.  However, handling 0xFFFFFFFF is
> easy.  So, the final version of the patch that I committed this weekend
> does so.
>
> Can you please try the attached patch?  It replaces phys_avail[] with
> vm_phys_segs[] in arm's busdma.
>


It works fine on the arm-v6 pandaboard. I have no objection to committing
it. However, it's only a 1:1 replacement. In fact, I still keep the
following pattern in my head:

present memory in system          <=> all RAM and whatsoever
nobounce memory                   <=> addressable by DMA
memory managed by vm subsystem    <=> i.e. kept in vm_page_array
memory available for vm subsystem <=> can be allocated

So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it could
be too limiting in some scenarios. I would like to see something different
in exclusion_bounce_check() in the future: something that reflects the
NOBOUNCE property rather than the NOALLOC one, as it does now.



>
>
>
>  Do you think that the rest of my patch, considering the changes due to
> your patch, is ok?
>
>
>
>
> Basically, yes.  I do, however, think that
>
> +#if defined(__arm__)
> +       phys_managed = dump_avail;
> +#else
> +       phys_managed = phys_avail;
> +#endif
>
> should also be conditioned on VM_PHYSSEG_SPARSE.
>



So I've prepared a new patch. phys_managed[] is passed to vm_phys_init()
and vm_reserv_init() as a parameter, and a small optimization is made in
vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition to the place
you mentioned. Anyhow, I still think that this is only a temporary hack.
In general, phys_managed[] should always be distinguished from
phys_avail[].



>
>
>> >
>> > BTW, while I was inspecting all archs, I think that maybe it's time to
>> > do what was done for busdma not long ago. There is a lot of similar
>> > code across archs which deals with physical memory and could be
>> > generalized and put into kern/subr_physmem.c for common use. All work
>> > with physical memory could be simplified to two arrays of regions:
>> >
>> > phys_present[] ... describes all present physical memory regions
>> > phys_exclude[] ... describes various exclusions from phys_present[]
>> >
>> > Each excluded region will be labeled with flags to say what kind of
>> > exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
>> > NOMEMRW could be combined. This idea is taken from sys/arm/arm/physmem.c.
>> >
>> > All other arrays like phys_managed[], phys_avail[], and dump_avail[]
>> > will be created from these phys_present[] and phys_exclude[] arrays.
>> > This way the bootstrap code in each arch could be simplified and
>> > unified. For example, dealing with either hw.physmem or a page with
>> > PA 0x00000000 could be transparent.
>> >
>> > I'm prepared to volunteer if the idea is ripe. However, I would be
>> > looking for a mentor.
>>
>>
>> I've never really looked at arm/arm/physmem.c before.  Let me do that
>> before I comment on this.
>>
> No problem. This could be a long-term aim. However, I hope the
> VM_PHYSSEG_SPARSE problem can be dealt with in MI code for now. In any
> case, thanks for your help.
>
>
>
>
>

[-- Attachment #2 --]
Index: sys/vm/vm_page.c
===================================================================
--- sys/vm/vm_page.c	(revision 273734)
+++ sys/vm/vm_page.c	(working copy)
@@ -290,6 +290,7 @@
 	vm_paddr_t pa;
 	vm_paddr_t last_pa;
 	char *list;
+	vm_paddr_t *phys_managed;

 	/* the biggest memory array is the second group of pages */
 	vm_paddr_t end;
@@ -301,31 +302,39 @@
 	biggestone = 0;
 	vaddr = round_page(vaddr);

-	for (i = 0; phys_avail[i + 1]; i += 2) {
-		phys_avail[i] = round_page(phys_avail[i]);
-		phys_avail[i + 1] = trunc_page(phys_avail[i + 1]);
+#if defined(VM_PHYSSEG_SPARSE) && defined(__arm__)
+	phys_managed = dump_avail;
+#else
+	phys_managed = phys_avail;
+#endif
+
+	low_water = round_page(phys_managed[0]);
+	high_water = round_page(phys_managed[1]);
+	for (i = 2; phys_managed[i + 1]; i += 2) {
+		phys_managed[i] = round_page(phys_managed[i]);
+		phys_managed[i + 1] = trunc_page(phys_managed[i + 1]);
+		if (phys_managed[i] < low_water)
+			low_water = phys_managed[i];
+		if (phys_managed[i + 1] > high_water)
+			high_water = phys_managed[i + 1];
 	}

-	low_water = phys_avail[0];
-	high_water = phys_avail[1];
+#ifdef XEN
+	low_water = 0;
+#endif

 	for (i = 0; phys_avail[i + 1]; i += 2) {
-		vm_paddr_t size = phys_avail[i + 1] - phys_avail[i];
+		vm_paddr_t size;

+		phys_avail[i] = round_page(phys_avail[i]);
+		phys_avail[i + 1] = trunc_page(phys_avail[i + 1]);
+		size = phys_avail[i + 1] - phys_avail[i];
 		if (size > biggestsize) {
 			biggestone = i;
 			biggestsize = size;
 		}
-		if (phys_avail[i] < low_water)
-			low_water = phys_avail[i];
-		if (phys_avail[i + 1] > high_water)
-			high_water = phys_avail[i + 1];
 	}

-#ifdef XEN
-	low_water = 0;
-#endif
-
 	end = phys_avail[biggestone+1];

 	/*
@@ -393,8 +402,8 @@
 	first_page = low_water / PAGE_SIZE;
 #ifdef VM_PHYSSEG_SPARSE
 	page_range = 0;
-	for (i = 0; phys_avail[i + 1] != 0; i += 2)
-		page_range += atop(phys_avail[i + 1] - phys_avail[i]);
+	for (i = 0; phys_managed[i + 1] != 0; i += 2)
+		page_range += atop(phys_managed[i + 1] - phys_managed[i]);
 #elif defined(VM_PHYSSEG_DENSE)
 	page_range = high_water / PAGE_SIZE - first_page;
 #else
@@ -445,7 +454,7 @@
 	/*
 	 * Initialize the physical memory allocator.
 	 */
-	vm_phys_init();
+	vm_phys_init(phys_managed);

 	/*
 	 * Add every available physical page that is not blacklisted to
@@ -472,7 +481,7 @@
 	/*
 	 * Initialize the reservation management system.
 	 */
-	vm_reserv_init();
+	vm_reserv_init(phys_managed);
 #endif
 	return (vaddr);
 }
Index: sys/vm/vm_phys.c
===================================================================
--- sys/vm/vm_phys.c	(revision 273734)
+++ sys/vm/vm_phys.c	(working copy)
@@ -360,22 +360,22 @@
  * Initialize the physical memory allocator.
  */
 void
-vm_phys_init(void)
+vm_phys_init(vm_paddr_t *regions)
 {
 	struct vm_freelist *fl;
 	int dom, flind, i, oind, pind;

-	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
+	for (i = 0; regions[i + 1] != 0; i += 2) {
 #ifdef	VM_FREELIST_ISADMA
-		if (phys_avail[i] < 16777216) {
-			if (phys_avail[i + 1] > 16777216) {
-				vm_phys_create_seg(phys_avail[i], 16777216,
+		if (regions[i] < 16777216) {
+			if (regions[i + 1] > 16777216) {
+				vm_phys_create_seg(regions[i], 16777216,
 				    VM_FREELIST_ISADMA);
-				vm_phys_create_seg(16777216, phys_avail[i + 1],
+				vm_phys_create_seg(16777216, regions[i + 1],
 				    VM_FREELIST_DEFAULT);
 			} else {
-				vm_phys_create_seg(phys_avail[i],
-				    phys_avail[i + 1], VM_FREELIST_ISADMA);
+				vm_phys_create_seg(regions[i], regions[i + 1],
+				    VM_FREELIST_ISADMA);
 			}
 			if (VM_FREELIST_ISADMA >= vm_nfreelists)
 				vm_nfreelists = VM_FREELIST_ISADMA + 1;
@@ -382,21 +382,21 @@
 		} else
 #endif
 #ifdef	VM_FREELIST_HIGHMEM
-		if (phys_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
-			if (phys_avail[i] < VM_HIGHMEM_ADDRESS) {
-				vm_phys_create_seg(phys_avail[i],
+		if (regions[i + 1] > VM_HIGHMEM_ADDRESS) {
+			if (regions[i] < VM_HIGHMEM_ADDRESS) {
+				vm_phys_create_seg(regions[i],
 				    VM_HIGHMEM_ADDRESS, VM_FREELIST_DEFAULT);
 				vm_phys_create_seg(VM_HIGHMEM_ADDRESS,
-				    phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+				    regions[i + 1], VM_FREELIST_HIGHMEM);
 			} else {
-				vm_phys_create_seg(phys_avail[i],
-				    phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+				vm_phys_create_seg(regions[i], regions[i + 1],
+				    VM_FREELIST_HIGHMEM);
 			}
 			if (VM_FREELIST_HIGHMEM >= vm_nfreelists)
 				vm_nfreelists = VM_FREELIST_HIGHMEM + 1;
 		} else
 #endif
-		vm_phys_create_seg(phys_avail[i], phys_avail[i + 1],
+		vm_phys_create_seg(regions[i], regions[i + 1],
 		    VM_FREELIST_DEFAULT);
 	}
 	for (dom = 0; dom < vm_ndomains; dom++) {
Index: sys/vm/vm_phys.h
===================================================================
--- sys/vm/vm_phys.h	(revision 273734)
+++ sys/vm/vm_phys.h	(working copy)
@@ -80,7 +80,7 @@
 vm_page_t vm_phys_fictitious_to_vm_page(vm_paddr_t pa);
 void vm_phys_free_contig(vm_page_t m, u_long npages);
 void vm_phys_free_pages(vm_page_t m, int order);
-void vm_phys_init(void);
+void vm_phys_init(vm_paddr_t *regions);
 vm_page_t vm_phys_paddr_to_vm_page(vm_paddr_t pa);
 void vm_phys_set_pool(int pool, vm_page_t m, int order);
 boolean_t vm_phys_unfree_page(vm_page_t m);
Index: sys/vm/vm_reserv.c
===================================================================
--- sys/vm/vm_reserv.c	(revision 273734)
+++ sys/vm/vm_reserv.c	(working copy)
@@ -815,7 +815,7 @@
  * Requires that vm_page_array and first_page are initialized!
  */
 void
-vm_reserv_init(void)
+vm_reserv_init(vm_paddr_t *regions)
 {
 	vm_paddr_t paddr;
 	int i;
@@ -824,9 +824,9 @@
 	 * Initialize the reservation array.  Specifically, initialize the
 	 * "pages" field for every element that has an underlying superpage.
 	 */
-	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
-		paddr = roundup2(phys_avail[i], VM_LEVEL_0_SIZE);
-		while (paddr + VM_LEVEL_0_SIZE <= phys_avail[i + 1]) {
+	for (i = 0; regions[i + 1] != 0; i += 2) {
+		paddr = roundup2(regions[i], VM_LEVEL_0_SIZE);
+		while (paddr + VM_LEVEL_0_SIZE <= regions[i + 1]) {
 			vm_reserv_array[paddr >> VM_LEVEL_0_SHIFT].pages =
 			    PHYS_TO_VM_PAGE(paddr);
 			paddr += VM_LEVEL_0_SIZE;
Index: sys/vm/vm_reserv.h
===================================================================
--- sys/vm/vm_reserv.h	(revision 273734)
+++ sys/vm/vm_reserv.h	(working copy)
@@ -52,7 +52,7 @@
 		    vm_page_t mpred);
 void		vm_reserv_break_all(vm_object_t object);
 boolean_t	vm_reserv_free_page(vm_page_t m);
-void		vm_reserv_init(void);
+void		vm_reserv_init(vm_paddr_t *regions);
 int		vm_reserv_level_iffullpop(vm_page_t m);
 boolean_t	vm_reserv_reactivate_page(vm_page_t m);
 boolean_t	vm_reserv_reclaim_contig(u_long npages, vm_paddr_t low,
