Date: Fri, 24 Oct 2014 13:33:15 +0200 From: Svatopluk Kraus <onwahe@gmail.com> To: Alan Cox <alc@rice.edu> Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org> Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE Message-ID: <CAFHCsPVj3PGbkSmkKsd2bGvmh3%2BdZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com> In-Reply-To: <54497DC1.5070506@rice.edu> References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com> <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt%2BooD%2B3pzq%2BFV%2B%2BTQ@mail.gmail.com> <CAFHCsPWq9WqeFnx1a%2BStfSxj=jwcE9GPyVsoyh0%2Bazr3HmM6vQ@mail.gmail.com> <5428AF3B.1030906@rice.edu> <CAFHCsPWxF0G%2BbqBYgxH=WtV%2BSt_UTWZj%2BY2-PHfoYSLjC_Qpig@mail.gmail.com> <54497DC1.5070506@rice.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote: > On 10/08/2014 10:38, Svatopluk Kraus wrote: > > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote: > > > >> On 09/27/2014 03:51, Svatopluk Kraus wrote: > >> > >> > >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote: > >> > >>> > >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem > >>>> we've > >>>> dealt with somehow, but now we would like to do it better. It's about > >>>> physical pages which are allocated before vm subsystem is initialized. > >>>> While later on these pages could be found in vm_page_array when > >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for > >>>> VM_PHYSSEG_SPARSE > >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. > >>>> > >>>> It really would be nice to utilize vm_page_array for such preallocated > >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used. > Things > >>>> could be much easier then. In our case, it's about pages which are > used > >>>> for > >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of > such > >>>> pages. First ones are preallocated and second ones are allocated > after vm > >>>> subsystem was inited. We must deal with each set differently. So code > is > >>>> more complex and so is debugging. > >>>> > >>>> Thus we need some method how to say that some part of physical memory > >>>> should be included in vm_page_array, but the pages from that region > >>>> should > >>>> not be put to free list during initialization. We think that such > >>>> possibility could be utilized in general. There could be a need for > some > >>>> physical space which: > >>>> > >>>> (1) is needed only during boot and later on it can be freed and put > to vm > >>>> subsystem, > >>>> > >>>> (2) is needed for something else and vm_page_array code could be used > >>>> without some kind of its duplication. > >>>> > >>>> There is already some code which deals with blacklisted pages in > >>>> vm_page.c > >>>> file. So the easiest way how to deal with presented situation is to > add > >>>> some callback to this part of code which will be able to either > exclude > >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the > >>>> biggest > >>>> phys_avail region is used for vm subsystem allocations, there should > be > >>>> some more coding. (However, blacklisted pages are not dealt with on > that > >>>> part of region.) > >>>> > >>>> We would like to know if there is any objection: > >>>> > >>>> (1) to deal with presented problem, > >>>> (2) to deal with the problem presented way. > >>>> Some help is very appreciated. Thanks > >>>> > >>>> > >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of > >>> phys_avail when sizing vm_page_array. On amd64, where the same problem > >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is > >>> probably my preferred solution. The catch being that not all > architectures > >>> implement dump_avail, but my recollection is that arm does. > >>> > >> Frankly, I would prefer this too, but there is one big open question: > >> > >> What is dump_avail for? > >> > >> > >> > >> dump_avail[] is solving a similar problem in the minidump code, hence, > the > >> prefix "dump_" in its name. In other words, the minidump code couldn't > use > >> phys_avail[] either because it didn't describe the full range of > physical > >> addresses that might be included in a minidump, so dump_avail[] was > created. > >> > >> There is already precedent for what I'm suggesting. dump_avail[] is > >> already (ab)used outside of the minidump code on x86 to solve this same > >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. > >> > >> > >> Using it for vm_page_array initialization and segmentation means that > >> phys_avail must be a subset of it. And this must be stated and be > visible > >> enough. Maybe it should be even checked in code. I like the idea of > >> thinking about dump_avail as something what desribes all memory in a > >> system, but it's not how dump_avail is defined in archs now. > >> > >> > >> > >> When you say "it's not how dump_avail is defined in archs now", I'm not > >> sure whether you're talking about the code or the comments. In terms of > >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of > any > >> code that would have to change. In terms of comments, I did a grep > looking > >> for comments defining what dump_avail[] is, because I couldn't remember > >> any. I found one ... on arm. So, I don't think it's a onerous task > >> changing the definition of dump_avail[]. :-) > >> > >> Already, as things stand today with dump_avail[] being used outside of > the > >> minidump code, one could reasonably argue that it should be renamed to > >> something like phys_exists[]. > >> > >> > >> > >> I will experiment with it on monday then. However, it's not only about > how > >> memory segments are created in vm_phys.c, but it's about how > vm_page_array > >> size is computed in vm_page.c too. > >> > >> > >> > >> Yes, and there is also a place in vm_reserv.c that needs to change. > I've > >> attached the patch that I developed and tested a long time ago. It > still > >> applies cleanly and runs ok on amd64. > >> > >> > >> > > > > > > Well, I've created and tested minimalistic patch which - I hope - is > > commitable. It runs ok on pandaboard (arm-v6) and solves presented > problem. > > I would really appreciate if this will be commited. Thanks. > > > Sorry for the slow reply. I've just been swamped with work lately. I > finally had some time to look at this in the last day or so. > > The first thing that I propose to do is commit the attached patch. This > patch changes pmap_init() on amd64, armv6, and i386 so that it no longer > consults phys_avail[] to determine the end of memory. Instead, it calls > a new function provided by vm_phys.c to obtain the same information from > vm_phys_segs[]. > > With this change, the new variable phys_managed in your patch wouldn't > need to be a global. It could be a local variable in vm_page_startup() > that we pass as a parameter to vm_phys_init() and vm_reserv_init(). > > More generally, the long-term vision that I have is that we would stop > using phys_avail[] after vm_page_startup() had completed. It would only > be used during initialization. After that we would use vm_phys_segs[] > and functions provided by vm_phys.c. > I understand. The patch and the long-term vision are fine for me. I just was not to bold to pass phys_managed as a parameter to vm_phys_init() and vm_reserv_init(). However, I certainly was thinking about it. While reading comment above vm_phys_get_end(), do we care of if last usable address is 0xFFFFFFFF? Do you think that the rest of my patch considering changes due to your patch is ok? > > > > BTW, while I was inspecting all archs, I think that maybe it's time to do > > what was done for busdma not long ago. There are many similar codes > across > > archs which deal with physical memory and could be generalized and put to > > kern/subr_physmem.c for utilization. All work with physical memory could > be > > simplify to two arrays of regions. > > > > phys_present[] ... describes all present physical memory regions > > phys_exclude[] ... describes various exclusions from phys_present[] > > > > Each excluded region will be labeled by flags to say what kind of > exclusion > > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW could > > be combined. This idea is taken from sys/arm/arm/physmem.c. > > > > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be > > created from these phys_present[] and phys_exclude[]. > > This way bootstrap codes in archs could be simplified and unified. For > > example, dealing with either hw.physmem or page with PA 0x00000000 could > be > > transparent. > > > > I'm prepared to volunteer if the thing is ripe. However, some tutor will > be > > looked for. > > > I've never really looked at arm/arm/physmem.c before. Let me do that > before I comment on this. > > No problem. This could be long-term aim. However, I hope the VM_PHYSSEG_SPARSE problem could be dealt with in MI code in present time. In every case, thanks for your help.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFHCsPVj3PGbkSmkKsd2bGvmh3%2BdZLABi=AR7jQ4qJ8CigE=8Q>