Date:      Mon, 27 Oct 2014 11:31:50 -0500
From:      Alan Cox <alc@rice.edu>
To:        Svatopluk Kraus <onwahe@gmail.com>
Cc:        alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID:  <544E7376.6040002@rice.edu>
In-Reply-To: <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
References:  <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>
             <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>
             <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>
             <5428AF3B.1030906@rice.edu>
             <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>
             <54497DC1.5070506@rice.edu>
             <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
             <544DED4C.3010501@rice.edu>
             <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>

On 10/27/2014 08:22, Svatopluk Kraus wrote:
>
> On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:
>
>     On 10/24/2014 06:33, Svatopluk Kraus wrote:
>>
>>     On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>>
>>         On 10/08/2014 10:38, Svatopluk Kraus wrote:
>>         > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>>         >
>>         >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>>         >>
>>         >>
>>         >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>>         >>
>>         >>>
>>         >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus
>>         >>> <onwahe@gmail.com> wrote:
>>         >>>
>>         >>>> Hi,
>>         >>>>
>>         >>>> Michal and I are finishing the new ARM pmap-v6 code. There is
>>         >>>> one problem we've dealt with somehow, but now we would like to
>>         >>>> handle it better. It concerns physical pages which are allocated
>>         >>>> before the vm subsystem is initialized. While these pages can
>>         >>>> later be found in vm_page_array when the VM_PHYSSEG_DENSE memory
>>         >>>> model is used, that is not true for the VM_PHYSSEG_SPARSE memory
>>         >>>> model. And the ARM world uses the VM_PHYSSEG_SPARSE model.
>>         >>>>
>>         >>>> It really would be nice to utilize vm_page_array for such
>>         >>>> preallocated physical pages even when the VM_PHYSSEG_SPARSE
>>         >>>> memory model is used. Things could be much easier then. In our
>>         >>>> case, it's about pages which are used for level 2 page tables.
>>         >>>> In the VM_PHYSSEG_SPARSE model, we have two sets of such pages:
>>         >>>> the first are preallocated and the second are allocated after
>>         >>>> the vm subsystem has been initialized. We must deal with each
>>         >>>> set differently, so the code is more complex and so is the
>>         >>>> debugging.
>>         >>>>
>>         >>>> Thus we need some way to say that some part of physical memory
>>         >>>> should be included in vm_page_array, but that the pages from
>>         >>>> that region should not be put on the free list during
>>         >>>> initialization. We think such a possibility could be useful in
>>         >>>> general. There could be a need for some physical space which:
>>         >>>>
>>         >>>> (1) is needed only during boot and can later be freed and handed
>>         >>>> over to the vm subsystem,
>>         >>>>
>>         >>>> (2) is needed for something else, so that the vm_page_array code
>>         >>>> can be used without some kind of duplication.
>>         >>>>
>>         >>>> There is already some code in the vm_page.c file which deals
>>         >>>> with blacklisted pages. So the easiest way to deal with the
>>         >>>> presented situation is to add some callback to that part of the
>>         >>>> code which will be able to exclude either a whole phys_avail[i],
>>         >>>> phys_avail[i+1] region or single pages. As the biggest
>>         >>>> phys_avail region is used for vm subsystem allocations, some
>>         >>>> more coding would be needed there. (However, blacklisted pages
>>         >>>> are not handled on that part of the region.)
>>         >>>>
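
A rough sketch of the proposed hook, purely for illustration: the callback
name below is made up, nothing like it was committed, and the thread settles
on the dump_avail[] approach further down instead.

    /*
     * Hypothetical MD predicate consulted while vm_page_startup() puts
     * pages on the free lists: preallocated pages would still get a
     * vm_page entry but would stay off the free lists.
     */
    bool pmap_page_is_preallocated(vm_paddr_t pa);      /* made-up name */

    /* For each phys_avail[] range [low, high): */
    for (pa = low; pa < high; pa += PAGE_SIZE) {
            if (pmap_page_is_preallocated(pa))
                    continue;       /* keep the page, but don't free it */
            /* ...otherwise add the page to the free lists as before... */
    }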
>>         >>>> We would like to know if there is any objection:
>>         >>>>
>>         >>>> (1) to dealing with the presented problem,
>>         >>>> (2) to dealing with the problem in the presented way.
>>         >>>> Any help is very much appreciated. Thanks.
>>         >>>>
>>         >>>>
>>         >>> As an experiment, try modifying vm_phys.c to use dump_avail
>>         >>> instead of phys_avail when sizing vm_page_array.  On amd64,
>>         >>> where the same problem exists, this allowed me to use
>>         >>> VM_PHYSSEG_SPARSE.  Right now, this is probably my preferred
>>         >>> solution.  The catch is that not all architectures implement
>>         >>> dump_avail, but my recollection is that arm does.
>>         >>>
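
A minimal sketch of that sizing change, assuming dump_avail[], like
phys_avail[], is a zero-terminated array of start/end pairs (this is not the
actual vm_page.c code):

    vm_paddr_t low, high;
    long page_count;
    int i;

    /*
     * Count pages over dump_avail[] rather than phys_avail[], so that
     * pages excluded from phys_avail[] still get a struct vm_page.
     */
    page_count = 0;
    for (i = 0; dump_avail[i + 1] != 0; i += 2) {
            low = dump_avail[i];
            high = dump_avail[i + 1];
            page_count += atop(high - low);
    }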
>>         >> Frankly, I would prefer this too, but there is one big open
>>         >> question:
>>         >>
>>         >> What is dump_avail for?
>>         >>
>>         >>
>>         >>
>>         >> dump_avail[] is solving a similar problem in the minidump code,
>>         >> hence the prefix "dump_" in its name.  In other words, the
>>         >> minidump code couldn't use phys_avail[] either, because it didn't
>>         >> describe the full range of physical addresses that might be
>>         >> included in a minidump, so dump_avail[] was created.
>>         >>
>>         >> There is already precedent for what I'm suggesting.
>>         >> dump_avail[] is already (ab)used outside of the minidump code: on
>>         >> x86 to solve this same problem in x86/x86/nexus.c, and on arm in
>>         >> arm/arm/mem.c.
>>         >>
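
That existing use amounts, roughly, to a check like the sketch below; this is
an illustration of the idea, not the literal code in either file:

    /*
     * Is the physical address backed by real memory?  Walk dump_avail[]
     * because phys_avail[] omits memory that was taken early in boot.
     */
    static int
    pa_is_ram(vm_paddr_t pa)
    {
            int i;

            for (i = 0; dump_avail[i + 1] != 0; i += 2)
                    if (pa >= dump_avail[i] && pa < dump_avail[i + 1])
                            return (1);
            return (0);
    }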
>>         >>
>>         >> Using it for vm_page_array initialization and segmentation means
>>         >> that phys_avail must be a subset of it. And this must be stated
>>         >> and visible enough. Maybe it should even be checked in code. I
>>         >> like the idea of thinking about dump_avail as something that
>>         >> describes all the memory in a system, but that's not how
>>         >> dump_avail is defined in the archs now.
>>         >>
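
Such a check could be as simple as the sketch below; nothing like it exists
in the tree, it is only meant to make the "subset" requirement concrete:

    /*
     * Boot-time sanity check: every phys_avail[] range must be covered
     * by some dump_avail[] range.
     */
    static void
    check_phys_avail_subset(void)
    {
            int i, j;

            for (i = 0; phys_avail[i + 1] != 0; i += 2) {
                    for (j = 0; dump_avail[j + 1] != 0; j += 2)
                            if (phys_avail[i] >= dump_avail[j] &&
                                phys_avail[i + 1] <= dump_avail[j + 1])
                                    break;
                    KASSERT(dump_avail[j + 1] != 0,
                        ("phys_avail[%d] not covered by dump_avail[]", i));
            }
    }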
>>         >>
>>         >>
>>         >> When you say "it's not how dump_avail is defined in archs now",
>>         >> I'm not sure whether you're talking about the code or the
>>         >> comments.  In terms of code, dump_avail[] is a superset of
>>         >> phys_avail[], and I'm not aware of any code that would have to
>>         >> change.  In terms of comments, I did a grep looking for comments
>>         >> defining what dump_avail[] is, because I couldn't remember any.  I
>>         >> found one ... on arm.  So, I don't think it's an onerous task
>>         >> changing the definition of dump_avail[].  :-)
>>         >>
>>         >> Already, as things stand today with dump_avail[] being used
>>         >> outside of the minidump code, one could reasonably argue that it
>>         >> should be renamed to something like phys_exists[].
>>         >>
>>         >>
>>         >>
>>         >> I will experiment with it on Monday then. However, it's not only
>>         >> about how memory segments are created in vm_phys.c, it's also
>>         >> about how the vm_page_array size is computed in vm_page.c.
>>         >>
>>         >>
>>         >>
>>         >> Yes, and there is also a place in vm_reserv.c that needs to
>>         >> change.  I've attached the patch that I developed and tested a
>>         >> long time ago.  It still applies cleanly and runs ok on amd64.
>>         >>
>>         >>
>>         >>
>>         >
>>         >
>>         > Well, I've created and tested a minimalistic patch which - I hope
>>         > - is committable. It runs OK on a Pandaboard (ARMv6) and solves
>>         > the presented problem. I would really appreciate it if this could
>>         > be committed. Thanks.
>>
>>
>>         Sorry for the slow reply.  I've just been swamped with work lately.
>>         I finally had some time to look at this in the last day or so.
>>
>>         The first thing that I propose to do is commit the attached patch.
>>         This patch changes pmap_init() on amd64, armv6, and i386 so that it
>>         no longer consults phys_avail[] to determine the end of memory.
>>         Instead, it calls a new function provided by vm_phys.c to obtain the
>>         same information from vm_phys_segs[].
>>
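
The new helper could be as simple as the sketch below (only a sketch of the
idea; the version in the actual patch differs, and in particular it has to
cope with a segment that ends at the very top of the physical address space,
which is the 0xFFFFFFFF question that comes up further down):

    /*
     * Highest usable physical address, taken from vm_phys_segs[]
     * instead of phys_avail[].
     */
    vm_paddr_t
    vm_phys_get_end(void)
    {
            vm_paddr_t end;
            int i;

            end = 0;
            for (i = 0; i < vm_phys_nsegs; i++)
                    if (vm_phys_segs[i].end > end)
                            end = vm_phys_segs[i].end;
            return (end);
    }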
>>         With this change, the new variable phys_managed in your patch
>>         wouldn't need to be a global.  It could be a local variable in
>>         vm_page_startup() that we pass as a parameter to vm_phys_init() and
>>         vm_reserv_init().
>>
>>         More generally, the long-term vision that I have is that we would
>>         stop using phys_avail[] after vm_page_startup() had completed.  It
>>         would only be used during initialization.  After that we would use
>>         vm_phys_segs[] and functions provided by vm_phys.c.
>>
>>      
>>     I understand. The patch and the long-term vision are fine with me.
>>     I just wasn't bold enough to pass phys_managed as a parameter to
>>     vm_phys_init() and vm_reserv_init(). However, I certainly was
>>     thinking about it. While reading the comment above vm_phys_get_end():
>>     do we care if the last usable address is 0xFFFFFFFF?
>
>
>     To date, this hasn't been a problem.  However, handling 0xFFFFFFFF
>     is easy.  So, the final version of the patch that I committed this
>     weekend does so.
>
>     Can you please try the attached patch?  It replaces phys_avail[]
>     with vm_phys_segs[] in arm's busdma.
>
>  
>  
> It works fine on the ARMv6 Pandaboard. I have no objection to committing
> it. However, it's only a 1:1 replacement.


Right now, yes.  However, once your patch is committed, it won't be 1:1
anymore, because vm_phys_segs[] will be populated based on dump_avail[]
rather than phys_avail[].

My interpretation of the affected code is that using the ranges defined
by dump_avail[] is actually closer to what this code intended.
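
For what it's worth, the shape of the affected check is roughly the sketch
below; this is simplified, with a made-up helper name, and is not the actual
busdma_machdep code:

    /*
     * Does any present physical memory fall inside the exclusion window
     * [lowaddr, highaddr]?  Walking vm_phys_segs[] (now built from
     * dump_avail[]) instead of phys_avail[] also covers memory that was
     * taken before the VM system was up.
     */
    static int
    memory_in_window(vm_paddr_t lowaddr, vm_paddr_t highaddr)
    {
            int i;

            for (i = 0; i < vm_phys_nsegs; i++)
                    if (vm_phys_segs[i].start <= highaddr &&
                        vm_phys_segs[i].end > lowaddr)
                            return (1);
            return (0);
    }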


> In fact, I still keep the following pattern in my head:
>  
> present memory in system <=> all RAM and whatsoever
> nobounce memory <=> addressable by DMA


In general, I don't see how this can be an attribute of the memory,
because it's going to depend on the device.  In other words, a given
physical address may require bouncing for some device but not all devices.


> managed memory by vm subsystem <=> i.e. kept in vm_page_array
> available memory for vm subsystem <=> can be allocated
>  
> So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it
> could be too limiting in some scenarios. I would like to see something
> different in exclusion_bounce_check() in the future - something that
> reflects a NOBOUNCE property and not a NOALLOC one like it does now.
>  
>  
>
>
>
>
>>     Do you think that the rest of my patch, considering the changes due
>>     to your patch, is OK?
>>      
>
>
>     Basically, yes.  I do, however, think that
>
>     +#if defined(__arm__)
>     +       phys_managed = dump_avail;
>     +#else
>     +       phys_managed = phys_avail;
>     +#endif
>
>     should also be conditioned on VM_PHYSSEG_SPARSE.
>
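
Concretely, the adjusted conditional could read something like this (just an
illustration of the suggestion, not the final patch):

    #if defined(__arm__) && defined(VM_PHYSSEG_SPARSE)
            phys_managed = dump_avail;
    #else
            phys_managed = phys_avail;
    #endif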
>  
>  
>  
> So I've prepared a new patch. phys_managed[] is passed to vm_phys_init()
> and vm_reserv_init() as a parameter, and a small optimization is made in
> vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition to the place
> you mentioned. Anyhow, I still think that this is only a temporary hack.
> In general, phys_managed[] should always be distinguished from
> phys_avail[].
>  
>  
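
The interface change being discussed would look roughly like this, prototypes
only, with the names used in this thread (the actual patch may differ):

    /* vm_page_startup() keeps phys_managed[] local and hands it down. */
    void vm_phys_init(vm_paddr_t *phys_managed);
    void vm_reserv_init(vm_paddr_t *phys_managed);

    /* ...called from vm_page_startup(): */
    vm_phys_init(phys_managed);
    #if VM_NRESERVLEVEL > 0
            vm_reserv_init(phys_managed);
    #endif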
>
>>      
>>
>>         >
>>         > BTW, while I was inspecting all the archs, I think that maybe
>>         > it's time to do what was done for busdma not long ago. There is a
>>         > lot of similar code across the archs which deals with physical
>>         > memory and could be generalized and put into kern/subr_physmem.c
>>         > for common use. All work with physical memory could be simplified
>>         > to two arrays of regions:
>>         >
>>         > phys_present[] ... describes all present physical memory regions
>>         > phys_exclude[] ... describes various exclusions from phys_present[]
>>         >
>>         > Each excluded region will be labeled by flags to say what kind of
>>         > exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE,
>>         > and NOMEMRW could be combined. This idea is taken from
>>         > sys/arm/arm/physmem.c.
>>         >
>>         > All the other arrays like phys_managed[], phys_avail[], and
>>         > dump_avail[] will be created from phys_present[] and
>>         > phys_exclude[]. This way the bootstrap code in the archs could be
>>         > simplified and unified. For example, dealing with either
>>         > hw.physmem or the page at PA 0x00000000 could be transparent.
>>         >
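
For illustration, such an MI interface might look roughly like the sketch
below; the names are hypothetical and only loosely modeled on
sys/arm/arm/physmem.c:

    /* Exclusion flags, combinable per region. */
    #define PHYS_EXCL_NODUMP        0x01    /* leave out of minidumps */
    #define PHYS_EXCL_NOALLOC       0x02    /* never put on the free lists */
    #define PHYS_EXCL_NOMANAGE      0x04    /* no vm_page_array entry */
    #define PHYS_EXCL_NOBOUNCE      0x08    /* not usable for DMA bouncing */
    #define PHYS_EXCL_NOMEMRW       0x10    /* not accessible via /dev/mem */

    /* Called from MD startup code to register regions. */
    void phys_present_add(vm_paddr_t start, vm_paddr_t end);
    void phys_exclude_add(vm_paddr_t start, vm_paddr_t end, uint32_t flags);

    /*
     * Builds phys_avail[], dump_avail[], phys_managed[], ... from the
     * registered present and excluded regions.
     */
    void phys_arrays_init(void);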
>>         > I'm prepared to volunteer if the idea is ripe. However, I will be
>>         > looking for a mentor.
>>
>>
>>         I've never really looked at arm/arm/physmem.c before.  Let me do
>>         that before I comment on this.
>>
>>     No problem. This could be a long-term aim. However, I hope the
>>     VM_PHYSSEG_SPARSE problem can be dealt with in MI code now. In any
>>     case, thanks for your help.
>>      
>>      
>
>



