Date:      Mon, 27 Oct 2014 01:59:24 -0500
From:      Alan Cox <alc@rice.edu>
To:        Svatopluk Kraus <onwahe@gmail.com>
Cc:        alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID:  <544DED4C.3010501@rice.edu>
In-Reply-To: <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
References:  <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>	<CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>	<CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>	<5428AF3B.1030906@rice.edu>	<CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>	<54497DC1.5070506@rice.edu> <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>


On 10/24/2014 06:33, Svatopluk Kraus wrote:
>
> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu
> <mailto:alc@rice.edu>> wrote:
>
>     On 10/08/2014 10:38, Svatopluk Kraus wrote:
>     > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu
>     <mailto:alc@rice.edu>> wrote:
>     >
>     >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>     >>
>     >>
>     >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com
>     <mailto:alan.l.cox@gmail.com>> wrote:
>     >>
>     >>>
>     >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus
>     <onwahe@gmail.com <mailto:onwahe@gmail.com>>
>     >>> wrote:
>     >>>
>     >>>> Hi,
>     >>>>
>     >>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>     problem
>     >>>> we've
>     >>>> dealt with somehow, but now we would like to handle it better.
>     It's about
>     >>>> physical pages which are allocated before the vm subsystem is
>     initialized.
>     >>>> While these pages can later be found in vm_page_array when the
>     >>>> VM_PHYSSEG_DENSE memory model is used, that is not true for the
>     >>>> VM_PHYSSEG_SPARSE
>     >>>> memory model. And the ARM world uses the VM_PHYSSEG_SPARSE model.
>     >>>>
>     >>>> It really would be nice to utilize vm_page_array for such
>     preallocated
>     >>>> physical pages even when the VM_PHYSSEG_SPARSE memory model is
>     used. Things
>     >>>> would be much easier then. In our case, it's about the pages
>     used for
>     >>>> level 2 page tables. In the VM_PHYSSEG_SPARSE model, we have two
>     sets of such
>     >>>> pages: the first are preallocated and the second are
>     allocated after the vm
>     >>>> subsystem has been initialized. We must deal with each set
>     differently,
>     >>>> so the code is more complex and so is debugging.
>     >>>>
>     >>>> Thus we need some way to say that some part of
>     physical memory
>     >>>> should be included in vm_page_array, but that the pages from that
>     region
>     >>>> should
>     >>>> not be put on the free list during initialization. We think such a
>     >>>> facility could be useful in general. There could be a
>     need for some
>     >>>> physical space which:
>     >>>>
>     >>>> (1) is needed only during boot and can later be freed
>     and handed over to the vm
>     >>>> subsystem,
>     >>>>
>     >>>> (2) is needed for something else, and for which the vm_page_array
>     code could
>     >>>> be reused without duplication.
>     >>>>
>     >>>> There is already some code in the vm_page.c file which deals with
>     >>>> blacklisted pages. So the easiest way to deal with the presented
>     situation is to add
>     >>>> a callback to this part of the code which will be able to
>     either exclude a
>     >>>> whole phys_avail[i], phys_avail[i+1] region or single pages.
>     As the
>     >>>> biggest
>     >>>> phys_avail region is used for vm subsystem allocations, there
>     will be
>     >>>> some more coding. (However, blacklisted pages are not dealt
>     with in that
>     >>>> part of the region.)
>     >>>>
>     >>>> We would like to know if there is any objection:
>     >>>>
>     >>>> (1) to dealing with the presented problem,
>     >>>> (2) to dealing with the problem in the presented way.
>     >>>> Any help is much appreciated. Thanks
>     >>>>
>     >>>>
>     >>> As an experiment, try modifying vm_phys.c to use dump_avail
>     instead of
>     >>> phys_avail when sizing vm_page_array.  On amd64, where the
>     same problem
>     >>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now,
>     this is
>     >>> probably my preferred solution.  The catch being that not all
>     architectures
>     >>> implement dump_avail, but my recollection is that arm does.
>     >>>
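[Illustration added in editing: a plain-C model of what "sizing vm_page_array from dump_avail" means. The arrays and values are illustrative stand-ins, not the kernel's actual phys_avail[]/dump_avail[] contents; the point is that counting pages over the superset array also covers pages preallocated before VM startup.]

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/*
 * Count the vm_page structures needed for a zero-terminated array of
 * {start, end} physical address pairs, laid out like phys_avail[] and
 * dump_avail[].  Sizing from the superset array (dump_avail[]) yields
 * a vm_page_array that also covers preallocated pages.
 */
static uint64_t
page_count(const uint64_t *avail)
{
	uint64_t npages = 0;
	int i;

	for (i = 0; avail[i] != 0 || avail[i + 1] != 0; i += 2)
		npages += (avail[i + 1] - avail[i]) / PAGE_SIZE;
	return (npages);
}
```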
>     >> Frankly, I would prefer this too, but there is one big open
>     question:
>     >>
>     >> What is dump_avail for?
>     >>
>     >>
>     >>
>     >> dump_avail[] is solving a similar problem in the minidump code,
>     hence, the
>     >> prefix "dump_" in its name.  In other words, the minidump code
>     couldn't use
>     >> phys_avail[] either because it didn't describe the full range
>     of physical
>     >> addresses that might be included in a minidump, so dump_avail[]
>     was created.
>     >>
>     >> There is already precedent for what I'm suggesting. 
>     dump_avail[] is
>     >> already (ab)used outside of the minidump code on x86 to solve
>     this same
>     >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>     >>
>     >>
>     >>  Using it for vm_page_array initialization and segmentation
>     means that
>     >> phys_avail must be a subset of it. And this must be stated and
>     be visible
>     >> enough. Maybe it should even be checked in code. I like the idea of
>     >> thinking about dump_avail as something that describes all memory
>     in a
>     >> system, but that's not how dump_avail is defined in the archs now.
>     >>
>     >>
>     >>
>     >> When you say "it's not how dump_avail is defined in archs now",
>     I'm not
>     >> sure whether you're talking about the code or the comments.  In
>     terms of
>     >> code, dump_avail[] is a superset of phys_avail[], and I'm not
>     aware of any
>     >> code that would have to change.  In terms of comments, I did a
>     grep looking
>     >> for comments defining what dump_avail[] is, because I couldn't
>     remember
>     >> any.  I found one ... on arm.  So, I don't think it's an onerous
>     task
>     >> changing the definition of dump_avail[].  :-)
>     >>
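[Illustration added in editing: one shape the in-code check suggested above could take. This is a hypothetical helper, not existing FreeBSD code; it verifies that every phys_avail[]-style range is contained in some dump_avail[]-style range.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if every {start, end} range in the zero-terminated pair
 * array 'sub' lies inside some range of 'super', i.e. the subset
 * relation that using dump_avail[] for vm_page_array sizing relies on.
 * A boot-time assertion could be built on a check like this.
 */
static bool
avail_is_subset(const uint64_t *sub, const uint64_t *super)
{
	int i, j;

	for (i = 0; sub[i] != 0 || sub[i + 1] != 0; i += 2) {
		bool covered = false;

		for (j = 0; super[j] != 0 || super[j + 1] != 0; j += 2) {
			if (sub[i] >= super[j] && sub[i + 1] <= super[j + 1]) {
				covered = true;
				break;
			}
		}
		if (!covered)
			return (false);
	}
	return (true);
}
```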
>     >> Already, as things stand today with dump_avail[] being used
>     outside of the
>     >> minidump code, one could reasonably argue that it should be
>     renamed to
>     >> something like phys_exists[].
>     >>
>     >>
>     >>
>     >> I will experiment with it on Monday then. However, it's not
>     only about how
>     >> memory segments are created in vm_phys.c, but also about how the
>     vm_page_array
>     >> size is computed in vm_page.c.
>     >>
>     >>
>     >>
>     >> Yes, and there is also a place in vm_reserv.c that needs to
>     change.   I've
>     >> attached the patch that I developed and tested a long time
>     ago.  It still
>     >> applies cleanly and runs ok on amd64.
>     >>
>     >>
>     >>
>     >
>     >
>     > Well, I've created and tested a minimalistic patch which - I hope - is
>     > committable. It runs ok on a PandaBoard (ARMv6) and solves the
>     presented problem.
>     > I would really appreciate it if this could be committed. Thanks.
>
>
>     Sorry for the slow reply.  I've just been swamped with work lately.  I
>     finally had some time to look at this in the last day or so.
>
>     The first thing that I propose to do is commit the attached
>     patch.  This
>     patch changes pmap_init() on amd64, armv6, and i386 so that it no
>     longer
>     consults phys_avail[] to determine the end of memory.  Instead, it
>     calls
>     a new function provided by vm_phys.c to obtain the same
>     information from
>     vm_phys_segs[].
>
>     With this change, the new variable phys_managed in your patch wouldn't
>     need to be a global.  It could be a local variable in
>     vm_page_startup()
>     that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>
>     More generally, the long-term vision that I have is that we would stop
>     using phys_avail[] after vm_page_startup() had completed.  It
>     would only
>     be used during initialization.  After that we would use vm_phys_segs[]
>     and functions provided by vm_phys.c.
>
>  
> I understand. The patch and the long-term vision are fine with me. I
> just was not bold enough to pass phys_managed as a parameter to
> vm_phys_init() and vm_reserv_init(). However, I certainly was thinking
> about it. While reading the comment above vm_phys_get_end(): do we care
> whether the last usable address is 0xFFFFFFFF?


To date, this hasn't been a problem.  However, handling 0xFFFFFFFF is
easy.  So, the final version of the patch that I committed this weekend
does so.
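[Illustration added in editing: the committed vm_phys change is not reproduced here. As an illustration only, one way to make 0xFFFFFFFF unproblematic is to report an inclusive last address rather than an exclusive end, which cannot wrap on a 32-bit physical address type.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the highest usable physical address from {start, last} pairs,
 * where "last" is the inclusive last byte of a segment.  An inclusive
 * bound lets a segment reach the top of a 32-bit address space as
 * {start, 0xFFFFFFFF} without the end address wrapping to 0.
 */
static uint32_t
phys_last_addr(const uint32_t (*segs)[2], int nsegs)
{
	uint32_t last = 0;
	int i;

	for (i = 0; i < nsegs; i++)
		if (segs[i][1] > last)
			last = segs[i][1];
	return (last);
}
```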

Can you please try the attached patch?  It replaces phys_avail[] with
vm_phys_segs[] in arm's busdma.


> Do you think that the rest of my patch, adjusted for the changes from
> your patch, is ok?
>  


Basically, yes.  I do, however, think that

+#if defined(__arm__)
+       phys_managed = dump_avail;
+#else
+       phys_managed = phys_avail;
+#endif

should also be conditioned on VM_PHYSSEG_SPARSE.
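That is, the assignment would become something like (a sketch of the
suggested conditioning, not committed code):

```c
#if defined(__arm__) && defined(VM_PHYSSEG_SPARSE)
	phys_managed = dump_avail;
#else
	phys_managed = phys_avail;
#endif
```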


>  
>
>     >
>     > BTW, while I was inspecting all the archs, I think that maybe it's
>     time to do
>     > what was done for busdma not long ago. There is a lot of similar
>     code across the
>     > archs which deals with physical memory and could be generalized
>     and moved to
>     > kern/subr_physmem.c for common use. All work with physical
>     memory could be
>     > reduced to two arrays of regions.
>     >
>     > phys_present[] ... describes all present physical memory regions
>     > phys_exclude[] ... describes various exclusions from phys_present[]
>     >
>     > Each excluded region would be labeled with flags saying what kind
>     of exclusion
>     > it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
>     NOMEMRW could
>     > be combined. This idea is taken from sys/arm/arm/physmem.c.
>     >
>     > All other arrays, like phys_managed[], phys_avail[], and dump_avail[],
>     would be
>     > created from these phys_present[] and phys_exclude[].
>     > This way the bootstrap code in the archs could be simplified and
>     unified. For
>     > example, dealing with either hw.physmem or the page at PA
>     0x00000000 could be
>     > made transparent.
>     >
>     > I'm prepared to volunteer if the idea is ripe. However, I will be
>     looking for
>     > a mentor.
>
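[Illustration added in editing: a sketch of the proposal above. The structure layout, flag names, and helper are hypothetical, loosely modeled on sys/arm/arm/physmem.c, and handle only the simple case of an exclusion fully contained in a single present region.]

```c
#include <assert.h>
#include <stdint.h>

#define EXFLAG_NOALLOC 0x01	/* excluded from allocation */
#define EXFLAG_NODUMP  0x02	/* excluded from minidumps */

struct region {
	uint64_t start;
	uint64_t size;
	uint32_t flags;		/* meaningful only for exclusions */
};

/*
 * Derive a phys_avail[]-style pair array from the present regions,
 * carving out every exclusion whose flags match 'mask'.  Returns the
 * number of pairs written.  Exclusions straddling region boundaries
 * and overlapping exclusions are not handled in this sketch.
 */
static int
regions_to_avail(uint64_t *avail, const struct region *present, int npresent,
    const struct region *excl, int nexcl, uint32_t mask)
{
	int i, j, n = 0;

	for (i = 0; i < npresent; i++) {
		uint64_t start = present[i].start;
		uint64_t end = start + present[i].size;

		for (j = 0; j < nexcl; j++) {
			if ((excl[j].flags & mask) == 0)
				continue;
			uint64_t xs = excl[j].start;
			uint64_t xe = xs + excl[j].size;
			if (xs >= start && xe <= end) {
				if (xs > start) {
					avail[n++] = start;
					avail[n++] = xs;
				}
				start = xe;
			}
		}
		if (start < end) {
			avail[n++] = start;
			avail[n++] = end;
		}
	}
	return (n / 2);
}
```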
>
>     I've never really looked at arm/arm/physmem.c before.  Let me do that
>     before I comment on this.
>
> No problem. This could be a long-term aim. However, I hope the
> VM_PHYSSEG_SPARSE problem can be dealt with in MI code in the
> meantime. In any case, thanks for your help.
>  
>  


[Attachment: busdma_arm1.patch]

Index: arm/arm/busdma_machdep-v6.c
===================================================================
--- arm/arm/busdma_machdep-v6.c	(revision 273699)
+++ arm/arm/busdma_machdep-v6.c	(working copy)
@@ -54,8 +54,9 @@ __FBSDID("$FreeBSD$");
 #include <sys/uio.h>
 
 #include <vm/vm.h>
+#include <vm/vm_param.h>
 #include <vm/vm_page.h>
-#include <vm/vm_map.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_kern.h>
 
@@ -277,16 +278,18 @@ SYSINIT(busdma, SI_SUB_KMEM+1, SI_ORDER_FIRST, bus
  * express, so we take a fast out.
  */
 static int
-exclusion_bounce_check(vm_offset_t lowaddr, vm_offset_t highaddr)
+exclusion_bounce_check(vm_paddr_t lowaddr, vm_paddr_t highaddr)
 {
+	struct vm_phys_seg *seg;
 	int i;
 
 	if (lowaddr >= BUS_SPACE_MAXADDR)
 		return (0);
 
-	for (i = 0; phys_avail[i] && phys_avail[i + 1]; i += 2) {
-		if ((lowaddr >= phys_avail[i] && lowaddr < phys_avail[i + 1]) ||
-		    (lowaddr < phys_avail[i] && highaddr >= phys_avail[i]))
+	for (i = 0; i < vm_phys_nsegs; i++) {
+		seg = &vm_phys_segs[i];
+		if ((lowaddr >= seg->start && lowaddr < seg->end) ||
+		    (lowaddr < seg->start && highaddr >= seg->start))
 			return (1);
 	}
 	return (0);
Index: arm/arm/busdma_machdep.c
===================================================================
--- arm/arm/busdma_machdep.c	(revision 273699)
+++ arm/arm/busdma_machdep.c	(working copy)
@@ -70,10 +70,11 @@ __FBSDID("$FreeBSD$");
 
 #include <vm/uma.h>
 #include <vm/vm.h>
+#include <vm/vm_param.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_kern.h>
 #include <vm/vm_page.h>
-#include <vm/vm_map.h>
+#include <vm/vm_phys.h>
 
 #include <machine/atomic.h>
 #include <machine/bus.h>
@@ -326,17 +327,19 @@ run_filter(bus_dma_tag_t dmat, bus_addr_t paddr)
  * express, so we take a fast out.
  */
 static __inline int
-_bus_dma_can_bounce(vm_offset_t lowaddr, vm_offset_t highaddr)
+_bus_dma_can_bounce(vm_paddr_t lowaddr, vm_paddr_t highaddr)
 {
+	struct vm_phys_seg *seg;
 	int i;
 
 	if (lowaddr >= BUS_SPACE_MAXADDR)
 		return (0);
 
-	for (i = 0; phys_avail[i] && phys_avail[i + 1]; i += 2) {
-		if ((lowaddr >= phys_avail[i] && lowaddr <= phys_avail[i + 1])
-		    || (lowaddr < phys_avail[i] &&
-		    highaddr > phys_avail[i]))
+	for (i = 0; i < vm_phys_nsegs; i++) {
+		seg = &vm_phys_segs[i];
+		if ((lowaddr >= seg->start && lowaddr <= seg->end)
+		    || (lowaddr < seg->start &&
+		    highaddr > seg->start))
 			return (1);
 	}
 	return (0);


