Date: Mon, 27 Oct 2014 11:31:50 -0500
From: Alan Cox <alc@rice.edu>
To: Svatopluk Kraus <onwahe@gmail.com>
Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID: <544E7376.6040002@rice.edu>
In-Reply-To: <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>
 <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>
 <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>
 <5428AF3B.1030906@rice.edu>
 <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>
 <54497DC1.5070506@rice.edu>
 <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
 <544DED4C.3010501@rice.edu>
 <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
On 10/27/2014 08:22, Svatopluk Kraus wrote:

> On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:

> On 10/24/2014 06:33, Svatopluk Kraus wrote:

>> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:

>> On 10/08/2014 10:38, Svatopluk Kraus wrote:

>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:

>> >> On 09/27/2014 03:51, Svatopluk Kraus wrote:

>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:

>> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com> wrote:

>> >>>> Hi,

>> >>>> Michal and I are finishing the new ARM pmap-v6 code. There is one problem we've dealt with somehow, but now we would like to do it better. It's about physical pages which are allocated before the vm subsystem is initialized. While these pages can later be found in vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that's not true for the VM_PHYSSEG_SPARSE memory model. And the ARM world uses the VM_PHYSSEG_SPARSE model.

>> >>>> It really would be nice to utilize vm_page_array for such preallocated physical pages even when the VM_PHYSSEG_SPARSE memory model is used. Things could be much easier then. In our case, it's about pages which are used for level 2 page tables. In the VM_PHYSSEG_SPARSE model, we have two sets of such pages: the first ones are preallocated and the second ones are allocated after the vm subsystem has been initialized. We must deal with each set differently, so the code is more complex and so is debugging.

>> >>>> Thus we need some way to say that some part of physical memory should be included in vm_page_array, but the pages from that region should not be put on the free list during initialization. We think that such a possibility could be utilized in general. There could be a need for some physical space which:

>> >>>> (1) is needed only during boot and later on can be freed and given to the vm subsystem,

>> >>>> (2) is needed for something else, but the vm_page_array code could still be used for it without some kind of duplication.

>> >>>> There is already some code which deals with blacklisted pages in the vm_page.c file. So the easiest way to deal with the presented situation is to add some callback to this part of the code which would be able to exclude either a whole phys_avail[i], phys_avail[i+1] region or single pages. As the biggest phys_avail region is used for vm subsystem allocations, some more coding is needed there. (However, blacklisted pages are not dealt with on that part of the region.)

>> >>>> We would like to know if there is any objection:

>> >>>> (1) to dealing with the presented problem,

>> >>>> (2) to dealing with the problem in the presented way.

>> >>>> Any help is very much appreciated. Thanks.

>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of phys_avail when sizing vm_page_array. On amd64, where the same problem exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is probably my preferred solution. The catch being that not all architectures implement dump_avail, but my recollection is that arm does.
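For concreteness, the experiment suggested above amounts to changing which array the page-counting loop walks when vm_page_array is sized. The sketch below is only an illustration of that idea under the usual {start, end} array convention; the helper name is invented here, and this is not the patch that was later committed.

/*
 * Sketch only: count the pages that would get vm_page structures if
 * vm_page_array were sized from dump_avail[] rather than phys_avail[].
 * Both arrays hold {start, end} pairs terminated by a zero end entry,
 * which is the convention the existing loops in vm_page_startup() use.
 */
static long
phys_range_page_count(vm_paddr_t *ranges)
{
	long count;
	int i;

	count = 0;
	for (i = 0; ranges[i + 1] != 0; i += 2)
		count += atop(round_page(ranges[i + 1]) -
		    trunc_page(ranges[i]));
	return (count);
}

/*
 * The experiment: size vm_page_array from
 * phys_range_page_count(dump_avail) instead of
 * phys_range_page_count(phys_avail).
 */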
>> >> Frankly, I would prefer this too, but there is one big open question: what is dump_avail for?

>> >> dump_avail[] is solving a similar problem in the minidump code, hence the prefix "dump_" in its name. In other words, the minidump code couldn't use phys_avail[] either, because it didn't describe the full range of physical addresses that might be included in a minidump, so dump_avail[] was created.

>> >> There is already precedent for what I'm suggesting. dump_avail[] is already (ab)used outside of the minidump code on x86 to solve this same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.

>> >> Using it for vm_page_array initialization and segmentation means that phys_avail must be a subset of it. And this must be stated and be visible enough. Maybe it should even be checked in code. I like the idea of thinking about dump_avail as something that describes all memory in a system, but that's not how dump_avail is defined in the archs now.

>> >> When you say "it's not how dump_avail is defined in archs now", I'm not sure whether you're talking about the code or the comments. In terms of code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any code that would have to change. In terms of comments, I did a grep looking for comments defining what dump_avail[] is, because I couldn't remember any. I found one ... on arm. So, I don't think it's an onerous task changing the definition of dump_avail[]. :-)

>> >> Already, as things stand today with dump_avail[] being used outside of the minidump code, one could reasonably argue that it should be renamed to something like phys_exists[].

>> >> I will experiment with it on Monday then. However, it's not only about how memory segments are created in vm_phys.c; it's also about how the vm_page_array size is computed in vm_page.c.

>> >> Yes, and there is also a place in vm_reserv.c that needs to change. I've attached the patch that I developed and tested a long time ago. It still applies cleanly and runs ok on amd64.

>> > Well, I've created and tested a minimalistic patch which - I hope - is committable. It runs ok on a pandaboard (arm-v6) and solves the presented problem. I would really appreciate it if this could be committed. Thanks.

>> Sorry for the slow reply. I've just been swamped with work lately. I finally had some time to look at this in the last day or so.

>> The first thing that I propose to do is commit the attached patch. This patch changes pmap_init() on amd64, armv6, and i386 so that it no longer consults phys_avail[] to determine the end of memory. Instead, it calls a new function provided by vm_phys.c to obtain the same information from vm_phys_segs[].

>> With this change, the new variable phys_managed in your patch wouldn't need to be a global. It could be a local variable in vm_page_startup() that we pass as a parameter to vm_phys_init() and vm_reserv_init().

>> More generally, the long-term vision that I have is that we would stop using phys_avail[] after vm_page_startup() has completed. It would only be used during initialization. After that we would use vm_phys_segs[] and functions provided by vm_phys.c.
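The new vm_phys.c function mentioned above is what the exchange below refers to as vm_phys_get_end(). A minimal sketch of such a helper, illustrative only and not the committed code, could report the highest physical address covered by vm_phys_segs[]; it deliberately omits the 0xFFFFFFFF wrap-around handling that comes up just below.

/*
 * Sketch of a helper in the spirit of vm_phys_get_end(): return the
 * highest physical address described by vm_phys_segs[].  The real
 * function may differ, and this version ignores the case where the
 * last usable address is 0xFFFFFFFF.
 */
vm_paddr_t
vm_phys_get_end(void)
{
	vm_paddr_t end;
	int i;

	end = 0;
	for (i = 0; i < vm_phys_nsegs; i++)
		if (vm_phys_segs[i].end > end)
			end = vm_phys_segs[i].end;
	return (end);
}

/* pmap_init() would then call this instead of scanning phys_avail[]. */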
>> I understand. The patch and the long-term vision are fine with me. I just was not bold enough to pass phys_managed as a parameter to vm_phys_init() and vm_reserv_init(). However, I certainly was thinking about it. While reading the comment above vm_phys_get_end(): do we care whether the last usable address is 0xFFFFFFFF?

> To date, this hasn't been a problem. However, handling 0xFFFFFFFF is easy. So, the final version of the patch that I committed this weekend does so.

> Can you please try the attached patch? It replaces phys_avail[] with vm_phys_segs[] in arm's busdma.

> It works fine on the arm-v6 pandaboard. I have no objection to committing it. However, it's only a 1:1 replacement.

Right now, yes. However, once your patch is committed, it won't be 1:1 anymore, because vm_phys_segs[] will be populated based on dump_avail[] rather than phys_avail[]. My interpretation of the affected code is that using the ranges defined by dump_avail[] is actually closer to what this code intended.

> In fact, I still keep the following pattern in my head:
>
>   present memory in system          <=> all RAM and whatsoever
>   nobounce memory                   <=> addressable by DMA

In general, I don't see how this can be an attribute of the memory, because it's going to depend on the device. In other words, a given physical address may require bouncing for some device but not all devices.

>   memory managed by vm subsystem    <=> i.e. kept in vm_page_array
>   memory available to vm subsystem  <=> can be allocated
>
> So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it could be too limiting in some scenarios. I would like to see something different in exclusion_bounce_check() in the future, something that reflects the NOBOUNCE property and not the NOALLOC one like now.

>> Do you think that the rest of my patch, considering the changes due to your patch, is ok?

> Basically, yes. I do, however, think that
>
> +#if defined(__arm__)
> +	phys_managed = dump_avail;
> +#else
> +	phys_managed = phys_avail;
> +#endif
>
> should also be conditioned on VM_PHYSSEG_SPARSE.

> So I've prepared a new patch. phys_managed[] is passed to vm_phys_init() and vm_reserv_init() as a parameter, and a small optimization is made in vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition to the place you mentioned. Anyhow, I still think that this is only a temporary hack. In general, phys_managed[] should always be distinguished from phys_avail[].
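Read together, the quoted hunk and the request above suggest something like the fragment below inside vm_page_startup(). This is a sketch of one plausible reading (keeping the __arm__ test and adding the VM_PHYSSEG_SPARSE one), not the actual patch that was prepared.

	/*
	 * Sketch (inside vm_page_startup()): choose the ranges that get
	 * vm_page structures.  Under VM_PHYSSEG_SPARSE on arm, manage
	 * everything that can appear in a minidump, not just the memory
	 * that is still free at this point.  phys_managed is then passed
	 * to vm_phys_init() and vm_reserv_init().
	 */
	vm_paddr_t *phys_managed;

#if defined(__arm__) && defined(VM_PHYSSEG_SPARSE)
	phys_managed = dump_avail;
#else
	phys_managed = phys_avail;
#endif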
>> > BTW, while I was inspecting all the archs, I think that maybe it's time to do what was done for busdma not long ago. There are many similar pieces of code across the archs which deal with physical memory, and they could be generalized and put into kern/subr_physmem.c for common use. All work with physical memory could be reduced to two arrays of regions:
>> >
>> >   phys_present[] ... describes all present physical memory regions
>> >   phys_exclude[] ... describes various exclusions from phys_present[]
>> >
>> > Each excluded region would be labeled with flags saying what kind of exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and NOMEMRW could be combined. This idea is taken from sys/arm/arm/physmem.c. All the other arrays, like phys_managed[], phys_avail[], and dump_avail[], would be created from these phys_present[] and phys_exclude[] arrays. This way the bootstrap code in the archs could be simplified and unified. For example, dealing with either hw.physmem or the page at PA 0x00000000 could be transparent.
>> >
>> > I'm prepared to volunteer if the idea is ripe. However, some tutoring will be needed.

>> I've never really looked at arm/arm/physmem.c before. Let me do that before I comment on this.

>> No problem. This could be a long-term aim. However, I hope the VM_PHYSSEG_SPARSE problem can be dealt with in MI code in the near term. In any case, thanks for your help.
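To make the kern/subr_physmem.c proposal quoted above more concrete, here is a rough sketch of the kind of interface it describes, loosely modeled on sys/arm/arm/physmem.c. Every name and flag value below is invented for this illustration; none of this is an existing MI API.

/*
 * Illustrative only: a possible MI physmem interface in the spirit of
 * sys/arm/arm/physmem.c.  Names and flag values are hypothetical.
 */

/* Exclusion flags; they can be OR'ed together. */
#define	PHYS_EXFLAG_NODUMP	0x01	/* omit from dump_avail[] */
#define	PHYS_EXFLAG_NOALLOC	0x02	/* omit from phys_avail[] */
#define	PHYS_EXFLAG_NOMANAGE	0x04	/* omit from phys_managed[] */
#define	PHYS_EXFLAG_NOBOUNCE	0x08	/* not directly DMA-addressable */
#define	PHYS_EXFLAG_NOMEMRW	0x10	/* not ordinary read/write RAM */

/* Early boot code would register all RAM present in the system ... */
void	physmem_hardware_region(vm_paddr_t pa, vm_size_t size);

/* ... and carve out firmware areas, early page tables, and the like. */
void	physmem_exclude_region(vm_paddr_t pa, vm_size_t size, uint32_t flags);

/*
 * Finally, phys_avail[], dump_avail[], phys_managed[], and friends
 * would be generated from the two region lists.
 */
void	physmem_init_kernel_globals(void);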