From: Svatopluk Kraus
To: Alan Cox
Cc: alc@freebsd.org, FreeBSD Arch
Date: Fri, 24 Oct 2014 13:33:15 +0200
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE

On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox wrote:

> On 10/08/2014 10:38, Svatopluk Kraus wrote:
> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote:
> >> On 09/27/2014 03:51, Svatopluk Kraus wrote:
> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote:
> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Michal and I are finishing the new ARM pmap-v6 code. There is one problem we have dealt with somehow, but now we would like to do it better. It concerns physical pages which are allocated before the VM subsystem is initialized. While such pages can later be found in vm_page_array when the VM_PHYSSEG_DENSE memory model is used, this is not true for the VM_PHYSSEG_SPARSE memory model. And the ARM world uses the VM_PHYSSEG_SPARSE model.
> >>>>
> >>>> It really would be nice to utilize vm_page_array for such preallocated physical pages even when the VM_PHYSSEG_SPARSE memory model is used. Things could be much easier then. In our case, it concerns pages which are used for level 2 page tables. In the VM_PHYSSEG_SPARSE model, we have two sets of such pages: the first set is preallocated, and the second is allocated after the VM subsystem has been initialized. We must deal with each set differently, so the code is more complex, and so is the debugging.
> >>>>
> >>>> Thus we need some way to say that a part of physical memory should be included in vm_page_array, but that the pages from that region should not be put on the free list during initialization. We think such a facility could be useful in general. There could be a need for some physical space which:
> >>>>
> >>>> (1) is needed only during boot and can later be freed and handed over to the VM subsystem, or
> >>>>
> >>>> (2) is needed for something else, where the vm_page_array code could be reused without some kind of duplication.
> >>>>
> >>>> There is already some code which deals with blacklisted pages in the vm_page.c file. So the easiest way to handle the situation described here is to add a callback to that part of the code which can exclude either a whole phys_avail[i]..phys_avail[i+1] region or single pages. As the biggest phys_avail region is used for VM subsystem allocations, some more coding is needed there. (However, blacklisted pages are not dealt with in that part of the region.)
> >>>>
> >>>> We would like to know if there is any objection:
> >>>>
> >>>> (1) to dealing with the presented problem at all,
> >>>> (2) to dealing with it in the presented way.
> >>>>
> >>>> Any help is very much appreciated. Thanks.
> >>>>
> >>>
> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of phys_avail when sizing vm_page_array. On amd64, where the same problem exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is probably my preferred solution. The catch is that not all architectures implement dump_avail, but my recollection is that arm does.
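For concreteness, the suggested experiment amounts to something like the following sketch of the vm_page_array sizing loop. It is illustrative only: phys_avail[] and dump_avail[] are the real zero-terminated arrays of (start, end) address pairs, while the surrounding variable names are assumed from the general shape of vm_page_startup() rather than copied from it.

    /*
     * A minimal sketch, not a verbatim patch: size vm_page_array from
     * dump_avail[] (all memory that exists) rather than phys_avail[]
     * (memory still free after early-boot allocations), so that pages
     * allocated before VM initialization still get vm_page structures.
     */
    vm_paddr_t size;
    long page_range;
    int i;

    size = 0;
    for (i = 0; dump_avail[i + 1] != 0; i += 2)
            size += dump_avail[i + 1] - dump_avail[i];

    /* One struct vm_page per physical page described by dump_avail[]. */
    page_range = size / PAGE_SIZE;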
> >> Frankly, I would prefer this too, but there is one big open question:
> >>
> >> What is dump_avail[] for?
> >>
> >> dump_avail[] solves a similar problem in the minidump code, hence the prefix "dump_" in its name. In other words, the minidump code could not use phys_avail[] either, because it does not describe the full range of physical addresses that might be included in a minidump, so dump_avail[] was created.
> >>
> >> There is already precedent for what I'm suggesting. dump_avail[] is already (ab)used outside of the minidump code on x86 to solve this same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
> >>
> >> Using it for vm_page_array initialization and segmentation means that phys_avail[] must be a subset of it. This must be stated and be visible enough; maybe it should even be checked in code. I like the idea of thinking about dump_avail[] as something that describes all the memory in a system, but that is not how dump_avail[] is defined in the architectures now.
> >>
> >> When you say "it's not how dump_avail is defined in archs now", I'm not sure whether you're talking about the code or the comments. In terms of code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any code that would have to change. In terms of comments, I did a grep looking for comments defining what dump_avail[] is, because I couldn't remember any. I found one ... on arm. So I don't think it's an onerous task to change the definition of dump_avail[]. :-)
> >>
> >> Already, as things stand today, with dump_avail[] being used outside of the minidump code, one could reasonably argue that it should be renamed to something like phys_exists[].
> >>
> >> I will experiment with it on Monday, then. However, it's not only about how memory segments are created in vm_phys.c; it's also about how the vm_page_array size is computed in vm_page.c.
> >>
> >> Yes, and there is also a place in vm_reserv.c that needs to change. I've attached the patch that I developed and tested a long time ago. It still applies cleanly and runs ok on amd64.
> >
> > Well, I've created and tested a minimalistic patch which - I hope - is committable. It runs ok on a Pandaboard (ARMv6) and solves the presented problem. I would really appreciate it if this could be committed. Thanks.
>
> Sorry for the slow reply. I've just been swamped with work lately. I finally had some time to look at this in the last day or so.
>
> The first thing that I propose to do is commit the attached patch. This patch changes pmap_init() on amd64, armv6, and i386 so that it no longer consults phys_avail[] to determine the end of memory. Instead, it calls a new function provided by vm_phys.c to obtain the same information from vm_phys_segs[].
>
> With this change, the new variable phys_managed in your patch wouldn't need to be a global. It could be a local variable in vm_page_startup() that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>
> More generally, the long-term vision that I have is that we would stop using phys_avail[] after vm_page_startup() has completed. It would only be used during initialization. After that, we would use vm_phys_segs[] and the functions provided by vm_phys.c.

I understand. The patch and the long-term vision are fine with me. I just was not bold enough to pass phys_managed as a parameter to vm_phys_init() and vm_reserv_init(); however, I certainly was thinking about it.

While reading the comment above vm_phys_get_end(), I wondered: do we care whether the last usable address is 0xFFFFFFFF?

Do you think that the rest of my patch, adjusted for the changes due to your patch, is ok?
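For readers without the patch at hand, the helper in question might look roughly like this. This is a sketch only: vm_phys_segs[] and vm_phys_nsegs are real names from sys/vm/vm_phys.c, the function name is the one used in this thread, and the body is assumed rather than taken from the actual patch.

    /*
     * A sketch only: report the end of physical memory from
     * vm_phys_segs[] instead of phys_avail[].  vm_phys_segs[] is
     * kept sorted by address, so the last segment ends highest.
     */
    vm_paddr_t
    vm_phys_get_end(void)
    {

            return (vm_phys_nsegs > 0 ?
                vm_phys_segs[vm_phys_nsegs - 1].end : 0);
    }

    /*
     * The 0xFFFFFFFF question above: if the last usable byte sits at
     * 0xFFFFFFFF and vm_paddr_t is 32 bits wide, the exclusive end
     * 0x100000000 cannot be represented and would wrap to 0.
     */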
> > BTW, while I was inspecting all the architectures, I think that maybe it's time to do what was done for busdma not long ago. There is a lot of similar code across the architectures which deals with physical memory; it could be generalized and moved to kern/subr_physmem.c for common use. All work with physical memory could be reduced to two arrays of regions:
> >
> > phys_present[] ... describes all present physical memory regions
> > phys_exclude[] ... describes various exclusions from phys_present[]
> >
> > Each excluded region would be labeled with flags saying what kind of exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and NOMEMRW could be combined. This idea is taken from sys/arm/arm/physmem.c.
> >
> > All the other arrays, like phys_managed[], phys_avail[], and dump_avail[], would be created from phys_present[] and phys_exclude[]. This way, the bootstrap code in the architectures could be simplified and unified. For example, dealing with either hw.physmem or a page at PA 0x00000000 could be transparent.
> >
> > I'm prepared to volunteer if the idea is ripe enough. However, I will be looking for a mentor.
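To make the proposal concrete, a hypothetical sketch of the two arrays and their flags follows. Only the flag names come from the paragraph above; every other identifier, including PHYS_MAX_REGIONS and the registration functions named in the final comment, is invented for illustration and only loosely modeled on sys/arm/arm/physmem.c.

    /*
     * A hypothetical sketch of the proposed MI physical-memory
     * description.  Exclusion flags can be combined; each derived
     * array (phys_avail[], dump_avail[], phys_managed[]) would be
     * built by subtracting the matching exclusions from phys_present[].
     */
    #define PHYS_EXFLAG_NODUMP      0x01    /* omit from dump_avail[] */
    #define PHYS_EXFLAG_NOALLOC     0x02    /* omit from phys_avail[] */
    #define PHYS_EXFLAG_NOMANAGE    0x04    /* omit from phys_managed[] */
    #define PHYS_EXFLAG_NOBOUNCE    0x08    /* unusable for bounce buffers */
    #define PHYS_EXFLAG_NOMEMRW     0x10    /* not writable via /dev/mem */

    struct phys_region {
            vm_paddr_t      start;          /* first byte of the region */
            vm_size_t       size;           /* length in bytes */
            uint32_t        flags;          /* exclusion flags; 0 if present */
    };

    /* PHYS_MAX_REGIONS is an invented bound on the region count. */
    static struct phys_region phys_present[PHYS_MAX_REGIONS];
    static struct phys_region phys_exclude[PHYS_MAX_REGIONS];

    /*
     * Boot code would then register memory and its exclusions, e.g.
     * physmem_present(0x80000000, 0x40000000) for 1 GB of RAM, and
     * physmem_exclude(0x80000000, 0x00100000, PHYS_EXFLAG_NOALLOC)
     * for a kernel image that must stay out of phys_avail[].
     */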
> I've never really looked at arm/arm/physmem.c before. Let me do that before I comment on this.

No problem. This could be a long-term aim. However, I hope the VM_PHYSSEG_SPARSE problem can be dealt with in MI code in the meantime. In any case, thanks for your help.
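P.S. Regarding the point above that the "phys_avail[] is a subset of dump_avail[]" invariant should maybe even be checked in code, a boot-time check might look like the following sketch. The two arrays are the real zero-terminated (start, end) pair lists; the function itself is hypothetical.

    /*
     * Hypothetical sanity check: every phys_avail[] range must lie
     * within some dump_avail[] range.  It would be called once during
     * startup, after both arrays have been finalized.
     */
    static void
    phys_avail_check(void)
    {
            int i, j;

            for (i = 0; phys_avail[i + 1] != 0; i += 2) {
                    for (j = 0; dump_avail[j + 1] != 0; j += 2)
                            if (phys_avail[i] >= dump_avail[j] &&
                                phys_avail[i + 1] <= dump_avail[j + 1])
                                    break;
                    if (dump_avail[j + 1] == 0)
                            panic("phys_avail[%d] not covered by dump_avail",
                                i);
            }
    }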