From owner-freebsd-arch@FreeBSD.ORG Thu Oct 23 22:14:36 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 42BFFFBD; Thu, 23 Oct 2014 22:14:36 +0000 (UTC) Received: from pp1.rice.edu (proofpoint1.mail.rice.edu [128.42.201.100]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 036E0A0F; Thu, 23 Oct 2014 22:14:35 +0000 (UTC) Received: from pps.filterd (pp1.rice.edu [127.0.0.1]) by pp1.rice.edu (8.14.5/8.14.5) with SMTP id s9NMD0sH022264; Thu, 23 Oct 2014 17:14:27 -0500 Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10]) by pp1.rice.edu with ESMTP id 1q76u685ky-1; Thu, 23 Oct 2014 17:14:26 -0500 X-Virus-Scanned: by amavis-2.7.0 at mh3.mail.rice.edu, auth channel Received: from 108-254-203-201.lightspeed.hstntx.sbcglobal.net (108-254-203-201.lightspeed.hstntx.sbcglobal.net [108.254.203.201]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh3.mail.rice.edu (Postfix) with ESMTPSA id 489484003F; Thu, 23 Oct 2014 17:14:26 -0500 (CDT) Message-ID: <54497DC1.5070506@rice.edu> Date: Thu, 23 Oct 2014 17:14:25 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Svatopluk Kraus Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE References: <5428AF3B.1030906@rice.edu> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: multipart/mixed; boundary="------------070109070505060304030005" X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=11 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1410230150 Cc: alc@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2014 22:14:36 -0000 This is a multi-part message in MIME format. --------------070109070505060304030005 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 10/08/2014 10:38, Svatopluk Kraus wrote: > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote: > >> On 09/27/2014 03:51, Svatopluk Kraus wrote: >> >> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote: >> >>> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus >>> wrote: >>> >>>> Hi, >>>> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem >>>> we've >>>> dealt with somehow, but now we would like to do it better. It's about >>>> physical pages which are allocated before vm subsystem is initialized. >>>> While later on these pages could be found in vm_page_array when >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for >>>> VM_PHYSSEG_SPARSE >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. >>>> >>>> It really would be nice to utilize vm_page_array for such preallocated >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used. Things >>>> could be much easier then. In our case, it's about pages which are used >>>> for >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of such >>>> pages. First ones are preallocated and second ones are allocated after vm >>>> subsystem was inited. We must deal with each set differently. So code is >>>> more complex and so is debugging. >>>> >>>> Thus we need some method how to say that some part of physical memory >>>> should be included in vm_page_array, but the pages from that region >>>> should >>>> not be put to free list during initialization. We think that such >>>> possibility could be utilized in general. There could be a need for some >>>> physical space which: >>>> >>>> (1) is needed only during boot and later on it can be freed and put to vm >>>> subsystem, >>>> >>>> (2) is needed for something else and vm_page_array code could be used >>>> without some kind of its duplication. >>>> >>>> There is already some code which deals with blacklisted pages in >>>> vm_page.c >>>> file. So the easiest way how to deal with presented situation is to add >>>> some callback to this part of code which will be able to either exclude >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the >>>> biggest >>>> phys_avail region is used for vm subsystem allocations, there should be >>>> some more coding. (However, blacklisted pages are not dealt with on that >>>> part of region.) >>>> >>>> We would like to know if there is any objection: >>>> >>>> (1) to deal with presented problem, >>>> (2) to deal with the problem presented way. >>>> Some help is very appreciated. Thanks >>>> >>>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of >>> phys_avail when sizing vm_page_array. On amd64, where the same problem >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is >>> probably my preferred solution. The catch being that not all architectures >>> implement dump_avail, but my recollection is that arm does. >>> >> Frankly, I would prefer this too, but there is one big open question: >> >> What is dump_avail for? >> >> >> >> dump_avail[] is solving a similar problem in the minidump code, hence, the >> prefix "dump_" in its name. In other words, the minidump code couldn't use >> phys_avail[] either because it didn't describe the full range of physical >> addresses that might be included in a minidump, so dump_avail[] was created. >> >> There is already precedent for what I'm suggesting. dump_avail[] is >> already (ab)used outside of the minidump code on x86 to solve this same >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. >> >> >> Using it for vm_page_array initialization and segmentation means that >> phys_avail must be a subset of it. And this must be stated and be visible >> enough. Maybe it should be even checked in code. I like the idea of >> thinking about dump_avail as something what desribes all memory in a >> system, but it's not how dump_avail is defined in archs now. >> >> >> >> When you say "it's not how dump_avail is defined in archs now", I'm not >> sure whether you're talking about the code or the comments. In terms of >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any >> code that would have to change. In terms of comments, I did a grep looking >> for comments defining what dump_avail[] is, because I couldn't remember >> any. I found one ... on arm. So, I don't think it's a onerous task >> changing the definition of dump_avail[]. :-) >> >> Already, as things stand today with dump_avail[] being used outside of the >> minidump code, one could reasonably argue that it should be renamed to >> something like phys_exists[]. >> >> >> >> I will experiment with it on monday then. However, it's not only about how >> memory segments are created in vm_phys.c, but it's about how vm_page_array >> size is computed in vm_page.c too. >> >> >> >> Yes, and there is also a place in vm_reserv.c that needs to change. I've >> attached the patch that I developed and tested a long time ago. It still >> applies cleanly and runs ok on amd64. >> >> >> > > > Well, I've created and tested minimalistic patch which - I hope - is > commitable. It runs ok on pandaboard (arm-v6) and solves presented problem. > I would really appreciate if this will be commited. Thanks. Sorry for the slow reply. I've just been swamped with work lately. I finally had some time to look at this in the last day or so. The first thing that I propose to do is commit the attached patch. This patch changes pmap_init() on amd64, armv6, and i386 so that it no longer consults phys_avail[] to determine the end of memory. Instead, it calls a new function provided by vm_phys.c to obtain the same information from vm_phys_segs[]. With this change, the new variable phys_managed in your patch wouldn't need to be a global. It could be a local variable in vm_page_startup() that we pass as a parameter to vm_phys_init() and vm_reserv_init(). More generally, the long-term vision that I have is that we would stop using phys_avail[] after vm_page_startup() had completed. It would only be used during initialization. After that we would use vm_phys_segs[] and functions provided by vm_phys.c. > > BTW, while I was inspecting all archs, I think that maybe it's time to do > what was done for busdma not long ago. There are many similar codes across > archs which deal with physical memory and could be generalized and put to > kern/subr_physmem.c for utilization. All work with physical memory could be > simplify to two arrays of regions. > > phys_present[] ... describes all present physical memory regions > phys_exclude[] ... describes various exclusions from phys_present[] > > Each excluded region will be labeled by flags to say what kind of exclusion > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW could > be combined. This idea is taken from sys/arm/arm/physmem.c. > > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be > created from these phys_present[] and phys_exclude[]. > This way bootstrap codes in archs could be simplified and unified. For > example, dealing with either hw.physmem or page with PA 0x00000000 could be > transparent. > > I'm prepared to volunteer if the thing is ripe. However, some tutor will be > looked for. I've never really looked at arm/arm/physmem.c before. Let me do that before I comment on this. --------------070109070505060304030005 Content-Type: text/plain; charset=ISO-8859-15; name="vm_phys_get_end1.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="vm_phys_get_end1.patch" SW5kZXg6IGFtZDY0L2FtZDY0L3BtYXAuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBhbWQ2NC9hbWQ2 NC9wbWFwLmMJKHJldmlzaW9uIDI3MzU1MCkKKysrIGFtZDY0L2FtZDY0L3BtYXAuYwkod29y a2luZyBjb3B5KQpAQCAtMTMwLDYgKzEzMCw3IEBAIF9fRkJTRElEKCIkRnJlZUJTRCQiKTsK ICNpbmNsdWRlIDx2bS92bV9leHRlcm4uaD4KICNpbmNsdWRlIDx2bS92bV9wYWdlb3V0Lmg+ CiAjaW5jbHVkZSA8dm0vdm1fcGFnZXIuaD4KKyNpbmNsdWRlIDx2bS92bV9waHlzLmg+CiAj aW5jbHVkZSA8dm0vdm1fcmFkaXguaD4KICNpbmNsdWRlIDx2bS92bV9yZXNlcnYuaD4KICNp bmNsdWRlIDx2bS91bWEuaD4KQEAgLTEwNjAsOCArMTA2MSw3IEBAIHBtYXBfaW5pdCh2b2lk KQogCS8qCiAJICogQ2FsY3VsYXRlIHRoZSBzaXplIG9mIHRoZSBwdiBoZWFkIHRhYmxlIGZv ciBzdXBlcnBhZ2VzLgogCSAqLwotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSArIDFdOyBp ICs9IDIpOwotCXB2X25wZyA9IHJvdW5kXzJtcGFnZShwaHlzX2F2YWlsWyhpIC0gMikgKyAx XSkgLyBOQlBEUjsKKwlwdl9ucGcgPSByb3VuZF8ybXBhZ2Uodm1fcGh5c19nZXRfZW5kKCkp IC8gTkJQRFI7CiAKIAkvKgogCSAqIEFsbG9jYXRlIG1lbW9yeSBmb3IgdGhlIHB2IGhlYWQg dGFibGUgZm9yIHN1cGVycGFnZXMuCkluZGV4OiBhcm0vYXJtL3BtYXAtdjYuYwo9PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09Ci0tLSBhcm0vYXJtL3BtYXAtdjYuYwkocmV2aXNpb24gMjczNTUwKQorKysgYXJt L2FybS9wbWFwLXY2LmMJKHdvcmtpbmcgY29weSkKQEAgLTE3Miw2ICsxNzIsNyBAQCBfX0ZC U0RJRCgiJEZyZWVCU0QkIik7CiAjaW5jbHVkZSA8dm0vdm1fbWFwLmg+CiAjaW5jbHVkZSA8 dm0vdm1fcGFnZS5oPgogI2luY2x1ZGUgPHZtL3ZtX3BhZ2VvdXQuaD4KKyNpbmNsdWRlIDx2 bS92bV9waHlzLmg+CiAjaW5jbHVkZSA8dm0vdm1fZXh0ZXJuLmg+CiAjaW5jbHVkZSA8dm0v dm1fcmVzZXJ2Lmg+CiAKQEAgLTEzNDMsOCArMTM0NCw3IEBAIHBtYXBfaW5pdCh2b2lkKQog CS8qCiAJICogQ2FsY3VsYXRlIHRoZSBzaXplIG9mIHRoZSBwdiBoZWFkIHRhYmxlIGZvciBz dXBlcnBhZ2VzLgogCSAqLwotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSArIDFdOyBpICs9 IDIpOwotCXB2X25wZyA9IHJvdW5kXzFtcGFnZShwaHlzX2F2YWlsWyhpIC0gMikgKyAxXSkg LyBOQlBEUjsKKwlwdl9ucGcgPSByb3VuZF8xbXBhZ2Uodm1fcGh5c19nZXRfZW5kKCkpIC8g TkJQRFI7CiAKIAkvKgogCSAqIEFsbG9jYXRlIG1lbW9yeSBmb3IgdGhlIHB2IGhlYWQgdGFi bGUgZm9yIHN1cGVycGFnZXMuCkluZGV4OiBpMzg2L2kzODYvcG1hcC5jCj09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT0KLS0tIGkzODYvaTM4Ni9wbWFwLmMJKHJldmlzaW9uIDI3MzU1MCkKKysrIGkzODYvaTM4 Ni9wbWFwLmMJKHdvcmtpbmcgY29weSkKQEAgLTEzMyw2ICsxMzMsNyBAQCBfX0ZCU0RJRCgi JEZyZWVCU0QkIik7CiAjaW5jbHVkZSA8dm0vdm1fZXh0ZXJuLmg+CiAjaW5jbHVkZSA8dm0v dm1fcGFnZW91dC5oPgogI2luY2x1ZGUgPHZtL3ZtX3BhZ2VyLmg+CisjaW5jbHVkZSA8dm0v dm1fcGh5cy5oPgogI2luY2x1ZGUgPHZtL3ZtX3JhZGl4Lmg+CiAjaW5jbHVkZSA8dm0vdm1f cmVzZXJ2Lmg+CiAjaW5jbHVkZSA8dm0vdW1hLmg+CkBAIC03NzksOCArNzgwLDcgQEAgcG1h cF9pbml0KHZvaWQpCiAJLyoKIAkgKiBDYWxjdWxhdGUgdGhlIHNpemUgb2YgdGhlIHB2IGhl YWQgdGFibGUgZm9yIHN1cGVycGFnZXMuCiAJICovCi0JZm9yIChpID0gMDsgcGh5c19hdmFp bFtpICsgMV07IGkgKz0gMik7Ci0JcHZfbnBnID0gcm91bmRfNG1wYWdlKHBoeXNfYXZhaWxb KGkgLSAyKSArIDFdKSAvIE5CUERSOworCXB2X25wZyA9IHJvdW5kXzRtcGFnZSh2bV9waHlz X2dldF9lbmQoKSkgLyBOQlBEUjsKIAogCS8qCiAJICogQWxsb2NhdGUgbWVtb3J5IGZvciB0 aGUgcHYgaGVhZCB0YWJsZSBmb3Igc3VwZXJwYWdlcy4KSW5kZXg6IHZtL3ZtX3BoeXMuYwo9 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09Ci0tLSB2bS92bV9waHlzLmMJKHJldmlzaW9uIDI3MzU1MCkKKysrIHZt L3ZtX3BoeXMuYwkod29ya2luZyBjb3B5KQpAQCAtODU2LDYgKzg1NiwxNyBAQCB2bV9waHlz X2ZyZWVfY29udGlnKHZtX3BhZ2VfdCBtLCB1X2xvbmcgbnBhZ2VzKQogfQogCiAvKgorICog UmV0dXJuIHRoZSBwaHlzaWNhbCBhZGRyZXNzIG9mIHRoZSBlbmQgb2YgbWVtb3J5LCB0aGF0 IGlzLCB0aGUgYWRkcmVzcyBvZgorICogdGhlIGxhc3QgdXNlYWJsZSBieXRlIG9mIFJBTSBw bHVzIG9uZS4KKyAqLwordm1fcGFkZHJfdAordm1fcGh5c19nZXRfZW5kKHZvaWQpCit7CisK KwlyZXR1cm4gKHZtX3BoeXNfc2Vnc1t2bV9waHlzX25zZWdzIC0gMV0uZW5kKTsKK30KKwor LyoKICAqIFNldCB0aGUgcG9vbCBmb3IgYSBjb250aWd1b3VzLCBwb3dlciBvZiB0d28tc2l6 ZWQgc2V0IG9mIHBoeXNpY2FsIHBhZ2VzLiAKICAqLwogdm9pZApJbmRleDogdm0vdm1fcGh5 cy5oCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT0KLS0tIHZtL3ZtX3BoeXMuaAkocmV2aXNpb24gMjczNTUwKQor Kysgdm0vdm1fcGh5cy5oCSh3b3JraW5nIGNvcHkpCkBAIC04MCw2ICs4MCw3IEBAIHZvaWQg dm1fcGh5c19maWN0aXRpb3VzX3VucmVnX3JhbmdlKHZtX3BhZGRyX3Qgc3RhCiB2bV9wYWdl X3Qgdm1fcGh5c19maWN0aXRpb3VzX3RvX3ZtX3BhZ2Uodm1fcGFkZHJfdCBwYSk7CiB2b2lk IHZtX3BoeXNfZnJlZV9jb250aWcodm1fcGFnZV90IG0sIHVfbG9uZyBucGFnZXMpOwogdm9p ZCB2bV9waHlzX2ZyZWVfcGFnZXModm1fcGFnZV90IG0sIGludCBvcmRlcik7Cit2bV9wYWRk cl90IHZtX3BoeXNfZ2V0X2VuZCh2b2lkKTsKIHZvaWQgdm1fcGh5c19pbml0KHZvaWQpOwog dm1fcGFnZV90IHZtX3BoeXNfcGFkZHJfdG9fdm1fcGFnZSh2bV9wYWRkcl90IHBhKTsKIHZv aWQgdm1fcGh5c19zZXRfcG9vbChpbnQgcG9vbCwgdm1fcGFnZV90IG0sIGludCBvcmRlcik7 Cg== --------------070109070505060304030005--