Message-ID: <3B319D07.3E47AD6D@mindspring.com>
Date: Thu, 21 Jun 2001 00:06:47 -0700
From: Terry Lambert
Reply-To: tlambert2@mindspring.com
To: John Baldwin
Cc: current@FreeBSD.ORG, alfred@FreeBSD.ORG
Subject: STARTUP ARCH. (was Re: swap_pager_swap_init())

John Baldwin wrote:
>
> The swap pager getpages/putpages routines depend on
> swap_pager_swap_init() being called before they are
> called.  However, swap_pager_swap_init() isn't called
> until the pagedaemon starts up.  Granted, it should
> always be run before init has a chance to exec swapon
> via /etc/rc, however, would it be more correct to
> instead let swap_pager_swap_init() be run by a SI_SUB_VM
> SYSINIT (SI_ORDER_ANY, with the other VM sysinit's bumped
> up to be less than ANY).  The race is incredibly small,
> but I'd feel better if it was more correct.  Comments?

This heavy a change probably belongs on -arch; I will tell
you what I think, and why, and let consensus sort it out.

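To make the ordering question concrete, here is a minimal user-space
sketch of SYSINIT-style dispatch: entries are sorted by subsystem,
then by order within the subsystem, and run in that sequence.  It is
not kernel code.  The SK_* values, the entry labels ("vm", "vm_conf",
"mbuf", "swap_init") and every function name are invented for
illustration; only the relative order of the three subsystems follows
the discussion in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values only; NOT the constants from <sys/kernel.h>. */
enum sketch_sub   { SK_SUB_VM = 10, SK_SUB_VM_CONF = 20, SK_SUB_MBUF = 30 };
enum sketch_order { SK_ORDER_FIRST = 0, SK_ORDER_ANY = 1000 };

#define SKETCH_N 4

struct sketch_init {
    int subsystem;      /* coarse stage */
    int order;          /* position within the stage */
    const char *name;   /* hypothetical label, not a real kernel symbol */
};

/* Compare by subsystem first, then by order within the subsystem. */
static int sketch_cmp(const void *a, const void *b)
{
    const struct sketch_init *x = a, *y = b;

    if (x->subsystem != y->subsystem)
        return (x->subsystem > y->subsystem) - (x->subsystem < y->subsystem);
    return (x->order > y->order) - (x->order < y->order);
}

/* Fill a four-entry table; the swap init entry goes where the caller asks. */
void sketch_table(struct sketch_init *out, int swap_sub, int swap_order)
{
    assert(out != NULL);
    out[0] = (struct sketch_init){ SK_SUB_MBUF,    SK_ORDER_FIRST, "mbuf" };
    out[1] = (struct sketch_init){ SK_SUB_VM,      SK_ORDER_FIRST, "vm" };
    out[2] = (struct sketch_init){ swap_sub,       swap_order,     "swap_init" };
    out[3] = (struct sketch_init){ SK_SUB_VM_CONF, SK_ORDER_FIRST, "vm_conf" };
}

/*
 * Sort the table into startup order and return the slot where `name`
 * would run, or -1 if it is absent.  Keys in this sketch are unique,
 * so the fact that qsort() is not stable does not matter here.
 */
int sketch_position(struct sketch_init *tab, size_t n, const char *name)
{
    qsort(tab, n, sizeof(tab[0]), sketch_cmp);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return (int)i;
    return -1;
}

#ifdef SYSINIT_SKETCH_MAIN
int main(void)
{
    struct sketch_init tab[SKETCH_N];

    /* The proposal as quoted: same subsystem as the VM, order ANY. */
    sketch_table(tab, SK_SUB_VM, SK_ORDER_ANY);
    sketch_position(tab, SKETCH_N, "swap_init");
    for (size_t i = 0; i < SKETCH_N; i++)
        printf("%zu: %s\n", i, tab[i].name);
    return 0;
}
#endif
```

Build with -DSYSINIT_SKETCH_MAIN to print the resulting sequence.
Passing SK_SUB_VM with SK_ORDER_ANY puts the swap init entry right
behind the basic VM entry and ahead of everything else; passing
SK_SUB_MBUF instead pushes it behind the later stages, which is the
difference being argued about.
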
I don't think there is a race; the stuff is not really started
until the scheduler is run, and that is the last thing; before
that, it is merely on a run queue.

A serious monkey-wrench in your plan is that the proc structure
for the thing is allocated out of a zone, and the VM system is
not really up at that point, so doing it that early is not really
an option.

My gut tells me that it should actually _at least_ be after the
SI_SUB_MBUF has started.  You can _probably_ get away with as
early as after the SI_SUB_VM_CONF has occurred (notice how this
happens well after the SI_SUB_VM).  But earlier is worse.

Basically, the VM system comes up in stages:

o	load the loader
o	make real mem look like 16M
o	load the kernel
o	statically allocate some pages and page tables
o	build a VM that looks just like real memory
o	use a stack trampoline to enter into the virtualized
	version of the kernel address space at the relocation
	address
o	cpu_startup

--- everything above this line needs a serious cleanup pass ---

o	Start running init_main.c (the stuff you are poking
	into right now)
o	Get the tunables from the loader
o	Start the console
o	Print the first thing we ever print to say we are on
	our way to being alive
o	Allocate some more stuff semi-statically by grabbing
	physical memory via VALLOC()
o	Create some page tables that can be used by the VM
	system to refer to the allocations that have already
	taken place
o	Remap the kernel into a single 4M page, if the CPU
	supports it, with the global bit, if the CPU supports
	it, to avoid CR3 reload shootdowns
o	Start the VM system, which starts up malloc() and
	friends

--- Now you have a VM system that can't swap, but can fault
    to get pages from unallocated physical memory ---

The problem is that the machdep.c code needs to have executed,
as well as the pmap.c code, before you start trying to do zone
allocations, and the VM system needs to be there and capable of
fault handling for grow_kernel() (page table allocations to grow
the KVA that
wasn't preallocated) before you can do malloc's.

This should probably be documented somewhere _AFTER_ it is
cleaned up to get rid of magic incantations, like NKPDE, KPTDI,
MPPTDI, APTPTDI, PTDPTDI, UMAXPTDI, and UMAXPTEOFF; I could
document it all now (I spent two weeks running a backup tape
through my dental fillings over it, and I'm the original author
of the SYSINIT() code), but it would show everyone the skid marks
on our underwear.  Maybe Kirk's new book will cover it, or maybe
it wants another book on top of that one.  This is heavy

	/* You are not expected to understand this.*/

territory...

I haven't really looked at whether it tries to do an allocation
immediately, or if it just sits there like a lump until first
reference.  If it's not using zinit(), with the interrupt flag,
it's probably not safe that early; if it's not, please don't
change it, since it will eat KVA space and potentially not use
it after the change, when it didn't before (only if it's not).

The reason I say "don't change this", even if it's not a problem
(e.g. it sits like a lump), is that you will still end up
limiting what's permissible later.  Generally, the order of
operations is the way that it is not because it was "arbitrarily
decreed thus", but to permit the most flexibility for
implementation for the people who follow, and are unlucky enough
to not have metal dental fillings.

I suspect that for the IA64 and Alpha support at e.g. 512G of
physical RAM, we are going to want to dynamically allocate
swap_pager_object_list, instead of using a static allocation,
and moving it up too far from where it is would break that
(bigtime).

My personal preference is, to quote Buckaroo Banzai, "Hey, hey...
don't pull on that: you never know what those things are attached
to, that far inside the brain" when it comes to startup ordering,
since some of it works because I built an initial dependency
graph before SYSINIT went in, and some of it works because
nothing anyone has done since then and committed has
intermittently broken the dependency graph (if they broke it,
they were lucky and it wasn't intermittent, so their system
became a doorstop until they undid what they had done, and put
it in a safer place).

-- Terry