Date: Wed, 17 Feb 2016 11:42:41 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Starting APs earlier during boot
Message-ID: <20160217094241.GF91220@kib.kiev.ua>
In-Reply-To: <1730061.8Ii36ORVKt@ralph.baldwin.cx>
References: <1730061.8Ii36ORVKt@ralph.baldwin.cx>
On Tue, Feb 16, 2016 at 12:50:22PM -0800, John Baldwin wrote:
> Currently the kernel bootstraps the non-boot processors fairly early in the
> SI_SUB_CPU SYSINIT.  The APs then spin waiting to be "released".  We currently
> release the APs as one of the last steps at SI_SUB_SMP.  On the one hand this
> removes much of the need for synchronization while SYSINITs are running since
> SYSINITs basically assume they are single-threaded.  However, it also enforces
> some odd quirks.  Several places that deal with per-CPU resources have to
> split initialization up so that the BSP init happens in one SYSINIT and the
> initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.
>
> Another issue that is becoming more prominent on x86 (and probably will also
> affect other platforms if it isn't already) is that to support working
> interrupts for interrupt config hooks we bind all interrupts to the BSP during
> boot and only distribute them among other CPUs near the end at SI_SUB_SMP.
> This is especially problematic with drivers for modern hardware allocating
> num(CPUs) interrupts (hoping to use one per CPU).  On x86 we have about 190
> IDT vectors available for device interrupts, so in theory we should be able to
> tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
> interrupts for every CPU and we should still be fine).  However, if you have,
> say, 32 cores in a system, then you can only handle about 5 drivers doing
> this before you run out of vectors on CPU 0.
>
> Longer term we would also like to eventually have most drivers attach in the
> same environment during boot as during post-boot.  Right now post-boot is
> quite different as all CPUs are running, interrupts work, etc.  One of the
> goals of multipass support for new-bus is to help us get there by probing
> enough hardware to get timers working and starting the scheduler before
> probing the rest of the devices.  That goal isn't quite realized yet.
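
As a rough check of the vector arithmetic quoted above, the sketch below
redoes the two estimates in C.  The constant and function names are
illustrative and are not kernel identifiers; 190 is the approximate figure
from the quoted text, and the model assumes that every driver allocates the
same number of interrupts per CPU.

```c
#include <stdio.h>

/* Approximate number of IDT vectors usable for device interrupts on x86,
 * taken from the quoted text (assumption: treated as an exact budget). */
#define IDT_DEVICE_VECTORS 190

/*
 * All interrupts are bound to the BSP during boot: a driver that allocates
 * irqs_per_cpu interrupts for each of ncpus CPUs consumes
 * ncpus * irqs_per_cpu vectors on CPU 0.  Both arguments must be positive.
 */
static int
drivers_with_all_irqs_on_bsp(int ncpus, int irqs_per_cpu)
{
	return (IDT_DEVICE_VECTORS / (ncpus * irqs_per_cpu));
}

/*
 * Interrupts are spread so that each CPU only hosts its own share: every
 * driver consumes irqs_per_cpu vectors on each CPU, independent of ncpus.
 * The argument must be positive.
 */
static int
drivers_with_irqs_distributed(int irqs_per_cpu)
{
	return (IDT_DEVICE_VECTORS / irqs_per_cpu);
}

/* Print both estimates for the configuration discussed in the thread. */
static void
print_vector_budget(void)
{
	printf("32 CPUs, 1 irq/CPU, all on BSP: %d drivers\n",
	    drivers_with_all_irqs_on_bsp(32, 1));
	printf("3 irqs/CPU, distributed:        %d drivers\n",
	    drivers_with_irqs_distributed(3));
}
```

Under these assumptions drivers_with_all_irqs_on_bsp(32, 1) gives 5 and
drivers_with_irqs_distributed(3) gives 63, which is consistent with the
"about 5 drivers" and "60 drivers" estimates in the quoted paragraph.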
>
> However, we can run a slightly simpler version of our scheduler before
> timers are working.  In fact, sleep/wakeup work just fine fairly early (we
> allocate the necessary structures at SI_SUB_KMEM which is before the APs
> are even started).  Once idle threads are created and ready we could in
> theory let the APs startup and run other threads.  You just don't have working
> timeouts.  OTOH, you can sort of simulate timeouts if you modify the scheduler
> to yield the CPU instead of blocking the thread for a sleep with a timeout.
> The effect would be for threads that do sleeps with a timeout to fall back to
> polling before timers are working.  In practice, all of the early kernel
> threads use sleeps without timeouts when idle so this doesn't really matter.

I understand that timeouts can be somewhat simulated this way.  But I do
not quite understand how generic scheduling can work without (timer)
interrupts.  Suppose that we have two threads 1 and 2 of the same priority,
both runnable, and due to some event thread 2 preempted thread 1.  If
thread 2 just runs without calling functions that give up the CPU, like
msleep, what would guarantee that thread 1 eventually gets its CPU slice?

E.g., if no interrupts are set up yet and the idle thread gets on the CPU
on UP, the whole boot process could deadlock.

>
> I've implemented these changes and tested them for x86.  For x86 at least
> AP startup needed some bits of the interrupt infrastructure in place, so
> I moved SI_SUB_SMP up to after SI_SUB_INTR but before SI_SUB_SOFTINTR.  I
> modified the *sleep() and cv_*wait*() routines to not always bail if cold
> is true.  Instead, sleeps without a timeout are permitted to sleep
> "normally".  Sleeps with a timeout drop their interlock and yield the
> CPU (but remain runnable).  Since APs are now fully running this means
> interrupts are now routed to all CPUs from the get go removing the need for
> the post-boot shuffle.  This also resolves the issue of running out of IDT
> vectors on the boot CPU.
>
> I believe that adopting other platforms for this change should be relatively
> simple, but we should do that before committing the full patch.  I do think
> that some parts of the patch (such as the changes to the sleep routines, and
> using SI_SUB_LAST instead of SI_SUB_SMP as a catch-all SYSINIT) can be
> committed now without breaking anything.
>
> However, I'd like feedback on the general idea and if it is acceptable I'd
> like to coordinate testing with other platforms so this can go into the
> tree.
>
> The current changes are in the 'ap_startup' branch at github/bsdjhb/freebsd.
> You can view them here:
>
> https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup
>
> --
> John Baldwin
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
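
The sleep behaviour described in the quoted patch summary (sleeps without a
timeout sleep normally, sleeps with a timeout drop their interlock and yield
while timers are not yet working) can be modelled as a small decision
function.  This is only a sketch of the stated policy, not code from the
ap_startup branch: the enum and function names are hypothetical, and only
the "cold" flag corresponds to something named in the thread.

```c
#include <stdbool.h>

/* Hypothetical outcome of a sleep request; not a FreeBSD kernel type. */
enum sleep_action {
	SLEEP_BLOCK,		/* block until an explicit wakeup */
	SLEEP_BLOCK_TIMED,	/* block and arm a timeout (timers work) */
	SLEEP_YIELD_POLL	/* drop interlock, yield, stay runnable */
};

/*
 * Model of the policy from the quoted text.
 *   cold        - true while the system is still booting without timers
 *   has_timeout - true if the caller passed a non-zero timeout
 */
static enum sleep_action
choose_sleep_action(bool cold, bool has_timeout)
{
	if (!has_timeout)
		return (SLEEP_BLOCK);	/* allowed even while cold */
	if (cold)
		return (SLEEP_YIELD_POLL);	/* simulated timeout: polling */
	return (SLEEP_BLOCK_TIMED);
}
```

Note that in this model a thread only gives up the CPU when it calls into a
sleep routine, so the sketch does not address the concern raised above about
a runnable thread that never sleeps or yields.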