Date: Wed, 17 Feb 2016 11:42:41 +0200
From: Konstantin Belousov
To: John Baldwin
Cc: arch@freebsd.org
Subject: Re: Starting APs earlier during boot
Message-ID: <20160217094241.GF91220@kib.kiev.ua>
References: <1730061.8Ii36ORVKt@ralph.baldwin.cx>
In-Reply-To: <1730061.8Ii36ORVKt@ralph.baldwin.cx>
List-Id: Discussion related to FreeBSD architecture

On Tue, Feb 16, 2016 at 12:50:22PM -0800, John Baldwin wrote:
> Currently the kernel bootstraps the non-boot processors fairly early in the
> SI_SUB_CPU SYSINIT. The APs then spin waiting to be "released". We currently
> release the APs as one of the last steps at SI_SUB_SMP. On the one hand this
> removes much of the need for synchronization while SYSINITs are running since
> SYSINITs basically assume they are single-threaded. However, it also enforces
> some odd quirks. Several places that deal with per-CPU resources have to
> split initialization up so that the BSP init happens in one SYSINIT and the
> initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.
>
> Another issue that is becoming more prominent on x86 (and probably will also
> affect other platforms if it isn't already) is that to support working
> interrupts for interrupt config hooks we bind all interrupts to the BSP during
> boot and only distribute them among other CPUs near the end at SI_SUB_SMP.
> This is especially problematic with drivers for modern hardware allocating
> num(CPUs) interrupts (hoping to use one per CPU).
> On x86 we have about 190
> IDT vectors available for device interrupts, so in theory we should be able to
> tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
> interrupts for every CPU and we should still be fine). However, if you have,
> say, 32 cores in a system, then you can only handle about 5 drivers doing
> this before you run out of vectors on CPU 0.
>
> Longer term we would also like to eventually have most drivers attach in the
> same environment during boot as during post-boot. Right now post-boot is
> quite different as all CPUs are running, interrupts work, etc. One of the
> goals of multipass support for new-bus is to help us get there by probing
> enough hardware to get timers working and starting the scheduler before
> probing the rest of the devices. That goal isn't quite realized yet.
>
> However, we can run a slightly simpler version of our scheduler before
> timers are working. In fact, sleep/wakeup work just fine fairly early (we
> allocate the necessary structures at SI_SUB_KMEM which is before the APs
> are even started). Once idle threads are created and ready we could in
> theory let the APs startup and run other threads. You just don't have working
> timeouts. OTOH, you can sort of simulate timeouts if you modify the scheduler
> to yield the CPU instead of blocking the thread for a sleep with a timeout.
> The effect would be for threads that do sleeps with a timeout to fall back to
> polling before timers are working. In practice, all of the early kernel
> threads use sleeps without timeouts when idle so this doesn't really matter.

I understand that timeouts can be somewhat simulated this way. But I do
not quite understand how generic scheduling can work without (timer)
interrupts. Suppose that we have two threads, 1 and 2, of the same
priority, both runnable, and due to some event thread 2 preempted thread 1.
If thread 2 just runs without calling any of the functions that can switch
threads, like msleep(), what would guarantee that thread 1 eventually gets
its CPU slice? E.g. there might be no interrupts set up yet; if the idle
thread then gets on the CPU on a UP system, the whole boot process could
deadlock.

> I've implemented these changes and tested them for x86. For x86 at least
> AP startup needed some bits of the interrupt infrastructure in place, so
> I moved SI_SUB_SMP up to after SI_SUB_INTR but before SI_SUB_SOFTINTR. I
> modified the *sleep() and cv_*wait*() routines to not always bail if cold
> is true. Instead, sleeps without a timeout are permitted to sleep
> "normally". Sleeps with a timeout drop their interlock and yield the
> CPU (but remain runnable). Since APs are now fully running this means
> interrupts are now routed to all CPUs from the get-go, removing the need for
> the post-boot shuffle. This also resolves the issue of running out of IDT
> vectors on the boot CPU.
>
> I believe that adapting other platforms to this change should be relatively
> simple, but we should do that before committing the full patch. I do think
> that some parts of the patch (such as the changes to the sleep routines, and
> using SI_SUB_LAST instead of SI_SUB_SMP as a catch-all SYSINIT) can be
> committed now without breaking anything.
>
> However, I'd like feedback on the general idea and if it is acceptable I'd
> like to coordinate testing with other platforms so this can go into the
> tree.
>
> The current changes are in the 'ap_startup' branch at github/bsdjhb/freebsd.
> You can view them here:
>
> https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup
>
> --
> John Baldwin
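
The starvation concern raised in the reply can be illustrated with a toy
model. The sketch below is Python, not kernel code: the simulate() function,
the "run"/"yield" actions and the quantum parameter are all hypothetical
names invented for this illustration, and the model ignores priorities,
multiple CPUs and every interrupt other than an optional periodic tick. It
only shows that, with a plain run queue, an equal-priority thread that never
yields keeps the CPU indefinitely unless something like a timer tick forces
a switch.

```python
from collections import deque
from itertools import repeat


def simulate(threads, steps, quantum=None):
    """Count how many steps each thread gets on a single CPU.

    threads: dict mapping a name to an iterator of actions, where each
             action is "run" (keep the CPU) or "yield" (give it up).
             The first entry is the thread currently on the CPU.
    quantum: steps between simulated timer ticks; None means there is no
             timer interrupt, so only a voluntary "yield" causes a switch.
    """
    runq = deque(threads.items())
    ran = {name: 0 for name in threads}
    used = 0  # steps the current thread has run since it was switched in
    for _ in range(steps):
        if not runq:
            break
        name, actions = runq[0]
        action = next(actions, None)
        if action is None:  # thread finished, drop it from the run queue
            runq.popleft()
            used = 0
            continue
        ran[name] += 1
        used += 1
        tick = quantum is not None and used >= quantum
        if action == "yield" or tick:
            runq.rotate(-1)  # move the current thread to the tail
            used = 0
    return ran


# Thread 2 has preempted thread 1 and never yields; no timer is available.
no_timer = simulate({"thread2": repeat("run"), "thread1": repeat("run")}, 100)
print(no_timer)    # {'thread2': 100, 'thread1': 0}

# Same threads, but a tick every 10 steps forces a round-robin switch.
with_timer = simulate({"thread2": repeat("run"), "thread1": repeat("run")},
                      100, quantum=10)
print(with_timer)  # {'thread2': 50, 'thread1': 50}
```

In this model, turning a sleep with a timeout into a yield lets the polling
thread give up the CPU, but it does not help the polling thread get the CPU
back if the other runnable thread never yields in turn.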