From owner-freebsd-arch@freebsd.org  Wed Feb 17 09:42:51 2016
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7EFC3AAB4EC
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Wed, 17 Feb 2016 09:42:51 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 6A14B1527
 for <freebsd-arch@freebsd.org>; Wed, 17 Feb 2016 09:42:51 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 69101AAB4EA; Wed, 17 Feb 2016 09:42:51 +0000 (UTC)
Delivered-To: arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 689B5AAB4E9
 for <arch@mailman.ysv.freebsd.org>; Wed, 17 Feb 2016 09:42:51 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 124791525;
 Wed, 17 Feb 2016 09:42:50 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u1H9gflt094594
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Wed, 17 Feb 2016 11:42:42 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u1H9gflt094594
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u1H9gfvN094593;
 Wed, 17 Feb 2016 11:42:41 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 17 Feb 2016 11:42:41 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Starting APs earlier during boot
Message-ID: <20160217094241.GF91220@kib.kiev.ua>
References: <1730061.8Ii36ORVKt@ralph.baldwin.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1730061.8Ii36ORVKt@ralph.baldwin.cx>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Feb 2016 09:42:51 -0000

On Tue, Feb 16, 2016 at 12:50:22PM -0800, John Baldwin wrote:
> Currently the kernel bootstraps the non-boot processors fairly early in the
> SI_SUB_CPU SYSINIT.  The APs then spin waiting to be "released".  We currently
> release the APs as one of the last steps at SI_SUB_SMP.  On the one hand this
> removes much of the need for synchronization while SYSINITs are running since
> SYSINITs basically assume they are single-threaded.  However, it also enforces
> some odd quirks.  Several places that deal with per-CPU resources have to
> split initialization up so that the BSP init happens in one SYSINIT and the
> initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.
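The ordering constraint can be modeled in userland. SYSINITs are (subsystem, order, function) entries that are sorted and then run in sequence at boot. The sketch below is not kernel code. The subsystem values, the `bsp_init`/`ap_init` split and `release_aps` are hypothetical. They only show why AP-side per-CPU setup cannot run until the APs are released at the SMP stage.

```c
#include <stdlib.h>

/* Hypothetical ordering values; the real ones live in sys/kernel.h. */
enum sub_id { SUB_KMEM = 1, SUB_CPU = 2, SUB_DRIVERS = 3, SUB_SMP = 9 };

struct sysinit {
	enum sub_id	subsystem;
	int		order;
	void		(*func)(void);
};

static int aps_running;		/* set once the APs are "released" */
static int bsp_ready, ap_ready;

/* Per-CPU setup for the boot CPU can happen early. */
static void bsp_init(void) { bsp_ready = 1; }
/* Per-CPU setup for the APs only succeeds once the APs are running. */
static void ap_init(void) { if (aps_running) ap_ready = 1; }
static void release_aps(void) { aps_running = 1; }

/* Sort by subsystem first, then by order within a subsystem. */
static int
sysinit_cmp(const void *a, const void *b)
{
	const struct sysinit *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	return (x->order < y->order ? -1 : x->order > y->order);
}

/* Run every entry in (subsystem, order) order, single-threaded. */
static void
run_sysinits(struct sysinit *list, size_t n)
{
	qsort(list, n, sizeof(*list), sysinit_cmp);
	for (size_t i = 0; i < n; i++)
		list[i].func();
}
```

If `ap_init` is registered at an earlier stage such as `SUB_DRIVERS`, it runs before `release_aps` and does nothing. That is the reason the AP half has to be a separate SYSINIT at the SMP stage.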
> 
> Another issue that is becoming more prominent on x86 (and probably will also
> affect other platforms if it isn't already) is that to support working
> interrupts for interrupt config hooks we bind all interrupts to the BSP during
> boot and only distribute them among other CPUs near the end at SI_SUB_SMP. 
> This is especially problematic with drivers for modern hardware allocating
> num(CPUs) interrupts (hoping to use one per CPU).  On x86 we have about 190
> IDT vectors available for device interrupts, so in theory we should be able to
> tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
> interrupts for every CPU and we should still be fine).  However, if you have,
> say, 32 cores in a system, then you can only handle about 5 drivers doing
> this before you run out of vectors on CPU 0.
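The vector arithmetic can be checked directly. The sketch assumes what the mail assumes: about 190 usable device vectors per CPU, and drivers that allocate a fixed number of interrupts per CPU. The function names are made up for illustration.

```c
/* Approximate device-interrupt vectors per CPU on x86, as stated in the mail. */
#define IDT_DEVICE_VECTORS 190

/*
 * All interrupts bound to the BSP: each driver consumes ncpus * per_cpu
 * vectors on CPU 0 alone.
 */
static int
max_drivers_bsp_only(int ncpus, int per_cpu)
{
	return (IDT_DEVICE_VECTORS / (ncpus * per_cpu));
}

/*
 * Interrupts spread across CPUs: each CPU only hosts its own per_cpu
 * share of every driver's interrupts.
 */
static int
max_drivers_distributed(int per_cpu)
{
	return (IDT_DEVICE_VECTORS / per_cpu);
}
```

With 32 CPUs and one interrupt per CPU per driver, `max_drivers_bsp_only(32, 1)` is 190 / 32 = 5 (integer division). Once interrupts are distributed, `max_drivers_distributed(3)` is 63, which leaves room for the 60-driver example.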
> 
> Longer term we would also like to eventually have most drivers attach in the 
> same environment during boot as during post-boot.  Right now post-boot is 
> quite different as all CPUs are running, interrupts work, etc.  One of the 
> goals of multipass support for new-bus is to help us get there by probing 
> enough hardware to get timers working and starting the scheduler before 
> probing the rest of the devices.  That goal isn't quite realized yet.
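The multipass idea works like this: raise a pass level step by step, and at each level attach only the drivers whose pass is at or below it. Drivers needed for interrupts and timers can then attach before everything else. The levels and driver names below are hypothetical and only loosely modeled on new-bus's BUS_PASS_* ordering.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical pass levels; the real BUS_PASS_* constants differ. */
enum { PASS_BUS = 10, PASS_INTERRUPT = 40, PASS_TIMER = 50, PASS_DEFAULT = INT_MAX };

struct drv {
	const char	*name;
	int		pass;		/* earliest pass at which this driver may attach */
	int		attached;
};

/* Attach every not-yet-attached driver whose pass is <= level; return how many. */
static int
probe_pass(struct drv *drvs, size_t n, int level)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!drvs[i].attached && drvs[i].pass <= level) {
			drvs[i].attached = 1;
			count++;
		}
	}
	return (count);
}
```

A boot sequence would call `probe_pass()` up to the timer level, start the scheduler, and then finish with the default level for the remaining devices.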
> 
> However, we can run a slightly simpler version of our scheduler before
> timers are working.  In fact, sleep/wakeup work just fine fairly early (we
> allocate the necessary structures at SI_SUB_KMEM which is before the APs
> are even started).  Once idle threads are created and ready we could in
> theory let the APs start up and run other threads.  You just don't have working 
> timeouts.  OTOH, you can sort of simulate timeouts if you modify the scheduler 
> to yield the CPU instead of blocking the thread for a sleep with a timeout.  
> The effect would be for threads that do sleeps with a timeout to fall back to 
> polling before timers are working.  In practice, all of the early kernel 
> threads use sleeps without timeouts when idle so this doesn't really matter.
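The proposed rule for sleeping while `cold` fits in one decision function. `cold` mirrors the kernel's global boot flag, and `timo` follows the msleep(9) convention, where 0 means no timeout. The enum and function names are hypothetical.

```c
#include <stdbool.h>

enum sleep_action {
	SLEEP_BLOCK,			/* block until wakeup(); no timer needed */
	SLEEP_BLOCK_WITH_TIMEOUT,	/* arm a timeout, then block */
	SLEEP_YIELD_RUNNABLE		/* drop interlock, yield, stay runnable */
};

static enum sleep_action
early_sleep_action(bool cold, int timo)
{
	if (timo == 0)
		return (SLEEP_BLOCK);		/* works before timers exist */
	if (cold)
		return (SLEEP_YIELD_RUNNABLE);	/* no timers yet: fall back to polling */
	return (SLEEP_BLOCK_WITH_TIMEOUT);	/* normal post-boot behaviour */
}
```

Only timed sleeps during `cold` change behaviour. Untimed sleeps, which the early kernel threads use when idle, block exactly as they do after boot.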
I understand that timeouts can be somewhat simulated this way.

But I do not quite understand how generic scheduling can work without
(timer) interrupts. Suppose that we have two threads, 1 and 2, of the same
priority, both runnable, and due to some event thread 2 preempted thread
1. If thread 2 just runs without calling functions that voluntarily give
up the CPU, like msleep, what would guarantee that thread 1 eventually
gets its CPU slice?

E.g. if no interrupts are set up yet and the idle thread gets onto the
CPU on a UP machine, the whole boot process could deadlock.
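The starvation scenario above can be reproduced with a toy cooperative scheduler. If a thread switch only happens when the running thread gives up the CPU, a thread that never sleeps or yields keeps its equal-priority peer off the CPU indefinitely. This is a userland model with made-up names, not the FreeBSD scheduler. A step budget stands in for "forever".

```c
#include <stdbool.h>

/* One "step" of a thread; returns true if it voluntarily gave up the CPU. */
typedef bool (*step_fn)(void);

static int runs[2];	/* steps executed by thread 1 (index 0) and thread 2 */

static bool thread1_step(void) { runs[0]++; return (true); }	/* yields every step */
static bool thread2_step(void) { runs[1]++; return (false); }	/* never yields */

/*
 * Without a timer interrupt there is no involuntary preemption: control
 * only moves to the other runnable thread when the current one yields.
 */
static void
run_cooperative(step_fn threads[2], int first, int budget)
{
	int cur = first;

	while (budget-- > 0) {
		if (threads[cur]())
			cur = 1 - cur;	/* voluntary switch */
	}
}
```

Starting with thread 2 for 1000 steps gives thread 1 zero steps. Starting with thread 1 gives it one step before thread 2 takes over for good.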

> 
> I've implemented these changes and tested them for x86.  For x86 at least
> AP startup needed some bits of the interrupt infrastructure in place, so
> I moved SI_SUB_SMP up to after SI_SUB_INTR but before SI_SUB_SOFTINTR.  I
> modified the *sleep() and cv_*wait*() routines to not always bail if cold
> is true.  Instead, sleeps without a timeout are permitted to sleep
> "normally".  Sleeps with a timeout drop their interlock and yield the
> CPU (but remain runnable).  Since APs are now fully running this means
> interrupts are now routed to all CPUs from the get-go, removing the need for 
> the post-boot shuffle.  This also resolves the issue of running out of IDT 
> vectors on the boot CPU.
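The "drop the interlock and yield" behaviour has a userland analogue in POSIX threads. This only illustrates the pattern. `cold_timed_sleep()`, `wait_ready()` and `producer()` are hypothetical names. A pthread mutex stands in for the sleep interlock, and sched_yield() stands in for the kernel yielding the CPU. EWOULDBLOCK is returned because that is what msleep(9) reports on a timeout, so the caller re-checks its condition, which turns the timed sleep into polling.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Drop the interlock, give up the CPU while staying runnable, retake it. */
static int
cold_timed_sleep(pthread_mutex_t *interlock)
{
	pthread_mutex_unlock(interlock);
	sched_yield();
	pthread_mutex_lock(interlock);
	return (EWOULDBLOCK);	/* behave like a sleep that timed out */
}

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool ready;

/* Another thread eventually makes the condition true. */
static void *
producer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	ready = true;
	pthread_mutex_unlock(&lock);
	return (NULL);
}

/* Wait for 'ready' by polling; returns the number of simulated timeouts. */
static int
wait_ready(void)
{
	int polls = 0;

	pthread_mutex_lock(&lock);
	while (!ready) {
		(void)cold_timed_sleep(&lock);
		polls++;
	}
	pthread_mutex_unlock(&lock);
	return (polls);
}
```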
> 
> I believe that adapting other platforms to this change should be relatively
> simple, but we should do that before committing the full patch.  I do think
> that some parts of the patch (such as the changes to the sleep routines, and
> using SI_SUB_LAST instead of SI_SUB_SMP as a catch-all SYSINIT) can be 
> committed now without breaking anything.
> 
> However, I'd like feedback on the general idea and if it is acceptable I'd
> like to coordinate testing with other platforms so this can go into the
> tree.
> 
> The current changes are in the 'ap_startup' branch at github/bsdjhb/freebsd.
> You can view them here:
> 
> https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup
> 
> -- 
> John Baldwin
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"