From owner-freebsd-current@freebsd.org Sun Jul 31 21:25:20 2016 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 227A8BAA2A6 for ; Sun, 31 Jul 2016 21:25:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 0FAD61A58 for ; Sun, 31 Jul 2016 21:25:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id 0B6BCBAA2A5; Sun, 31 Jul 2016 21:25:20 +0000 (UTC) Delivered-To: current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B100BAA2A4 for ; Sun, 31 Jul 2016 21:25:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C011F1A57 for ; Sun, 31 Jul 2016 21:25:19 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 59D2AB96E; Sun, 31 Jul 2016 17:25:18 -0400 (EDT) From: John Baldwin To: gljennjohn@gmail.com Cc: current@freebsd.org Subject: Re: EARLY_AP_STARTUP hangs during boot Date: Sun, 31 Jul 2016 14:22:35 -0700 Message-ID: <1944919.bPt123bEA6@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.3-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20160731112914.58bd9765@ernst.home> References: <20160516122242.39249a54@ernst.home> <8097239.52FCHCROUA@ralph.baldwin.cx> <20160731112914.58bd9765@ernst.home> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Sun, 31 Jul 2016 17:25:18 -0400 (EDT) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 31 Jul 2016 21:25:20 -0000 On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote: > On Sat, 30 Jul 2016 12:03:59 -0700 > John Baldwin wrote: > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote: > > > On Fri, 29 Jul 2016 13:17:42 -0700 > > > John Baldwin wrote: > > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote: > > > > > Well, now I know that ULE is a prerequiste for EARLY_AP_STARTUP! I > > > > > wasn't aware of that. I prefer BSD and that's the scheduler I did > > > > > the first tests with. > > > > > > > > > > But with the ULE scheduler the system comes up all the way. > > > > > > > > > > It would be nice if the BSD scheduler could also be modified to > > > > > work with EARLY_AP_STARTUP. > > > > > > > > I wasn't able to reproduce your hang with 4BSD, but I think I see a > > > > possible problem. Try this: > > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c > > > > index 7de56b6..d53331a 100644 > > > > --- a/sys/kern/sched_4bsd.c > > > > +++ b/sys/kern/sched_4bsd.c > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td) > > > > * - The current thread has a higher (numerically lower) or > > > > * equivalent priority. Note that this prevents curthread from > > > > * trying to preempt to itself. > > > > - * - It is too early in the boot for context switches (cold is set). > > > > * - The current thread has an inhibitor set or is in the process of > > > > * exiting. In this case, the current thread is about to switch > > > > * out anyways, so there's no point in preempting. If we did, > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td) > > > > ("maybe_preempt: trying to run inhibited thread")); > > > > pri = td->td_priority; > > > > cpri = ctd->td_priority; > > > > - if (panicstr != NULL || pri >= cpri || cold /* || dumping */ || > > > > + if (panicstr != NULL || pri >= cpri /* || dumping */ || > > > > TD_IS_INHIBITED(ctd)) > > > > return (0); > > > > #ifndef FULL_PREEMPTION > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum) > > > > if ((!forward_wakeup_enabled) || > > > > (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0)) > > > > return (0); > > > > - if (!smp_started || cold || panicstr) > > > > + if (!smp_started || panicstr) > > > > return (0); > > > > > > > > forward_wakeups_requested++; > > > > > > > > > > Thanks, but with this patch the kernel hangs in exactly the same > > > place as before - after the HPET output. > > > > > > Maybe I'm missing some kernel option which ULE works around, or > > > something like that. > > > > Hmm, ok. Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)' > > > > Please also add this patch (on top of the previous patch): > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c > > index 2973a23..bab2278 100644 > > --- a/sys/kern/sched_4bsd.c > > +++ b/sys/kern/sched_4bsd.c > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags) > > KASSERT(td->td_flags & TDF_INMEM, > > ("sched_add: thread swapped out")); > > > > + CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid, > > + sched_tdname(td)); > > KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add", > > "prio:%d", td->td_priority, KTR_ATTR_LINKED, > > sched_tdname(curthread)); > > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c > > index f07b97e..1f418f1 100644 > > --- a/sys/x86/x86/cpu_machdep.c > > +++ b/sys/x86/x86/cpu_machdep.c > > @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu) > > return (0); > > if (*state == STATE_MWAIT) > > *state = STATE_RUNNING; > > + CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu); > > return (1); > > } > > > > (I haven't tried compiling it, you might have to add the sys/ktr.h > > header to cpu_machdep.c if it doesn't build.) > > > > Hopefully we will get some better trace messages before it hangs > > with this added info. The root issue seems to be that 4BSD is > > pinning thread0 to some other CPU (due to sched_bind that happens > > inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs) > > and that other CPU isn't waking up to realize it needs to run thread0. > > > > It compiled with no changes needed. > > Even though I set MAXCPU to a mere 2, the boot still hadn't > completed after 90 minutes and I broke it off. I still have > the kernel, so I can try it another time when I have less need > for my FreeBSD box. Did you have the KTR options enabled from before? I don't expect this to fix the issue, this is more about getting better debug info when it hangs. -- John Baldwin