Date: Thu, 4 Aug 2016 08:59:06 +0200
From: Gary Jennejohn
To: John Baldwin
Cc: current@freebsd.org
Subject: Re: EARLY_AP_STARTUP hangs during boot
Message-ID: <20160804085906.2228aa0b@ernst.home>
In-Reply-To: <4473265.DhpanzHFA4@ralph.baldwin.cx>
References: <20160516122242.39249a54@ernst.home> <9937686.eTkQvkYRyu@ralph.baldwin.cx> <20160802090310.7a03d6d0@ernst.home> <4473265.DhpanzHFA4@ralph.baldwin.cx>
Reply-To: gljennjohn@gmail.com
List-Id: Discussions about the use of FreeBSD-current

On Tue, 02 Aug 2016 10:41:23 -0700
John Baldwin wrote:

> On Tuesday, August 02, 2016 09:03:10 AM Gary Jennejohn wrote:
> > On Mon, 01 Aug 2016 13:19:16 -0700
> > John Baldwin wrote:
> >
> > > On Monday, August 01, 2016 03:31:11 PM Gary Jennejohn wrote:
> > > > On Mon, 1 Aug 2016 09:34:34 +0200
> > > > Gary Jennejohn wrote:
> > > >
> > > > > On Sun, 31 Jul 2016 14:22:35 -0700
> > > > > John Baldwin wrote:
> > > > >
> > > > > > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:
> > > > > > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > > > > > John Baldwin wrote:
> > > > > > >
> > > > > > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > > > > > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > > > > > John Baldwin wrote:
> > > > > > > > >
> > > > > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > > > > > > > > > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP! I
> > > > > > > > > > > wasn't aware of that. I prefer BSD and that's the scheduler I did
> > > > > > > > > > > the first tests with.
> > > > > > > > > > >
> > > > > > > > > > > But with the ULE scheduler the system comes up all the way.
> > > > > > > > > > >
> > > > > > > > > > > It would be nice if the BSD scheduler could also be modified to
> > > > > > > > > > > work with EARLY_AP_STARTUP.
> > > > > > > > > >
> > > > > > > > > > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > > > > > > > > > possible problem. Try this:
> > > > > > > > > >
> > > > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > > > index 7de56b6..d53331a 100644
> > > > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > > > > > >   *  - The current thread has a higher (numerically lower) or
> > > > > > > > > >   *    equivalent priority.  Note that this prevents curthread from
> > > > > > > > > >   *    trying to preempt to itself.
> > > > > > > > > > - *  - It is too early in the boot for context switches (cold is set).
> > > > > > > > > >   *  - The current thread has an inhibitor set or is in the process of
> > > > > > > > > >   *    exiting.  In this case, the current thread is about to switch
> > > > > > > > > >   *    out anyways, so there's no point in preempting.  If we did,
> > > > > > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > > > > >              ("maybe_preempt: trying to run inhibited thread"));
> > > > > > > > > >          pri = td->td_priority;
> > > > > > > > > >          cpri = ctd->td_priority;
> > > > > > > > > > -        if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > > > > > > > > +        if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > > > > >              TD_IS_INHIBITED(ctd))
> > > > > > > > > >                  return (0);
> > > > > > > > > >  #ifndef FULL_PREEMPTION
> > > > > > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > > > > >          if ((!forward_wakeup_enabled) ||
> > > > > > > > > >               (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> > > > > > > > > >                  return (0);
> > > > > > > > > > -        if (!smp_started || cold || panicstr)
> > > > > > > > > > +        if (!smp_started || panicstr)
> > > > > > > > > >                  return (0);
> > > > > > > > > >
> > > > > > > > > >          forward_wakeups_requested++;
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > Thanks, but with this patch the kernel hangs in exactly the same
> > > > > > > > > place as before - after the HPET output.
> > > > > > > > >
> > > > > > > > > Maybe I'm missing some kernel option which ULE works around, or
> > > > > > > > > something like that.
> > > > > > > > >
> > > > > > > > Hmm, ok.
> > > > > > > > Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> > > > > > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > > > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > > > > >
> > > > > > > > Please also add this patch (on top of the previous patch):
> > > > > > > >
> > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > index 2973a23..bab2278 100644
> > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > > > > > > >          KASSERT(td->td_flags & TDF_INMEM,
> > > > > > > >              ("sched_add: thread swapped out"));
> > > > > > > >
> > > > > > > > +        CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > > > > > > > +            sched_tdname(td));
> > > > > > > >          KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
> > > > > > > >              "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> > > > > > > >              sched_tdname(curthread));
> > > > > > > > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> > > > > > > > index f07b97e..1f418f1 100644
> > > > > > > > --- a/sys/x86/x86/cpu_machdep.c
> > > > > > > > +++ b/sys/x86/x86/cpu_machdep.c
> > > > > > > > @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
> > > > > > > >                  return (0);
> > > > > > > >          if (*state == STATE_MWAIT)
> > > > > > > >                  *state = STATE_RUNNING;
> > > > > > > > +        CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
> > > > > > > >          return (1);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > (I haven't tried compiling it, you might have to add the sys/ktr.h
> > > > > > > > header to cpu_machdep.c if it doesn't build.)
> > > > > > > >
> > > > > > > > Hopefully we will get some better trace messages before it hangs
> > > > > > > > with this added info.
> > > > > > > > The root issue seems to be that 4BSD is
> > > > > > > > pinning thread0 to some other CPU (due to sched_bind that happens
> > > > > > > > inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
> > > > > > > > and that other CPU isn't waking up to realize it needs to run thread0.
> > > > > > > >
> > > > > > > >
> > > > > > > It compiled with no changes needed.
> > > > > > >
> > > > > > > Even though I set MAXCPU to a mere 2, the boot still hadn't
> > > > > > > completed after 90 minutes and I broke it off. I still have
> > > > > > > the kernel, so I can try it another time when I have less need
> > > > > > > for my FreeBSD box.
> > > > > > >
> > > > > > Did you have the KTR options enabled from before? I don't expect this
> > > > > > to fix the issue, this is more about getting better debug info when it
> > > > > > hangs.
> > > > > >
> > > > > >
> > > > > Yes, all the options from before were enabled. Maybe I should have
> > > > > disabled KTR_VERBOSE=1? I'll try it without that.
> > > > >
> > > > >
> > > > KTR_VERBOSE=1 is necessary.
> > >
> > > Yes.
> > >
> > > > OK, after about 5 hours it landed in an infinite loop emitting:
> > > >
> > > > cpu_0 ipi_cpu: cpu: 1 to 5 ipi: 2 (my CPU has 6 cores)
> > >
> > > Humm, can you capture a picture right when it hangs? Those interrupts
> > > are due to clock interrupts (IPI_HARDCLOCK) and are noise. More what
> > > I'm trying to see is if we send IPI_AST/IPI_PREEMPT to the CPU after
> > > binding to it.
> > >
> > >
> > I can't tell when it hangs due to the amount of trace coming out.
> > The hang is hidden in the noise and I have no way to suppress the
> > trace that I'm aware of. The trace is coming so fast that even
> > a photo of the screen looks smeared.
> >
> > Is there a way to limit the trace to IPI_AST/IPI_PREEMPT?
>
> Yes:
>
> diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
> index 91c119a..6c57b20 100644
> --- a/sys/x86/x86/mp_x86.c
> +++ b/sys/x86/x86/mp_x86.c
> @@ -1160,7 +1160,8 @@ ipi_cpu(int cpu, u_int ipi)
>          if (ipi == IPI_STOP_HARD)
>                  CPU_SET_ATOMIC(cpu, &ipi_stop_nmi_pending);
>
> -        CTR3(KTR_SMP, "%s: cpu: %d ipi: %x", __func__, cpu, ipi);
> +        if (ipi == IPI_AST || ipi == IPI_PREEMPT)
> +                CTR3(KTR_SMP, "%s: cpu: %d ipi: %x", __func__, cpu, ipi);
>          ipi_send_cpu(cpu, ipi);
>  }
>
>

I limited output to KTR_SMP and used only the patch above.

The last lines which appear are:

hpet0: iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
cpu0: smp_targeted_tlb_shootdown: cpu: 1 ipi: f6
cpu0: smp_targeted_tlb_shootdown: cpu: 2 ipi: f6
cpu0: smp_targeted_tlb_shootdown: cpu: 3 ipi: f6
cpu0: smp_targeted_tlb_shootdown: cpu: 4 ipi: f6
cpu0: smp_targeted_tlb_shootdown: cpu: 5 ipi: f6
Timecounter "HPET" frequency 14318180 Hz quality 950

-- 
Gary Jennejohn
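
For anyone following along, the filtering idea in the mp_x86.c patch (trace an IPI only when it is IPI_AST or IPI_PREEMPT, so that the clock IPIs stop flooding the console) can be tried outside the kernel. What follows is only a user-space sketch under stated assumptions: the numeric IPI values are placeholders chosen for illustration (the thread only shows that the noisy clock IPI was logged as "ipi: 2"), and ipi_is_interesting() and trace_ipi() are hypothetical helper names, not kernel functions.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Placeholder IPI numbers, assumed for illustration only.  The real
 * definitions live in the kernel headers and may differ.
 */
enum {
	IPI_AST = 0,
	IPI_PREEMPT = 1,
	IPI_HARDCLOCK = 2
};

/* Hypothetical helper: true for the IPIs we want to keep in the trace. */
static bool
ipi_is_interesting(unsigned int ipi)
{
	return (ipi == IPI_AST || ipi == IPI_PREEMPT);
}

/*
 * Hypothetical stand-in for the trace call in the patched ipi_cpu():
 * print a trace line only for interesting IPIs.  Returns 1 if a line
 * was printed and 0 if the IPI was filtered out.
 */
static int
trace_ipi(int cpu, unsigned int ipi)
{
	if (!ipi_is_interesting(ipi))
		return (0);
	printf("%s: cpu: %d ipi: %x\n", __func__, cpu, ipi);
	return (1);
}
```

Calling trace_ipi(1, IPI_HARDCLOCK) prints nothing and returns 0, while trace_ipi(1, IPI_PREEMPT) prints one line and returns 1. As in the patch, the filter sits in front of the trace call only; in the kernel the IPI itself is still sent either way.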