From: John Baldwin
To: gljennjohn@gmail.com
Cc: current@freebsd.org
Subject: Re: EARLY_AP_STARTUP hangs during boot
Date: Tue, 31 May 2016 13:10:06 -0700
Message-ID: <8812233.S6jxPboLEa@ralph.baldwin.cx>
In-Reply-To: <20160528141141.232185a9@ernst.home>
References: <20160516122242.39249a54@ernst.home> <20160527095005.0e0dc1be@ernst.home> <20160528141141.232185a9@ernst.home>
List-Id: Discussions about the use of FreeBSD-current

On Saturday, May 28, 2016 02:11:41 PM Gary Jennejohn wrote:
> On Fri, 27 May 2016 09:50:05 +0200
> Gary Jennejohn wrote:
>
> > On Thu, 26 May 2016 16:54:35 -0700
> > John Baldwin wrote:
> >
> > > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:
> > > > On Mon, 16 May 2016 10:54:19 -0700
> > > > John Baldwin wrote:
> > > >
> > > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:
> > > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > > break into DDB.
> > > > > >
> > > > > > I did a verbose boot and the last lines I see are related to routing
> > > > > > MSI-X to various local APIC vectors. I copied the last few lines and
> > > > > > they look like this:
> > > > > >
> > > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> > > >          ^^^^^^^ Assigning
> > > > > >
> > > > > > I tried disabling msi and msix in /boot/loader.conf, but the settings
> > > > > > were ignored (probably too early).
> > > > >
> > > > > No, those settings are not too early. However, the routing to different
> > > > > CPUs now happens earlier than it used to. What is the line before the
> > > > > MSI lines? You can take a picture with your phone/camera if that's simplest.
> > > > >
> > > >
> > > > Here are a few lines before the MSI routing happens:
> > > >
> > > > hpet0: iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
> > > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > > Timecounter "HPET" frequency 14318180 Hz quality 950
> > >
> > > The assigning message means it is in the loop using
> > > bus_bind_intr() to set up per-CPU timers. Can you please try
> > > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > > disabling the use of per-CPU timers allows you to boot?
> > >
> >
> > Something has changed since the last time I generated a kernel with
> > this option.
> >
> > Now I get a NULL-pointer dereference in the kernel, doesn't matter
> > whether I set the hint or not.
> >
>
> OK, now that the startup has been fixed, I tried setting the hint at
> the loader prompt, but the kernel hangs in exactly the same place as
> before. I actually booted twice to make certain I hadn't made a
> typo when setting the hint.

Humm, it shouldn't be calling bus_bind_intr() if the hint is set.
Actually, I guess it just binds them all to the first CPU if per-CPU
timers aren't set. Can you add debug printfs to hpet_attach() in
sys/dev/acpica/acpi_hpet.c to narrow down which line in that function
it hangs after?

Another option to try is to add the following to your kernel config:

    options KTR
    options KTR_COMPILE=KTR_PROC
    options KTR_MASK=KTR_PROC
    options KTR_VERBOSE=1

This will spew a lot of crap to the screen, but if it stops spewing
when it hangs then it might tell us where the system is hung. If you
have any way to configure a serial console then this would also be
useful even if it spews constantly when it is hung (assuming you could
log the output of the serial console).

-- 
John Baldwin
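
If the debug printfs added to hpet_attach() share a recognizable prefix and the console output is logged over serial, the last marker printed before the hang can be picked out of the capture mechanically. The following is only a minimal sketch under stated assumptions: the `HPETDBG <n>` marker format, the `last_marker` helper, and the marker lines in the sample text are all hypothetical and are not anything FreeBSD prints on its own.

```python
import re

# Hypothetical marker format: assumes each debug printf added to
# hpet_attach() prints "HPETDBG <number>". FreeBSD does not emit this.
MARKER = re.compile(r"HPETDBG (\d+)")


def last_marker(log_text):
    """Return the number of the last marker in the log, or None if absent."""
    matches = MARKER.findall(log_text)
    return int(matches[-1]) if matches else None


# Illustrative input only: the hpet0/Timecounter/msi lines are copied from
# the thread, the HPETDBG lines are invented for this example.
sample_log = """\
hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
HPETDBG 1
Timecounter "HPET" frequency 14318180 Hz quality 950
HPETDBG 2
msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
HPETDBG 3
"""

if __name__ == "__main__":
    # The hang would be somewhere after the printf that produced this marker.
    print(last_marker(sample_log))
```

The same helper could be pointed at a real capture by reading the log file into a string first; the hang is then located between the reported marker and the next printf in the function.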