From: Adam McDougall
To: Roger Pau Monné
Cc: stable@freebsd.org
Subject: Re: Boot hang on Xen after r318347/(310418)
Date: Sat, 3 Jun 2017 17:11:35 -0400
References: <20170524223307.GS79337@egr.msu.edu> <20170525094103.iedycf2t4dy367fc@dhcp-3-128.uk.xensource.com> <20170525132854.GA7604@egr.msu.edu>
In-Reply-To: <20170525132854.GA7604@egr.msu.edu>

On 05/25/2017 09:28, Adam McDougall wrote:
> On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote:
>
>> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
>>> Hello,
>>>
>>> Recently I made a new build of 11-STABLE but encountered a boot hang
>>> at this state:
>>> http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
>>>
>>> It is easy to reproduce, I can just boot from any 11 or 12 ISO that
>>> contains the commit.
>>
>> I have just tested latest HEAD (r318861) and stable/11 (r318854) and
>> they both work fine on my environment (a VM with 4 vCPUs and 2GB of
>> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input,
>> he has been doing some tests on HEAD and AFAIK he hasn't seen any
>> issues.
>>
>>> I compiled various svn revisions to confirm that r318347 caused the
>>> issue and r318346 is fine. With r318347 or later including the latest
>>> 11-STABLE, the system will only boot with one virtual CPU in XenServer.
>>> Any more cpus and it hangs. I also tried a 12 kernel from head this
>>> afternoon and I have the same hang. I had this issue on XenServer 7
>>> (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I
>>> also did much of my testing with a GENERIC kernel to try to rule out
>>> kernel configuration mistakes. When it hangs, the performance
>>> monitoring in Xen tells me at least one CPU is pegged. r318674 boots
>>> fine on physical hardware without Xen involved.
>>>
>>> Looking at r318347 which mentions EARLY_AP_STARTUP and later seeing
>>> r318763 which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
>>> my kernel but it turned the hang into a panic but with any number of
>>> CPUs:
>>> http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png
>>
>> I guess this is on stable/11 right? The panic looks easier to debug
>> than the hang, so let's start by this one. Can you enable the serial
>> console and kernel debug options in order to get a trace? With just
>> this it's almost impossible to know what went wrong.
>
> Yes this was on stable/11 amd64.
>
>>
>> Roger.

I worked on this today. The short version: recent kernels, including
the 20170602 ISO images of 11 and 12, no longer hang or panic with
EARLY_AP_STARTUP. Adding EARLY_AP_STARTUP to my kernel config appears
to prevent the hang, and some commit between r318855 (May 24) and
r319554 (today, June 3) prevents the panic.

I'm tempted to figure out which commit, but I already spent hours
bisecting and building today, and since this seems to be a workable
solution going forward, I'm content. Thanks.
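
For anyone who does want to find the commit that fixed the panic, the search between r318855 and r319554 could be run as a plain bisection. This is only a rough sketch: `is_fixed` is a hypothetical callback standing in for "build that revision with EARLY_AP_STARTUP, boot it under Xen, and report whether it came up without the panic". It also assumes the fix stays in place once committed, which I have not verified.

```python
from typing import Callable


def first_fixed_revision(bad: int, good: int,
                         is_fixed: Callable[[int], bool]) -> int:
    """Return the lowest revision in (bad, good] for which is_fixed is True.

    Assumptions (not verified against the real tree):
      * revision `bad` still panics and revision `good` does not
      * once the fix lands, every later revision is also fixed
    """
    lo, hi = bad, good          # invariant: lo is broken, hi is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(mid):       # hypothetical build-and-boot test
            hi = mid            # fix is at mid or earlier
        else:
            lo = mid            # fix is after mid
    return hi
```

With the two endpoints above, the range is 699 revisions wide, so this would take about ten build-and-boot cycles. Revisions in that range that do not touch the branch being tested would just rebuild the same tree, so the real number of distinct builds may be lower.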