From: Adam McDougall
To: Roger Pau Monné
Cc: stable@freebsd.org, cperciva@freebsd.org
Date: Thu, 25 May 2017 09:28:55 -0400
Subject: Re: Boot hang on Xen after r318347/(310418)
Message-ID: <20170525132854.GA7604@egr.msu.edu>
References: <20170524223307.GS79337@egr.msu.edu> <20170525094103.iedycf2t4dy367fc@dhcp-3-128.uk.xensource.com>
In-Reply-To: <20170525094103.iedycf2t4dy367fc@dhcp-3-128.uk.xensource.com>

On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote:
> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> > Hello,
> >
> > Recently I made a new build of 11-STABLE but encountered a boot hang
> > at this state:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> >
> > It is easy to reproduce, I can just boot from any 11 or 12 ISO that
> > contains the commit.
>
> I have just tested latest HEAD (r318861) and stable/11 (r318854) and
> they both work fine on my environment (a VM with 4 vCPUs and 2GB of
> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input,
> he has been doing some tests on HEAD and AFAIK he hasn't seen any
> issues.
>
> > I compiled various svn revisions to confirm that r318347 caused the
> > issue and r318346 is fine. With r318347 or later including the latest
> > 11-STABLE, the system will only boot with one virtual CPU in XenServer.
> > Any more cpus and it hangs. I also tried a 12 kernel from head this
> > afternoon and I have the same hang. I had this issue on XenServer 7
> > (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I
> > also did much of my testing with a GENERIC kernel to try to rule out
> > kernel configuration mistakes. When it hangs, the performance
> > monitoring in Xen tells me at least one CPU is pegged.
> > r318674 boots fine on physical hardware without Xen involved.
> >
> > Looking at r318347 which mentions EARLY_AP_STARTUP and later seeing
> > r318763 which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> > my kernel but it turned the hang into a panic but with any number of
> > CPUs:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png
>
> I guess this is on stable/11 right? The panic looks easier to debug
> than the hang, so let's start by this one. Can you enable the serial
> console and kernel debug options in order to get a trace? With just
> this it's almost impossible to know what went wrong.

Yes, this was on stable/11 amd64.

> If you still have that kernel around (and its debug symbols), can you
> do:
>
> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80793344
>
> (The address is the instruction pointer on the crash image, I think I
> got it right)

I'll reproduce this soon and get the results from that command.

> In order to compile a stable/11 kernel with full debugging support you
> will have to add:
>
> # For full debugger support use (turn off in stable branch):
> options         BUF_TRACKING            # Track buffer history
> options         DDB                     # Support DDB.
> options         FULL_BUF_TRACKING       # Track more buffer history
> options         GDB                     # Support remote GDB.
> options         DEADLKRES               # Enable the deadlock resolver
> options         INVARIANTS              # Enable calls of extra sanity checking
> options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> options         WITNESS                 # Enable checks to detect deadlocks and cycles
> options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> options         MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
>
> To your kernel config file.

I'll work on that soon too when I get a chance, thanks.

>
> Just to be sure, this is an amd64 kernel right?

yes

>
> Roger.