Date: Thu, 25 May 2017 09:28:55 -0400
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: Roger Pau Monné <royger@FreeBSD.org>
Cc: stable@freebsd.org, cperciva@freebsd.org
Subject: Re: Boot hang on Xen after r318347/(310418)
Message-ID: <20170525132854.GA7604@egr.msu.edu>
In-Reply-To: <20170525094103.iedycf2t4dy367fc@dhcp-3-128.uk.xensource.com>
References: <20170524223307.GS79337@egr.msu.edu> <20170525094103.iedycf2t4dy367fc@dhcp-3-128.uk.xensource.com>

On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote:
> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> > Hello,
> >
> > Recently I made a new build of 11-STABLE but encountered a boot hang
> > at this state:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> >
> > It is easy to reproduce, I can just boot from any 11 or 12 ISO that
> > contains the commit.
>
> I have just tested latest HEAD (r318861) and stable/11 (r318854) and
> they both work fine on my environment (a VM with 4 vCPUs and 2GB of
> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input,
> he has been doing some tests on HEAD and AFAIK he hasn't seen any
> issues.
>
> > I compiled various svn revisions to confirm that r318347 caused the
> > issue and r318346 is fine. With r318347 or later including the latest
> > 11-STABLE, the system will only boot with one virtual CPU in XenServer.
> > Any more cpus and it hangs. I also tried a 12 kernel from head this
> > afternoon and I have the same hang. I had this issue on XenServer 7
> > (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I
> > also did much of my testing with a GENERIC kernel to try to rule out
> > kernel configuration mistakes. When it hangs, the performance
> > monitoring in Xen tells me at least one CPU is pegged. r318674 boots
> > fine on physical hardware without Xen involved.
> >
> > Looking at r318347 which mentions EARLY_AP_STARTUP and later seeing
> > r318763 which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> > my kernel but it turned the hang into a panic but with any number of
> > CPUs:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png
>
> I guess this is on stable/11 right? The panic looks easier to debug
> than the hang, so let's start with this one. Can you enable the serial
> console and kernel debug options in order to get a trace? With just
> this it's almost impossible to know what went wrong.

Yes this was on stable/11 amd64.

> If you still have that kernel around (and its debug symbols), can you
> do:
>
> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80793344
>
> (The address is the instruction pointer on the crash image, I think I
> got it right)

I'll reproduce this soon and get the results from that command.

> In order to compile a stable/11 kernel with full debugging support you
> will have to add:
>
> # For full debugger support use (turn off in stable branch):
> options BUF_TRACKING            # Track buffer history
> options DDB                     # Support DDB.
> options FULL_BUF_TRACKING       # Track more buffer history
> options GDB                     # Support remote GDB.
> options DEADLKRES               # Enable the deadlock resolver
> options INVARIANTS              # Enable calls of extra sanity checking
> options INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> options WITNESS                 # Enable checks to detect deadlocks and cycles
> options WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
>
> To your kernel config file.

I'll work on that soon too when I get a chance, thanks.

>
> Just to be sure, this is an amd64 kernel right?

yes

>
> Roger.
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"