Date: Wed, 14 Dec 2016 14:10:48 -0800
From: Mark Johnston <markj@freebsd.org>
To: kargl@uw.edu
Cc: freebsd-current@freebsd.org, kib@freebsd.org
Subject: Re: Revision 309657 to stack_machdep.c renders unbootable system
Message-ID: <20161214221048.GB64767@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20161214201416.GA64767@wkstn-mjohnston.west.isilon.com>
References: <20161214194848.GA881@troutmask.apl.washington.edu> <20161214201416.GA64767@wkstn-mjohnston.west.isilon.com>

On Wed, Dec 14, 2016 at 12:14:16PM -0800, Mark Johnston wrote:
> On Wed, Dec 14, 2016 at 11:49:26AM -0800, Steven G. Kargl wrote:
> > Well, after 3 days of bisection, I finally found the commit
> > that renders my system unbootable. The system does not panic.
> > It simply gets stuck in some state. Nonfunctional keyboard,
> > so can't break into debugger. No serial console available.
> > The verbose dmesg.boot for a working kernel from revision
> > 309656 is at
> >
> > http://troutmask.apl.washington.edu/~kargl/freebsd/dmesg.309656.txt
> >
> > The kernel config file is at
> >
> > http://troutmask.apl.washington.edu/~kargl/freebsd/SPEW.txt
> >
> > In looking at /usr/src/UPDATING, there is no warning that one
> > can create a boat anchor by upgrading to 309657. If compiling
> > a kernel with 'options DDB' is no longer supported, this should
> > be stated in UPDATING. Or, UPDATING should state that 'options
> > DDB' requires 'options STACK'. Or, 'options DDB' should simply
> > to the right thing and pull in whatever 'option STACK' does.
>
> It is supported though - the point of that change was to fix a problem
> that occurred when DDB is configured but STACK isn't. While testing I
> tried every combination of the two options, and I just tried and
> successfully booted a kernel with DDB and !STACK.
>
> Does the kernel boot successfully if STACK is added to your
> configuration?

I tried your config (plus virtio drivers) and was able to reproduce the
hang in bhyve. Adding STACK "fixed" the hang, as did reverting part of
my change to re-add dead code into the kernel. My VM was always hanging
after printing

000.000050 [ 426] vtnet_netmap_attach virtio attached txq=1, txd=1024 rxq=1, rxd=1024

Sure enough, removing "device netmap" from your config also fixes the
hang. When the hang occurs, I can see with "bhyvectl --get-rip" that
we're stuck in DELAY(), but I can't get a stack at that point.

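To make the two workarounds concrete, here is a rough sketch of the
relevant kernel config lines. Only the option and device names come
from this thread; the comments and layout are illustrative, and
neither variant should be read as a real fix, just as what avoided the
hang in my bhyve test.

```
# Variant 1: keep DDB and add STACK alongside it.
options 	DDB
options 	STACK

# Variant 2: leave STACK out and drop netmap instead
# (commented out here to show the line that was removed).
#device 	netmap
```
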
I think my change is an innocent bystander - it just happened to expose a latent issue elsewhere. I don't have much more time to look at this right now, but I'll look into it more tonight.