Date:      Thu, 8 May 2014 13:03:16 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-virtualization@freebsd.org
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: consistent VM hang during reboot
Message-ID:  <201405081303.17079.jhb@freebsd.org>
In-Reply-To: <BED233F2-EAFF-41A3-9C5B-869041A9AED8@jnielsen.net>
References:  <BED233F2-EAFF-41A3-9C5B-869041A9AED8@jnielsen.net>

On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest, I'm not sure if the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
> 
> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB.) The VM will boot fine the first time, but running either "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
> 
> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button, the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot, then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time and then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
> 
> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
> 
> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
> 
> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
> @@ -593,7 +593,7 @@
>  void
>  cpu_reset()
>  {
> -#ifdef SMP
> +#if 0
>  	cpuset_t map;
>  	u_int cnt;
> 
> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
> 
> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.

Can you try forcing the reboot to occur on the BSP (bootstrap processor)
via 'cpuset -l 0 reboot', or on a non-BSP CPU via 'cpuset -l 1 reboot',
to see if that has any effect?  It might not, but if it does, it would
help narrow down which code to consider.
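For reference, here is a minimal C sketch of the mechanism the suggested experiment relies on: restrict the calling process to a single CPU, then exec the command, which inherits that mask. This is roughly what 'cpuset -l N command' does. The helper names pin_to_cpu() and run_pinned() are hypothetical. The FreeBSD branch uses cpuset_setaffinity(2); the Linux branch (sched_setaffinity(2)) is there only so the sketch also builds on a Linux host. Whether the kernel's reset path then actually runs on that same CPU is the premise being tested, not something this code guarantees.

```c
/*
 * Sketch only: pin the calling process to one CPU, then exec a command,
 * roughly like "cpuset -l N command". Helper names are hypothetical.
 */
#ifndef __FreeBSD__
#define _GNU_SOURCE	/* cpu_set_t, CPU_SET(), sched_setaffinity() on Linux */
#endif

#include <errno.h>
#include <unistd.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/cpuset.h>
#else
#include <sched.h>
#endif

/* Restrict the calling process to CPU 'cpu'. Returns 0, or -1 with errno set. */
int
pin_to_cpu(int cpu)
{
	if (cpu < 0 || cpu >= CPU_SETSIZE) {
		errno = EINVAL;
		return (-1);
	}
#ifdef __FreeBSD__
	cpuset_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	/* CPU_WHICH_PID with id -1 means "the current process". */
	return (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
	    sizeof(mask), &mask));
#else
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	/* pid 0 means "the calling thread"; the mask survives execve(). */
	return (sched_setaffinity(0, sizeof(mask), &mask));
#endif
}

/* Pin to 'cpu', then replace this process with argv[0]; returns only on error. */
int
run_pinned(int cpu, char *const argv[])
{
	if (pin_to_cpu(cpu) != 0)
		return (-1);
	execvp(argv[0], argv);
	return (-1);	/* execvp() only returns on failure */
}
```

For example, run_pinned(1, (char *[]){ "reboot", NULL }) mirrors the 'cpuset -l 1 reboot' case, and passing 0 instead mirrors the BSP case.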

-- 
John Baldwin


