Date:      Thu, 8 May 2014 11:55:53 -0600
From:      John Nielsen <lists@jnielsen.net>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-virtualization@freebsd.org
Subject:   Re: consistent VM hang during reboot
Message-ID:  <E97C3027-79CF-45F9-B5ED-3339D7AE0B5F@jnielsen.net>
In-Reply-To: <201405081303.17079.jhb@freebsd.org>
References:  <BED233F2-EAFF-41A3-9C5B-869041A9AED8@jnielsen.net> <201405081303.17079.jhb@freebsd.org>

On May 8, 2014, at 11:03 AM, John Baldwin <jhb@freebsd.org> wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>>
>> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB). The VM will boot fine a first time, but running either "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>>
>> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>>
>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>>
>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>
>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>> @@ -593,7 +593,7 @@
>> void
>> cpu_reset()
>> {
>> -#ifdef SMP
>> +#if 0
>> 	cpuset_t map;
>> 	u_int cnt;
>>
>> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
>>
>> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.
>
> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
> not, but if it does it would help narrow down the code to consider.

Hello jhb, thanks for responding.

I tried your suggestion but unfortunately it does not make any difference. The reboot hangs regardless of which CPU I assign the command to.

Any other suggestions?

JN



