Date:      Fri, 9 May 2014 12:41:50 -0600
From:      John Nielsen <lists@jnielsen.net>
To:        Andrew Duane <aduane@juniper.net>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>
Subject:   Re: consistent VM hang during reboot
Message-ID:  <2CCD4068-A9CB-442C-BB91-ADBF62FF22C6@jnielsen.net>
In-Reply-To: <af0f4c6348d64ab0b5ea56d2ea777e99@BY2PR05MB582.namprd05.prod.outlook.com>
References:  <BED233F2-EAFF-41A3-9C5B-869041A9AED8@jnielsen.net> <201405081303.17079.jhb@freebsd.org> <E97C3027-79CF-45F9-B5ED-3339D7AE0B5F@jnielsen.net> <af0f4c6348d64ab0b5ea56d2ea777e99@BY2PR05MB582.namprd05.prod.outlook.com>

On May 8, 2014, at 12:42 PM, Andrew Duane <aduane@juniper.net> wrote:

> From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of John Nielsen
>
>> On May 8, 2014, at 11:03 AM, John Baldwin <jhb@freebsd.org> wrote:
>>
>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>>>>
>>>> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB). The VM will boot fine a first time, but running either "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>>>>
>>>> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>>>>
>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>>>>
>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>>
>>>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>>>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>>>> @@ -593,7 +593,7 @@
>>>> void
>>>> cpu_reset()
>>>> {
>>>> -#ifdef SMP
>>>> +#if 0
>>>> 	cpuset_t map;
>>>> 	u_int cnt;
>>>>
>>>> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
>>>>
>>>> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.
>>>
>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
>>> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
>>> not, but if it does it would help narrow down the code to consider.
>>
>> Hello jhb, thanks for responding.
>>
>> I tried your suggestion but unfortunately it does not make any difference. The reboot hangs regardless of which CPU I assign the command to.
>>
>> Any other suggestions?
>
> When I was doing some early work on some of the Octeon multi-core chips, I encountered something similar. If I remember correctly, there was an issue in the shutdown sequence that did not properly halt the cores and set up the "start jump" vector. So the first core would start, and when it tried to start the next ones it would hang waiting for the ACK that they were running (since they didn't have a start vector and hence never started). I know MIPS, not AMD, so I can't say what the equivalent would be, but I'm sure there is one. Check that part, setting up the early state.
>
> If Juli and/or Adrian are reading this: do you remember anything about that, something like 2 years ago?

That does sound promising; I would love more details if anyone can provide them.

Here's another wrinkle:

The KVM machine in question is part of a cluster of identical servers (hardware, OS, software revisions). The problem is present on all servers in the cluster.

I also have access to a second homogeneous cluster. The OS and software revisions on this cluster are identical to the first. The hardware is _nearly_ identical: slightly different mainboards from the same manufacturer and slightly older CPUs. The same VMs (identical disk image and definition, including CPU flags passed to the guest) that have a problem on the first cluster work flawlessly on this one.

Not sure if that means the bad behavior only appears on certain CPUs or if it's timing-related or something else entirely. I'd welcome speculation at this point.

CPU details below in case it makes a difference.

== Problem Host ==
model name      : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms

== Good Host ==
model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
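
Since the two flag lists are long and nearly identical, here is a rough sketch of how they could be compared mechanically rather than by eye. The names PROBLEM_HOST, GOOD_HOST and flag_diff are made up for illustration; the strings are simply the two "flags" lines pasted from above.

```python
# Flag lists copied verbatim from the two "flags" lines above.
# PROBLEM_HOST, GOOD_HOST and flag_diff are illustrative names only.
PROBLEM_HOST = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt
pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
"""

GOOD_HOST = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts
dtherm tpr_shadow vnmi flexpriority ept vpid
"""


def flag_diff(a, b):
    """Return (flags only in a, flags only in b) as sorted lists."""
    set_a, set_b = set(a.split()), set(b.split())
    return sorted(set_a - set_b), sorted(set_b - set_a)


only_problem, only_good = flag_diff(PROBLEM_HOST, GOOD_HOST)
print("only on problem host:", " ".join(only_problem) or "(none)")
print("only on good host:   ", " ".join(only_good) or "(none)")
```

With the lists as pasted, the problem host reports five flags the good host lacks (erms, f16c, fsgsbase, rdrand, smep), and the good host reports none that the problem host lacks. The guests are defined with identical CPU flags on both clusters, so this only pins down a host-side difference; it is not evidence that any of these features causes the hang.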

Thanks,

JN
