From owner-freebsd-hackers@FreeBSD.ORG Fri Jun 13 22:31:13 2014
Subject: Re: consistent VM hang during reboot
From: John Nielsen
Date: Fri, 13 Jun 2014 16:23:13 -0600
To: freebsd-hackers@freebsd.org, freebsd-virtualization@freebsd.org
Message-Id: <0238084D-FD0F-42A5-85F5-597A590E666C@jnielsen.net>
In-Reply-To: <83DA2398-0004-49EC-8AC1-9AA64F33A194@jnielsen.net>
References: <201405081303.17079.jhb@freebsd.org> <2CCD4068-A9CB-442C-BB91-ADBF62FF22C6@jnielsen.net> <83DA2398-0004-49EC-8AC1-9AA64F33A194@jnielsen.net>
List-Id: Technical Discussions relating to FreeBSD

On May 13, 2014, at 9:50 AM, John Nielsen wrote:

> On May 9, 2014, at 12:41 PM, John Nielsen wrote:
>
>> On May 8, 2014, at 12:42 PM, Andrew Duane wrote:
>>
>>> From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of John Nielsen
>>>
>>>> On May 8, 2014, at 11:03 AM, John Baldwin wrote:
>>>>
>>>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure whether the problem is in FreeBSD or in the hypervisor, but I'm trying to rule out the OS first.
>>>>>>
>>>>>> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB.) The VM will boot fine the first time, but running either "shutdown -r now" OR "reboot" leads to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>>>>>>
>>>>>> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button, the next boot is fine. If I set 'kern.smp.disabled="1"' for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time and then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>>>>>>
>>>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>>>>>>
>>>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>>>>
>>>>>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>>>>>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>>>>>> @@ -593,7 +593,7 @@
>>>>>>  void
>>>>>>  cpu_reset()
>>>>>>  {
>>>>>> -#ifdef SMP
>>>>>> +#if 0
>>>>>>  	cpuset_t map;
>>>>>>  	u_int cnt;
>>>>>>
>>>>>> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
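>>>>>>
>>>>>> For reference, here is roughly what that #ifdef SMP block does, abridged and paraphrased from memory of HEAD (the real thing is in sys/amd64/amd64/vm_machdep.c), so treat it as a sketch rather than a verbatim copy. The interesting parts are the stop_cpus() call and the handshake that proxies the reset over to the BSP:
>>>>>>
>>>>>> void
>>>>>> cpu_reset()
>>>>>> {
>>>>>> #ifdef SMP
>>>>>> 	cpuset_t map;
>>>>>> 	u_int cnt;
>>>>>>
>>>>>> 	if (smp_started) {
>>>>>> 		/* Stop every other CPU that is still running. */
>>>>>> 		map = all_cpus;
>>>>>> 		CPU_CLR(PCPU_GET(cpuid), &map);
>>>>>> 		CPU_NAND(&map, &stopped_cpus);
>>>>>> 		if (!CPU_EMPTY(&map))
>>>>>> 			stop_cpus(map);
>>>>>>
>>>>>> 		if (PCPU_GET(cpuid) != 0) {
>>>>>> 			/*
>>>>>> 			 * Not on the BSP: arrange for CPU 0 to restart
>>>>>> 			 * and perform the reset on our behalf, then
>>>>>> 			 * spin waiting for it to acknowledge.
>>>>>> 			 */
>>>>>> 			cpu_reset_proxyid = PCPU_GET(cpuid);
>>>>>> 			cpustop_restartfunc = cpu_reset_proxy;
>>>>>> 			cpu_reset_proxy_active = 0;
>>>>>> 			CPU_SETOF(0, &started_cpus);
>>>>>> 			wmb();
>>>>>> 			cnt = 0;
>>>>>> 			while (cpu_reset_proxy_active == 0 && cnt < 10000000)
>>>>>> 				cnt++;
>>>>>> 			/* ...handshake continues, ending in cpu_reset_real()... */
>>>>>> 		}
>>>>>> 		DELAY(1000000);
>>>>>> 	}
>>>>>> #endif
>>>>>> 	cpu_reset_real();
>>>>>> }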
>>>>>> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.
>>>>>
>>>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot') or on a non-BSP CPU ('cpuset -l 1 reboot') to see if that has any effect? It might not, but if it does it would help narrow down the code to consider.
>>>>
>>>> Hello jhb, thanks for responding.
>>>>
>>>> I tried your suggestion but unfortunately it does not make any difference. The reboot hangs regardless of which CPU I assign the command to.
>>>>
>>>> Any other suggestions?
>>>
>>> When I was doing some early work on some of the Octeon multi-core chips, I encountered something similar. If I remember correctly, there was an issue in the shutdown sequence that did not properly halt the cores and set up the "start jump" vector. So the first core would start, and when it tried to start the next ones it would hang waiting for the ACK that they were running (since they didn't have a start vector and hence never started). I know MIPS, not AMD, so I can't say what the equivalent would be, but I'm sure there is one. Check that part: setting up the early state.
>>>
>>> If Juli and/or Adrian are reading this: do you remember anything about that, something like 2 years ago?
>>
>> That does sound promising; I would love more details if anyone can provide them.
>>
>> Here's another wrinkle:
>>
>> The KVM machine in question is part of a cluster of identical servers (hardware, OS, software revisions). The problem is present on all servers in the cluster.
>>
>> I also have access to a second homogeneous cluster. The OS and software revisions on this cluster are identical to the first. The hardware is _nearly_ identical--slightly different mainboards from the same manufacturer and slightly older CPUs. The same VMs (identical disk image and definition, including CPU flags passed to the guest) that have a problem on the first cluster work flawlessly on this one.
>>
>> Not sure if that means the bad behavior only appears on certain CPUs or if it's timing-related or something else entirely. I'd welcome speculation at this point.
>>
>> CPU details below in case it makes a difference.
>>
>> == Problem Host ==
>> model name	: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
>>
>> == Good Host ==
>> model name	: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
>
> Still haven't found a solution, but I did learn something else interesting: an ACPI reboot allows the system to come back up successfully. What is different, from the system or CPU point of view, about an ACPI reboot versus running "reboot" or "shutdown" from userland?
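>
> My (possibly inaccurate) mental model of the difference: a userland "reboot" ends up in cpu_reset()/cpu_reset_real(), which after the SMP dance above tries the classic PC reset mechanisms, while an ACPI reset writes the FADT-specified value to the FADT reset register, which the firmware or hypervisor may route through an entirely different code path. A user-space sketch of the legacy mechanisms, for illustration only (the port numbers are the standard PC ones, and running this really does reset the box):
>
> /* reset_sketch.c -- DO NOT run this on a machine you care about. */
> #include <fcntl.h>
> #include <unistd.h>
>
> static void
> outb(unsigned short port, unsigned char val)
> {
> 	__asm__ __volatile__("outb %0, %1" : : "a"(val), "Nd"(port));
> }
>
> int
> main(void)
> {
> 	/* On FreeBSD, opening /dev/io grants the process I/O privilege. */
> 	if (open("/dev/io", O_RDWR) < 0)
> 		return (1);
>
> 	/* Mechanism 1: pulse the reset line via the 8042 kbd controller. */
> 	outb(0x64, 0xFE);
> 	usleep(100000);
>
> 	/* Mechanism 2: the "cf9" reset control register. */
> 	outb(0xCF9, 0x02);	/* select hard reset */
> 	outb(0xCF9, 0x06);	/* assert reset */
>
> 	/*
> 	 * An ACPI reset, by contrast, writes FADT->ResetValue to the
> 	 * FADT reset register -- possibly one of the ports above,
> 	 * possibly something else entirely, at the firmware's whim.
> 	 */
> 	return (0);
> }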
Following up on the off chance anyone else is interested. I installed -HEAD on a host that was having the problem ("v2" Xeon CPU) and ran a FreeBSD 9 VM under bhyve. The problem did _not_ appear there. That's not entirely conclusive, but it does point the finger at Qemu a bit more strongly. I have filed a bug with them:

https://bugs.launchpad.net/qemu/+bug/1329956

Still, if anyone has any ideas about what could be going on, I'd love to hear them.

JN
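P.S. In case anyone wants to compare hosts the way I did above: the visible delta between the "problem" and "good" CPUs is just a handful of feature bits (f16c, rdrand, fsgsbase, smep, erms). Here is a quick sketch that prints those bits directly via CPUID; the bit positions are from Intel's documentation, but the program itself is just illustrative scaffolding:

/* cpuflags.c -- print the CPUID feature bits that differ above. */
#include <stdio.h>

static void
cpuid(unsigned leaf, unsigned subleaf, unsigned r[4])
{
	__asm__ __volatile__("cpuid"
	    : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
	    : "a"(leaf), "c"(subleaf));
}

int
main(void)
{
	unsigned r[4];

	cpuid(1, 0, r);
	printf("f16c:     %u\n", (r[2] >> 29) & 1);	/* CPUID.1:ECX.29 */
	printf("rdrand:   %u\n", (r[2] >> 30) & 1);	/* CPUID.1:ECX.30 */

	cpuid(7, 0, r);
	printf("fsgsbase: %u\n", (r[1] >> 0) & 1);	/* CPUID.(7,0):EBX.0 */
	printf("smep:     %u\n", (r[1] >> 7) & 1);	/* CPUID.(7,0):EBX.7 */
	printf("erms:     %u\n", (r[1] >> 9) & 1);	/* CPUID.(7,0):EBX.9 */
	return (0);
}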