From owner-freebsd-hackers@FreeBSD.ORG Fri Jun 13 22:31:13 2014
Subject: Re: consistent VM hang during reboot
From: John Nielsen
Date: Fri, 13 Jun 2014 16:23:13 -0600
To: freebsd-hackers@freebsd.org, freebsd-virtualization@freebsd.org
Message-Id: <0238084D-FD0F-42A5-85F5-597A590E666C@jnielsen.net>
In-Reply-To: <83DA2398-0004-49EC-8AC1-9AA64F33A194@jnielsen.net>
References: <201405081303.17079.jhb@freebsd.org> <2CCD4068-A9CB-442C-BB91-ADBF62FF22C6@jnielsen.net> <83DA2398-0004-49EC-8AC1-9AA64F33A194@jnielsen.net>
List-Id: Technical Discussions relating to FreeBSD

On May 13, 2014, at 9:50 AM, John Nielsen wrote:

> On May 9, 2014, at 12:41 PM, John Nielsen wrote:
>
>> On May 8, 2014, at 12:42 PM, Andrew Duane wrote:
>>
>>> From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of John Nielsen
>>>
>>>> On May 8, 2014, at 11:03 AM, John Baldwin wrote:
>>>>
>>>>> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>>>>>> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure whether the problem is in FreeBSD or in the hypervisor, but I'm trying to rule out the OS first.
>>>>>>
>>>>>> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB.) The VM will boot fine the first time, but running either "shutdown -r now" OR "reboot" leads to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>>>>>>
>>>>>> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button, the next boot is fine. If I set 'kern.smp.disabled="1"' for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time and then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>>>>>>
>>>>>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>>>>>>
>>>>>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>>>>>
>>>>>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>>>>>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>>>>>> @@ -593,7 +593,7 @@
>>>>>>  void
>>>>>>  cpu_reset()
>>>>>>  {
>>>>>> -#ifdef SMP
>>>>>> +#if 0
>>>>>>  	cpuset_t map;
>>>>>>  	u_int cnt;
>>>>>>
>>>>>> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
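>>>>>>
>>>>>> For reference, here is roughly what that #ifdef SMP block does, abridged and paraphrased from memory of HEAD (the real thing is in sys/amd64/amd64/vm_machdep.c), so treat it as a sketch rather than a verbatim copy. The interesting parts are the stop_cpus() call and the handshake that proxies the reset over to the BSP:
>>>>>>
>>>>>> void
>>>>>> cpu_reset()
>>>>>> {
>>>>>> #ifdef SMP
>>>>>> 	cpuset_t map;
>>>>>> 	u_int cnt;
>>>>>>
>>>>>> 	if (smp_started) {
>>>>>> 		/* Stop every other CPU that is still running. */
>>>>>> 		map = all_cpus;
>>>>>> 		CPU_CLR(PCPU_GET(cpuid), &map);
>>>>>> 		CPU_NAND(&map, &stopped_cpus);
>>>>>> 		if (!CPU_EMPTY(&map))
>>>>>> 			stop_cpus(map);
>>>>>>
>>>>>> 		if (PCPU_GET(cpuid) != 0) {
>>>>>> 			/*
>>>>>> 			 * Not on the BSP: arrange for CPU 0 to restart
>>>>>> 			 * and perform the reset on our behalf, then
>>>>>> 			 * spin waiting for it to acknowledge.
>>>>>> 			 */
>>>>>> 			cpu_reset_proxyid = PCPU_GET(cpuid);
>>>>>> 			cpustop_restartfunc = cpu_reset_proxy;
>>>>>> 			cpu_reset_proxy_active = 0;
>>>>>> 			CPU_SETOF(0, &started_cpus);
>>>>>> 			wmb();
>>>>>> 			cnt = 0;
>>>>>> 			while (cpu_reset_proxy_active == 0 && cnt < 10000000)
>>>>>> 				cnt++;
>>>>>> 			/* ...handshake continues, ending in cpu_reset_real()... */
>>>>>> 		}
>>>>>> 		DELAY(1000000);
>>>>>> 	}
>>>>>> #endif
>>>>>> 	cpu_reset_real();
>>>>>> }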
>>>>>> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.
>>>>>
>>>>> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot') or on a non-BSP CPU ('cpuset -l 1 reboot') to see if that has any effect? It might not, but if it does it would help narrow down the code to consider.
>>>>
>>>> Hello jhb, thanks for responding.
>>>>
>>>> I tried your suggestion but unfortunately it does not make any difference. The reboot hangs regardless of which CPU I assign the command to.
>>>>
>>>> Any other suggestions?
>>>
>>> When I was doing some early work on some of the Octeon multi-core chips, I encountered something similar. If I remember correctly, there was an issue in the shutdown sequence that did not properly halt the cores and set up the "start jump" vector. So the first core would start, and when it tried to start the next ones it would hang waiting for the ACK that they were running (since they didn't have a start vector and hence never started). I know MIPS, not AMD, so I can't say what the equivalent would be, but I'm sure there is one. Check that part: setting up the early state.
>>>
>>> If Juli and/or Adrian are reading this: do you remember anything about that, something like 2 years ago?
>>
>> That does sound promising; I would love more details if anyone can provide them.
>>
>> Here's another wrinkle:
>>
>> The KVM machine in question is part of a cluster of identical servers (hardware, OS, software revisions). The problem is present on all servers in the cluster.
>>
>> I also have access to a second homogeneous cluster. The OS and software revisions on this cluster are identical to the first. The hardware is _nearly_ identical--slightly different mainboards from the same manufacturer and slightly older CPUs. The same VMs (identical disk image and definition, including CPU flags passed to the guest) that have a problem on the first cluster work flawlessly on this one.
>>
>> Not sure if that means the bad behavior only appears on certain CPUs or if it's timing-related or something else entirely. I'd welcome speculation at this point.
>>
>> CPU details below in case it makes a difference.
>>
>> == Problem Host ==
>> model name	: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
>>
>> == Good Host ==
>> model name	: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
>
> Still haven't found a solution, but I did learn something else interesting: an ACPI reboot allows the system to come back up successfully. What is different, from the system or CPU point of view, about an ACPI reboot versus running "reboot" or "shutdown" from userland?
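>
> My (possibly inaccurate) mental model of the difference: a userland "reboot" ends up in cpu_reset()/cpu_reset_real(), which after the SMP dance above tries the classic PC reset mechanisms, while an ACPI reset writes the FADT-specified value to the FADT reset register, which the firmware or hypervisor may route through an entirely different code path. A user-space sketch of the legacy mechanisms, for illustration only (the port numbers are the standard PC ones, and running this really does reset the box):
>
> /* reset_sketch.c -- DO NOT run this on a machine you care about. */
> #include <fcntl.h>
> #include <unistd.h>
>
> static void
> outb(unsigned short port, unsigned char val)
> {
> 	__asm__ __volatile__("outb %0, %1" : : "a"(val), "Nd"(port));
> }
>
> int
> main(void)
> {
> 	/* On FreeBSD, opening /dev/io grants the process I/O privilege. */
> 	if (open("/dev/io", O_RDWR) < 0)
> 		return (1);
>
> 	/* Mechanism 1: pulse the reset line via the 8042 kbd controller. */
> 	outb(0x64, 0xFE);
> 	usleep(100000);
>
> 	/* Mechanism 2: the "cf9" reset control register. */
> 	outb(0xCF9, 0x02);	/* select hard reset */
> 	outb(0xCF9, 0x06);	/* assert reset */
>
> 	/*
> 	 * An ACPI reset, by contrast, writes FADT->ResetValue to the
> 	 * FADT reset register -- possibly one of the ports above,
> 	 * possibly something else entirely, at the firmware's whim.
> 	 */
> 	return (0);
> }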
Following up on the off chance anyone else is interested. I installed -HEAD on a host that was having the problem ("v2" Xeon CPU) and ran a FreeBSD 9 VM under bhyve. The problem did _not_ appear there. That's not entirely conclusive, but it does point the finger at Qemu a bit more strongly. I have filed a bug with them:

https://bugs.launchpad.net/qemu/+bug/1329956

Still, if anyone has any ideas about what could be going on, I'd love to hear them.

JN
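P.S. In case anyone wants to compare hosts the way I did above: the visible delta between the "problem" and "good" CPUs is just a handful of feature bits (f16c, rdrand, fsgsbase, smep, erms). Here is a quick sketch that prints those bits directly via CPUID; the bit positions are from Intel's documentation, but the program itself is just illustrative scaffolding:

/* cpuflags.c -- print the CPUID feature bits that differ above. */
#include <stdio.h>

static void
cpuid(unsigned leaf, unsigned subleaf, unsigned r[4])
{
	__asm__ __volatile__("cpuid"
	    : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
	    : "a"(leaf), "c"(subleaf));
}

int
main(void)
{
	unsigned r[4];

	cpuid(1, 0, r);
	printf("f16c:     %u\n", (r[2] >> 29) & 1);	/* CPUID.1:ECX.29 */
	printf("rdrand:   %u\n", (r[2] >> 30) & 1);	/* CPUID.1:ECX.30 */

	cpuid(7, 0, r);
	printf("fsgsbase: %u\n", (r[1] >> 0) & 1);	/* CPUID.(7,0):EBX.0 */
	printf("smep:     %u\n", (r[1] >> 7) & 1);	/* CPUID.(7,0):EBX.7 */
	printf("erms:     %u\n", (r[1] >> 9) & 1);	/* CPUID.(7,0):EBX.9 */
	return (0);
}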