Subject: Re: consistent VM hang during reboot
From: John Nielsen
Date: Thu, 8 May 2014 11:55:53 -0600
To: John Baldwin
Cc: freebsd-hackers@freebsd.org, freebsd-virtualization@freebsd.org
In-Reply-To: <201405081303.17079.jhb@freebsd.org>
List-Id: "Discussion of various virtualization techniques FreeBSD supports."

On May 8, 2014, at 11:03 AM, John Baldwin wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>>
>> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB.) The VM will boot fine the first time, but running either "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>>
>> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>>
>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>>
>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>
>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>> @@ -593,7 +593,7 @@
>>  void
>>  cpu_reset()
>>  {
>> -#ifdef SMP
>> +#if 0
>>  	cpuset_t map;
>>  	u_int cnt;
>>
>> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
>>
>> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.
>
> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect? It might
> not, but if it does it would help narrow down the code to consider.

Hello jhb, thanks for responding.

I tried your suggestion but unfortunately it does not make any difference. The reboot hangs regardless of which CPU I assign the command to.

Any other suggestions?

JN
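The two cases that the 'cpuset -l 0 reboot' / 'cpuset -l 1 reboot' test separates correspond, based on a reading of cpu_reset() in sys/amd64/amd64/vm_machdep.c rather than a verified trace on every release listed above, to two paths through the #ifdef SMP block: once SMP has started, the function stops the other CPUs with stop_cpus(), and if it is running on a CPU other than the BSP (CPU 0) it restarts CPU 0 and spins for a bounded time until CPU 0 announces that it will perform the real reset. What follows is a minimal user-space sketch of that hand-off, assuming POSIX threads can stand in for CPUs. Every name in it (simulate_proxy_reset, bsp_thread, reset_proxy_active, SPIN_LIMIT, the PROXY_* states) is hypothetical; this is not the kernel's code, and it does not model IPIs, stop_cpus(), or the hardware reset itself.

```c
/*
 * User-space sketch of a "reset proxy" hand-off: a non-BSP CPU asks
 * CPU 0 (the BSP) to perform the reset.  Threads stand in for CPUs.
 * All names are hypothetical; this is NOT FreeBSD kernel code.
 * Build with: cc -std=c11 -pthread
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define SPIN_LIMIT 100000000L /* bounded wait, so a dead BSP cannot hang us */

enum { PROXY_IDLE = 0, PROXY_READY = 1, PROXY_GO = 2 };

static atomic_int bsp_released;       /* requester's "restart CPU 0" signal */
static atomic_int reset_proxy_active; /* PROXY_IDLE -> PROXY_READY -> PROXY_GO */
static atomic_int reset_done;         /* set where the real reset would run */

/* Thread playing CPU 0: parked until released, then performs the "reset". */
static void *bsp_thread(void *arg)
{
    int expected = PROXY_IDLE;

    (void)arg;
    while (!atomic_load(&bsp_released))
        ; /* "stopped" CPU spinning until it is restarted */
    /* Announce that we are running.  If the requester already gave up and
     * stored PROXY_GO, the CAS fails and we simply proceed. */
    (void)atomic_compare_exchange_strong(&reset_proxy_active, &expected,
                                         PROXY_READY);
    while (atomic_load(&reset_proxy_active) != PROXY_GO)
        ; /* wait for the requester's go-ahead */
    atomic_store(&reset_done, 1); /* stand-in for the hardware reset */
    return NULL;
}

/* Runs the hand-off from a non-BSP "CPU".
 * Returns 0 if the BSP answered within SPIN_LIMIT spins, -1 otherwise. */
int simulate_proxy_reset(void)
{
    pthread_t bsp;
    long cnt = 0;
    int rc = 0;

    atomic_store(&bsp_released, 0);
    atomic_store(&reset_proxy_active, PROXY_IDLE);
    atomic_store(&reset_done, 0);
    if (pthread_create(&bsp, NULL, bsp_thread, NULL) != 0)
        return -1;

    atomic_store(&bsp_released, 1); /* "restart CPU 0" */
    while (atomic_load(&reset_proxy_active) == PROXY_IDLE && cnt < SPIN_LIMIT)
        cnt++; /* wait, with a bound, for the BSP to announce itself */
    if (atomic_load(&reset_proxy_active) == PROXY_IDLE) {
        fprintf(stderr, "simulated BSP did not answer in time\n");
        rc = -1;
    }
    atomic_store(&reset_proxy_active, PROXY_GO); /* let the BSP proceed either way */

    pthread_join(bsp, NULL);
    assert(atomic_load(&reset_done) == 1); /* the BSP always ends up resetting */
    return rc;
}
```

Calling simulate_proxy_reset() from a main() should return 0; it returns -1 only when the simulated BSP fails to respond within SPIN_LIMIT iterations, which mirrors the failure branch of a bounded wait. The bound is the design point worth noting: without it, a CPU 0 that never comes back would hang the requester forever, which is one shape a reboot hang can take.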