From: John Nielsen
Date: Wed, 7 May 2014 17:15:43 -0600
Subject: consistent VM hang during reboot
To: freebsd-hackers@freebsd.org
Cc: freebsd-virtualization@freebsd.org

I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure whether the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.

The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB.) The VM boots fine the first time, but running either "shutdown -r now" OR "reboot" leads to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.

The problem seems to be triggered by something in the SMP portion of cpu_reset() (in sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button, the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot, then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time and then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.

The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.

This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:

--- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
+++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
@@ -593,7 +593,7 @@
 void
 cpu_reset()
 {
-#ifdef SMP
+#if 0
 	cpuset_t map;
 	u_int cnt;

I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.

I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.

Thanks!

JN