From: John Baldwin
To: freebsd-virtualization@freebsd.org
Cc: freebsd-hackers@freebsd.org
Subject: Re: consistent VM hang during reboot
Date: Thu, 8 May 2014 13:03:16 -0400
Message-Id: <201405081303.17079.jhb@freebsd.org>

On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>
> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB). The VM will boot fine a first time, but running either "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>
> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>
> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>
> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>
> --- sys/amd64/amd64/vm_machdep.c.orig 2014-05-07 13:19:07.400981580 -0600
> +++ sys/amd64/amd64/vm_machdep.c 2014-05-07 17:02:52.416783795 -0600
> @@ -593,7 +593,7 @@
>  void
>  cpu_reset()
>  {
> -#ifdef SMP
> +#if 0
>  	cpuset_t map;
>  	u_int cnt;
>
> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
>
> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.

Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect? It might
not, but if it does it would help narrow down the code to consider.

-- 
John Baldwin