From owner-freebsd-emulation@FreeBSD.ORG Wed Apr 30 21:40:02 2014
Date: Wed, 30 Apr 2014 21:40:02 GMT
Message-Id: <201404302140.s3ULe2x4074644@freefall.freebsd.org>
To: freebsd-emulation@FreeBSD.org
From: John Baldwin
Subject: Re: kern/186051: [vmware] [panic] FreeBSD 8.4+, 9.x+, 10.0 guest panic with VMWare Server on boot
Reply-To: John Baldwin
List-Id: Development of Emulators of other operating systems

The following reply was made to PR kern/186051; it has been noted by GNATS.

From: John Baldwin
To: Steven Spence
Cc: bug-followup@freebsd.org
Subject: Re: kern/186051: [vmware] [panic] FreeBSD 8.4+, 9.x+, 10.0 guest panic with VMWare Server on boot
Date: Wed, 30 Apr 2014 17:34:28 -0400

On Wednesday, April 30, 2014 1:58:35 pm Steven Spence wrote:
> On 04/30/2014 11:17 AM, John Baldwin wrote:
> > On Wednesday, April 30, 2014 12:47:31 pm Steven Spence wrote:
> >> On 04/30/2014 10:09 AM, John Baldwin wrote:
> >>> On Tuesday, April 29, 2014 10:13:20 pm Steven Spence wrote:
> >>>> On 04/29/2014 01:43 PM, John Baldwin wrote:
> >>>>> On Monday, April 28, 2014 11:04:40 pm Steven Spence wrote:
> >>>>>> On 04/28/2014 08:32 AM, John Baldwin wrote:
> >>>>>>> On Monday, April 21, 2014 01:45:10 PM Steven Spence wrote:
> >>>>>>>
> >>>>>>>> Output of "sysctl machdep.idle"
> >>>>>>>> machdep.idle: amdc1e
> >>>>>>>> This is from a 8.3-RELEASE-p15 box.
> >>>>>>> Hummm. We really shouldn't be doing anything differently. However, we do a
> >>>>>>> bit more (including a wrmsr) during idle halt on your machine. Can you build
> >>>>>>> a stable/8 kernel with debug symbols in an 8.3 guest and capture the panic
> >>>>>>> messages from booting that kernel?
> >>>>>>>
> >>>>>> Here is a capture of the panic from a stable/8 kernel. Is the only
> >>>>>> debugging option you are looking for in the kernel config
> >>>>>> "makeoptions DEBUG=-g"? I still have the 8.3 kernel on there I can
> >>>>>> boot if I need to get in and recompile the stable/8 kernel differently.
> >>>>>> I am not sure how much use the information below will be to you.
> >>>>>>
> >>>>>> kernel trap 1 with interrupts disabled
> >>>>>> Fatal trap 1: privileged instruction fault while in kernel mode
> >>>>>> cpuid = 0; apic id = 00
> >>>>>> instruction pointer = 0x20:0xffffffff809c342e
> >>>>>> stack pointer = 0x28:0xffffff8000211b40
> >>>>>> acd0: CDROM at ata1-master UDMA33
> >>>>>> frame pointer = 0x28:0xffffff8000211b60
> >>>>>> code segment = base 0x0, limit 0xfffff, type 0x1b
> >>>>>> = DPL 0, pres 1, long 1, def32 0, gran 1
> >>>>>> processor eflags = resume, IOPL = 0
> >>>>>> current process = 11 (idle: cpu0)
> >>>>>> trap number = 1
> >>>>>> panic: privileged instruction fault
> >>>>>> cpuid = 0
> >>>>>> KDB: stack backtrace:
> >>>>>> #0 0xffffffff8067c0b6 at kdb_backtrace+0x66
> >>>>>> #1 0xffffffff8064861e at panic+0x1ce
> >>>>>> #2 0xffffffff809d3750 at trap_fatal+0x290
> >>>>>> #3 0xffffffff809d3ce5 at trap+0x105
> >>>>>> #4 0xffffffff809ba944 at calltrap+0x8
> >>>>>> #5 0xffffffff8066e08f at sched_idletd+0x11f
> >>>>>> #6 0xffffffff8061ceaf at fork_exit+0x11f
> >>>>>> #7 0xffffffff809bae8e at fork_trampoline+0xe
> >>>>>> Uptime: 1s
> >>>>>> Cannot dump. Device not defined or unavailable.
> >>>>>> Automatic reboot in 15 seconds - press a key on the console to abort
> >>>>>>
> >>>>>> I have also tried to dump the panic to a swap device but I don't think
> >>>>>> it is getting far enough in the kernel boot to initialize any hard drive
> >>>>>> storage devices.
> >>>>>>
> >>>>>> If there is anything else I can try to get more information out of this
> >>>>>> let me know.
> >>>>> If you have the result of this kernel build, can you find the kernel.debug
> >>>>> file it generated and run 'gdb kernel.debug' and then 'l *0xffffffff809c342e'?
> >>>>> That will (hopefully) identify the exact line it panic'd on. It might also
> >>>>> be useful to do 'x/i 0xffffffff809c342e' in gdb as well.
> >>>>>
> >>>> Below are the results of the two gdb commands:
> >>>>
> >>>> (gdb) l *0xffffffff809c342e
> >>>> 0xffffffff809c342e is in cpu_idle_mwait (cpufunc.h:470).
> >>>> 465 }
> >>>> 466
> >>>> 467 static __inline void
> >>>> 468 cpu_monitor(const void *addr, int extensions, int hints)
> >>>> 469 {
> >>>> 470         __asm __volatile("monitor;"
> >>>> 471             : :"a" (addr), "c" (extensions), "d"(hints));
> >>>> 472 }
> >>>> 473
> >>>> 474 static __inline void
> >>>>
> >>>> (gdb) x/i 0xffffffff809c342e
> >>>> 0xffffffff809c342e : monitor %eax,%ecx,%edx
> >>> That's interesting. It's dying on monitor, not hlt.
> >>>
> >>> Can you capture the CPU lines from dmesg from a working kernel? I want to see
> >>> if VMWare is advertising the ability to use monitor via cpuid.
> >>>
> >>> Also, try setting 'machdep.idle_mwait=0' at the loader prompt before booting to
> >>> see if that fixes the panic.
> >>>
> >> Here is the requested information:
> >>
> >> CPU: Quad-Core AMD Opteron(tm) Processor 2384 (2726.06-MHz K8-class CPU)
> >> Origin = "AuthenticAMD" Id = 0x100f42 Family = 10 Model = 4 Stepping = 2
> >> Features=0x783fbff
> >> Features2=0x802009
> > Looks like it is telling the guest here it is ok to use monitor ("MON"
> > feature).
> >
> >> AMD Features=0xee500800
> >> AMD Features2=0x37e9
> >> TSC: P-state invariant
> >>
> >> Setting 'machdep.idle_mwait=0' did fix the panic. It successfully
> >> booted into 8.4-STABLE with this option set. I am not sure what (if
> >> any) ramifications this option causes but if there are little to none I
> >> am fine with sticking this in my /boot/loader.conf and running with it.
> >> If you feel there is a deeper/generic problem that still needs to be
> >> worked out I can try to provide whatever information you need.
> > It should be fine as a workaround. The remaining issues I can see are:
> >
> > 1) Should we disable monitor automatically for VMWare?
>
> I am not sure on this one.
> Did FreeBSD start using or change how it was
> using this feature with kernels > 8.3? Everything worked good up to
> that kernel version, even with VMWare falsely advertising that it
> supports the monitor flag. I went looking at the flags the host (CentOS
> 5) reports for the physical CPU and I don't see the 'monitor' flag in
> there either so I am not sure where VMWare is getting the idea it is
> supported.

I think most CPUs support monitor nowadays. It was added in the Pentium III
IIRC. I think FreeBSD did not use it by default in 8.3 and earlier.

> > 2) This should be reported to the VMWare folks as it is ultimately their
> > bug. If they don't support usage of 'monitor' by guest OS's, then they
> > should hide it from the cpuid information.
> >
> > Would you be able to handle 2)? I would like to see what they say before
> > adventuring too much further down the path of 1).
>
> I don't mind contacting VMWare about it but I am almost positive they
> are going to tell me that is not a product they support any more and
> that I should upgrade to ESX, vSphere, or whatever their latest
> incarnation is. Newer FreeBSDs appear to work with newer VMWare
> products as I didn't run across anyone else having this problem when I
> first went searching for a solution. I don't think disabling a feature
> that appears to work for others just because of some old corner case is
> a good idea. Doubly so since there is an option to bypass the problem
> for people with older VMWare installs like mine. Let me know if you
> still think contacting VMWare is worth pursuing.

Ahhh, ok. So it sounds like it's probably a bug that they might have already
fixed. I think in that case I agree that it's probably best to document this
in the PR so Google searches can find the workaround. :)

--
John Baldwin
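
For anyone landing here from a search: the "MON" feature discussed in this thread can be read straight out of the Features2 value in the guest's dmesg (0x802009). The following is only a rough sketch of that decoding, not anything taken from the kernel. It assumes Features2 is the CPUID leaf 1 ECX word, where bit 3 is MONITOR/MWAIT. The table lists only a handful of bits, and the helper names decode_features2 and advertises_monitor are made up for this illustration.

```python
# Subset of CPUID.01H:ECX feature bits, using the short names FreeBSD prints
# in dmesg. Illustrative only: this table is deliberately incomplete.
CPUID_ECX_FEATURES = {
    0: "SSE3",
    3: "MON",      # MONITOR/MWAIT instructions
    9: "SSSE3",
    13: "CX16",
    23: "POPCNT",
}

MONITOR_BIT = 3


def decode_features2(value):
    """Return the names of the known feature bits set in value (hypothetical helper)."""
    return [
        name
        for bit, name in sorted(CPUID_ECX_FEATURES.items())
        if value & (1 << bit)
    ]


def advertises_monitor(value):
    """True if the MONITOR/MWAIT bit is set in value (hypothetical helper)."""
    return bool(value & (1 << MONITOR_BIT))


if __name__ == "__main__":
    features2 = 0x802009  # Features2 value quoted from the guest's dmesg
    print(decode_features2(features2))
    print(advertises_monitor(features2))
```

If the bit is set but the guest still panics in cpu_idle_mwait as described in this PR, the workaround from the thread is to set machdep.idle_mwait=0 at the loader prompt (or in /boot/loader.conf).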