From: Neel Natu <neel@FreeBSD.org>
Date: Sun, 28 Dec 2014 21:27:14 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject: svn commit: r276349 - in stable/10: sys/amd64/include sys/amd64/vmm sys/amd64/vmm/intel sys/amd64/vmm/io sys/modules/vmm sys/x86/include usr.sbin/bhyve usr.sbin/bhyvectl
Message-Id: <201412282127.sBSLRE5n087198@svn.freebsd.org>

Author: neel
Date: Sun Dec 28 21:27:13 2014
New Revision: 276349
URL: https://svnweb.freebsd.org/changeset/base/276349

Log:
  MFC r270326
  Fix a recursive lock acquisition in vi_reset_dev().

  MFC r270434
  Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot
  find any unmasked pin with an interrupt asserted.

  MFC r270436
  Fix a bug in the emulation of CPUID leaf 0x4.

  MFC r270437
  Add "hw.vmm.topology.threads_per_core" and
  "hw.vmm.topology.cores_per_package" tunables to modify the default cpu
  topology advertised by bhyve (an illustrative example follows the log).

  MFC r270855
  Set the 'inst_length' to '0' early on, before any error conditions are
  detected in the emulation of the task switch. If any exceptions are
  triggered then the guest %rip should point to the instruction that caused
  the task switch, as opposed to the one after it.

  MFC r270857
  The "SUB" instruction used in getcc() actually does 'x -= y', so use the
  proper constraint for 'x'. The "+r" constraint indicates that 'x' is an
  input and output register operand.

  While here, generate the different variants of getcc() using a macro
  GETCC(sz), where 'sz' indicates the operand size.

  Update the status bits in %rflags when emulating the AND and OR opcodes.

  MFC r271439
  Initialize 'bc_rdonly' to the right value.

  MFC r271451
  Optimize the common case of injecting an interrupt into a vcpu after a HLT
  by explicitly moving the vcpu out of the interrupt shadow.

  MFC r271888
  Restructure the MSR handling so that it is entirely handled by
  processor-specific code.

  MFC r271890
  MSR_KGSBASE is no longer saved and restored from the guest MSR save area.
  This behavior was changed in r271888, so update the comment block to
  reflect this.

  MFC r271891
  Add some more KTR events to help debugging.

  MFC r272197
  mmap(2) requires either MAP_PRIVATE or MAP_SHARED for non-anonymous
  mappings.

  MFC r272395
  Get rid of the code that dealt with hardware not being able to
  save/restore the PAT MSR on guest exit/entry. This workaround was done
  for a beta release of VMware Fusion 5 but is no longer needed in later
  versions. All Intel CPUs since Nehalem have supported saving and
  restoring MSR_PAT via the VM exit and entry controls.

  MFC r272670
  Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.

  MFC r272710
  Implement the FLUSH operation in the virtio-block emulation.

  MFC r272838
  iasl(8) expects integer fields in data tables to be specified as
  hexadecimal values. Therefore the bit width of the "PM Timer Block" was
  actually being interpreted as 50 bits instead of the expected 32 bits.
  This eliminates an error message emitted by a Linux 3.17 guest during
  boot: "Invalid length for FADT/PmTimerBlock: 50, using default 32".

  MFC r272839
  Support Intel-specific MSRs that are accessed when booting Linux in bhyve:
  - MSR_PLATFORM_INFO
  - MSR_TURBO_RATIO_LIMITx
  - MSR_RAPL_POWER_UNIT

  MFC r273108
  Emulate "POP r/m". This is needed to boot the OpenBSD/i386 MP kernel in
  bhyve.

  MFC r273212
  Support stopping and restarting the AHCI command list by toggling
  PxCMD.ST from '1' to '0' and back. This gives the driver a chance to
  recover if, for instance, a timeout occurred due to activity on the host.
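  For illustration, the r270437 tunables can be set from loader.conf(5)
  before vmm.ko is loaded; the values below are arbitrary examples (not
  the defaults) and advertise a 2-thread x 4-core topology to the guest:

      # /boot/loader.conf
      hw.vmm.topology.threads_per_core=2
      hw.vmm.topology.cores_per_package=4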
Deleted:
  stable/10/sys/amd64/vmm/vmm_msr.c
  stable/10/sys/amd64/vmm/vmm_msr.h
Modified:
  stable/10/sys/amd64/include/vmm.h
  stable/10/sys/amd64/vmm/intel/ept.c
  stable/10/sys/amd64/vmm/intel/vmcs.h
  stable/10/sys/amd64/vmm/intel/vmx.c
  stable/10/sys/amd64/vmm/intel/vmx.h
  stable/10/sys/amd64/vmm/intel/vmx_msr.c
  stable/10/sys/amd64/vmm/intel/vmx_msr.h
  stable/10/sys/amd64/vmm/io/vatpic.c
  stable/10/sys/amd64/vmm/io/vlapic.c
  stable/10/sys/amd64/vmm/vmm.c
  stable/10/sys/amd64/vmm/vmm_instruction_emul.c
  stable/10/sys/amd64/vmm/x86.c
  stable/10/sys/modules/vmm/Makefile
  stable/10/sys/x86/include/specialreg.h
  stable/10/usr.sbin/bhyve/acpi.c
  stable/10/usr.sbin/bhyve/bhyverun.c
  stable/10/usr.sbin/bhyve/block_if.c
  stable/10/usr.sbin/bhyve/pci_ahci.c
  stable/10/usr.sbin/bhyve/pci_virtio_block.c
  stable/10/usr.sbin/bhyve/task_switch.c
  stable/10/usr.sbin/bhyve/virtio.c
  stable/10/usr.sbin/bhyve/xmsr.c
  stable/10/usr.sbin/bhyve/xmsr.h
  stable/10/usr.sbin/bhyvectl/bhyvectl.c
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/sys/amd64/include/vmm.h
==============================================================================
--- stable/10/sys/amd64/include/vmm.h   Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/include/vmm.h   Sun Dec 28 21:27:13 2014  (r276349)
@@ -82,6 +82,7 @@ enum vm_reg_name {
     VM_REG_GUEST_PDPTE1,
     VM_REG_GUEST_PDPTE2,
     VM_REG_GUEST_PDPTE3,
+    VM_REG_GUEST_INTR_SHADOW,
     VM_REG_LAST
 };
 
@@ -194,7 +195,6 @@ void vm_nmi_clear(struct vm *vm, int vcp
 int vm_inject_extint(struct vm *vm, int vcpu);
 int vm_extint_pending(struct vm *vm, int vcpuid);
 void vm_extint_clear(struct vm *vm, int vcpuid);
-uint64_t *vm_guest_msrs(struct vm *vm, int cpu);
 struct vlapic *vm_lapic(struct vm *vm, int cpu);
 struct vioapic *vm_ioapic(struct vm *vm);
 struct vhpet *vm_hpet(struct vm *vm);
@@ -485,6 +485,8 @@ enum vm_exitcode {
     VM_EXITCODE_SUSPENDED,
     VM_EXITCODE_INOUT_STR,
     VM_EXITCODE_TASK_SWITCH,
+    VM_EXITCODE_MONITOR,
+    VM_EXITCODE_MWAIT,
     VM_EXITCODE_MAX
 };

Modified: stable/10/sys/amd64/vmm/intel/ept.c
==============================================================================
--- stable/10/sys/amd64/vmm/intel/ept.c Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/intel/ept.c Sun Dec 28 21:27:13 2014  (r276349)
@@ -44,7 +44,6 @@ __FBSDID("$FreeBSD$");
 #include "vmx_cpufunc.h"
 #include "vmm_ipi.h"
-#include "vmx_msr.h"
 #include "ept.h"
 
 #define EPT_SUPPORTS_EXEC_ONLY(cap)    ((cap) & (1UL << 0))

Modified: stable/10/sys/amd64/vmm/intel/vmcs.h
==============================================================================
--- stable/10/sys/amd64/vmm/intel/vmcs.h    Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/intel/vmcs.h    Sun Dec 28 21:27:13 2014  (r276349)
@@ -54,6 +54,10 @@ int vmcs_getdesc(struct vmcs *vmcs, int
 int vmcs_setdesc(struct vmcs *vmcs, int running, int ident,
                  struct seg_desc *desc);
 
+/*
+ * Avoid header pollution caused by inline use of 'vtophys()' in vmx_cpufunc.h
+ */
+#ifdef _VMX_CPUFUNC_H_
 static __inline uint64_t
 vmcs_read(uint32_t encoding)
 {
@@ -73,6 +77,7 @@ vmcs_write(uint32_t encoding, uint64_t v
     error = vmwrite(encoding, val);
     KASSERT(error == 0, ("vmcs_write(%u) error %d", encoding, error));
 }
+#endif /* _VMX_CPUFUNC_H_ */
 
 #define vmexit_instruction_length() vmcs_read(VMCS_EXIT_INSTRUCTION_LENGTH)
 #define vmcs_guest_rip()            vmcs_read(VMCS_GUEST_RIP)

Modified: stable/10/sys/amd64/vmm/intel/vmx.c
==============================================================================
--- stable/10/sys/amd64/vmm/intel/vmx.c Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/intel/vmx.c Sun Dec 28 21:27:13 2014  (r276349)
@@ -52,20 +52,20 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include "vmm_lapic.h"
 #include "vmm_host.h"
 #include "vmm_ioport.h"
 #include "vmm_ipi.h"
-#include "vmm_msr.h"
 #include "vmm_ktr.h"
 #include "vmm_stat.h"
 #include "vatpic.h"
 #include "vlapic.h"
 #include "vlapic_priv.h"
 
-#include "vmx_msr.h"
 #include "ept.h"
 #include "vmx_cpufunc.h"
 #include "vmx.h"
+#include "vmx_msr.h"
 #include "x86.h"
 #include "vmx_controls.h"

@@ -81,6 +81,8 @@ __FBSDID("$FreeBSD$");
 #define PROCBASED_CTLS_ONE_SETTING          \
     (PROCBASED_SECONDARY_CONTROLS   |       \
+     PROCBASED_MWAIT_EXITING        |       \
+     PROCBASED_MONITOR_EXITING      |       \
      PROCBASED_IO_EXITING           |       \
      PROCBASED_MSR_BITMAPS          |       \
      PROCBASED_CTLS_WINDOW_SETTING  |       \

@@ -94,34 +96,23 @@ __FBSDID("$FreeBSD$");
 #define PROCBASED_CTLS2_ONE_SETTING     PROCBASED2_ENABLE_EPT
 #define PROCBASED_CTLS2_ZERO_SETTING    0
 
-#define VM_EXIT_CTLS_ONE_SETTING_NO_PAT         \
+#define VM_EXIT_CTLS_ONE_SETTING                \
     (VM_EXIT_HOST_LMA |                         \
     VM_EXIT_SAVE_EFER |                         \
-    VM_EXIT_LOAD_EFER)
-
-#define VM_EXIT_CTLS_ONE_SETTING                \
-    (VM_EXIT_CTLS_ONE_SETTING_NO_PAT |          \
+    VM_EXIT_LOAD_EFER |                         \
     VM_EXIT_ACKNOWLEDGE_INTERRUPT |             \
     VM_EXIT_SAVE_PAT |                          \
     VM_EXIT_LOAD_PAT)
+
 #define VM_EXIT_CTLS_ZERO_SETTING       VM_EXIT_SAVE_DEBUG_CONTROLS
 
-#define VM_ENTRY_CTLS_ONE_SETTING_NO_PAT        VM_ENTRY_LOAD_EFER
+#define VM_ENTRY_CTLS_ONE_SETTING       (VM_ENTRY_LOAD_EFER | VM_ENTRY_LOAD_PAT)
 
-#define VM_ENTRY_CTLS_ONE_SETTING               \
-    (VM_ENTRY_CTLS_ONE_SETTING_NO_PAT |         \
-    VM_ENTRY_LOAD_PAT)
 #define VM_ENTRY_CTLS_ZERO_SETTING              \
     (VM_ENTRY_LOAD_DEBUG_CONTROLS |             \
     VM_ENTRY_INTO_SMM |                         \
     VM_ENTRY_DEACTIVATE_DUAL_MONITOR)
 
-#define guest_msr_rw(vmx, msr) \
-    msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_RW)
-
-#define guest_msr_ro(vmx, msr) \
-    msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_READ)
-
 #define HANDLED     1
 #define UNHANDLED   0

@@ -158,10 +149,6 @@ SYSCTL_INT(_hw_vmm_vmx, OID_AUTO, initia
  */
 static SYSCTL_NODE(_hw_vmm_vmx, OID_AUTO, cap,
     CTLFLAG_RW, NULL, NULL);
 
-static int vmx_patmsr;
-SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, patmsr, CTLFLAG_RD, &vmx_patmsr, 0,
-    "PAT MSR saved and restored in VCMS");
-
 static int cap_halt_exit;
 SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, halt_exit, CTLFLAG_RD, &cap_halt_exit, 0,
     "HLT triggers a VM-exit");

@@ -208,6 +195,7 @@ SYSCTL_UINT(_hw_vmm_vmx, OID_AUTO, vpid_
 
 static int vmx_getdesc(void *arg, int vcpu, int reg, struct seg_desc *desc);
 static int vmx_getreg(void *arg, int vcpu, int reg, uint64_t *retval);
+static int vmxctx_setreg(struct vmxctx *vmxctx, int reg, uint64_t val);
 static void vmx_inject_pir(struct vlapic *vlapic);
 #ifdef KTR

@@ -475,22 +463,6 @@ vpid_init(void)
 }
 
 static void
-msr_save_area_init(struct msr_entry *g_area, int *g_count)
-{
-    int cnt;
-
-    static struct msr_entry guest_msrs[] = {
-        { MSR_KGSBASE, 0, 0 },
-    };
-
-    cnt = sizeof(guest_msrs) / sizeof(guest_msrs[0]);
-    if (cnt > GUEST_MSR_MAX_ENTRIES)
-        panic("guest msr save area overrun");
-    bcopy(guest_msrs, g_area, sizeof(guest_msrs));
-    *g_count = cnt;
-}
-
-static void
 vmx_disable(void *arg __unused)
 {
     struct invvpid_desc invvpid_desc = { 0 };

@@ -636,49 +608,24 @@ vmx_init(int ipinum)
     }
 
     /* Check support for VM-exit controls */
-    vmx_patmsr = 1;
     error = vmx_set_ctlreg(MSR_VMX_EXIT_CTLS, MSR_VMX_TRUE_EXIT_CTLS,
                VM_EXIT_CTLS_ONE_SETTING,
                VM_EXIT_CTLS_ZERO_SETTING,
                &exit_ctls);
     if (error) {
-        /* Try again without the PAT MSR bits */
-        error = vmx_set_ctlreg(MSR_VMX_EXIT_CTLS,
-                   MSR_VMX_TRUE_EXIT_CTLS,
-                   VM_EXIT_CTLS_ONE_SETTING_NO_PAT,
-                   VM_EXIT_CTLS_ZERO_SETTING,
-                   &exit_ctls);
-        if (error) {
-            printf("vmx_init: processor does not support desired "
-                   "exit controls\n");
-            return (error);
-        } else {
-            if (bootverbose)
-                printf("vmm: PAT MSR access not supported\n");
-            guest_msr_valid(MSR_PAT);
-            vmx_patmsr = 0;
-        }
+        printf("vmx_init: processor does not support desired "
+            "exit controls\n");
+        return (error);
     }
 
     /* Check support for VM-entry controls */
-    if (vmx_patmsr) {
-        error = vmx_set_ctlreg(MSR_VMX_ENTRY_CTLS,
-                   MSR_VMX_TRUE_ENTRY_CTLS,
-                   VM_ENTRY_CTLS_ONE_SETTING,
-                   VM_ENTRY_CTLS_ZERO_SETTING,
-                   &entry_ctls);
-    } else {
-        error = vmx_set_ctlreg(MSR_VMX_ENTRY_CTLS,
-                   MSR_VMX_TRUE_ENTRY_CTLS,
-                   VM_ENTRY_CTLS_ONE_SETTING_NO_PAT,
-                   VM_ENTRY_CTLS_ZERO_SETTING,
-                   &entry_ctls);
-    }
-
+    error = vmx_set_ctlreg(MSR_VMX_ENTRY_CTLS, MSR_VMX_TRUE_ENTRY_CTLS,
+        VM_ENTRY_CTLS_ONE_SETTING, VM_ENTRY_CTLS_ZERO_SETTING,
+        &entry_ctls);
     if (error) {
         printf("vmx_init: processor does not support desired "
-               "entry controls\n");
-        return (error);
+            "entry controls\n");
+        return (error);
     }

@@ -800,6 +747,8 @@ vmx_init(int ipinum)
 
     vpid_init();
 
+    vmx_msr_init();
+
     /* enable VMX operation */
     smp_rendezvous(NULL, vmx_enable, NULL, NULL);

@@ -869,7 +818,7 @@ static void *
 vmx_vminit(struct vm *vm, pmap_t pmap)
 {
     uint16_t vpid[VM_MAXCPU];
-    int i, error, guest_msr_count;
+    int i, error;
     struct vmx *vmx;
     struct vmcs *vmcs;

@@ -905,16 +854,14 @@ vmx_vminit(struct vm *vm, pmap_t pmap)
     * how they are saved/restored so can be directly accessed by the
     * guest.
     *
-    * Guest KGSBASE is saved and restored in the guest MSR save area.
-    * Host KGSBASE is restored before returning to userland from the pcb.
-    * There will be a window of time when we are executing in the host
-    * kernel context with a value of KGSBASE from the guest. This is ok
-    * because the value of KGSBASE is inconsequential in kernel context.
-    *
     * MSR_EFER is saved and restored in the guest VMCS area on a
     * VM exit and entry respectively. It is also restored from the
     * host VMCS area on a VM exit.
     *
+    * MSR_PAT is saved and restored in the guest VMCS are on a VM exit
+    * and entry respectively. It is also restored from the host VMCS
+    * area on a VM exit.
+    *
     * The TSC MSR is exposed read-only. Writes are disallowed as that
     * will impact the host TSC.
     * XXX Writes would be implemented with a wrmsr trap, and

@@ -925,21 +872,11 @@
     guest_msr_rw(vmx, MSR_SYSENTER_CS_MSR) ||
     guest_msr_rw(vmx, MSR_SYSENTER_ESP_MSR) ||
     guest_msr_rw(vmx, MSR_SYSENTER_EIP_MSR) ||
-    guest_msr_rw(vmx, MSR_KGSBASE) ||
     guest_msr_rw(vmx, MSR_EFER) ||
+    guest_msr_rw(vmx, MSR_PAT) ||
     guest_msr_ro(vmx, MSR_TSC))
         panic("vmx_vminit: error setting guest msr access");
 
-    /*
-     * MSR_PAT is saved and restored in the guest VMCS are on a VM exit
-     * and entry respectively. It is also restored from the host VMCS
-     * area on a VM exit. However, if running on a system with no
-     * MSR_PAT save/restore support, leave access disabled so accesses
-     * will be trapped.
-     */
-    if (vmx_patmsr && guest_msr_rw(vmx, MSR_PAT))
-        panic("vmx_vminit: error setting guest pat msr access");
-
     vpid_alloc(vpid, VM_MAXCPU);
 
     if (virtual_interrupt_delivery) {

@@ -958,6 +895,8 @@ vmx_vminit(struct vm *vm, pmap_t pmap)
             error, i);
     }
 
+    vmx_msr_guest_init(vmx, i);
+
     error = vmcs_init(vmcs);
     KASSERT(error == 0, ("vmcs_init error %d", error));

@@ -996,13 +935,6 @@ vmx_vminit(struct vm *vm, pmap_t pmap)
     vmx->state[i].lastcpu = NOCPU;
     vmx->state[i].vpid = vpid[i];
 
-    msr_save_area_init(vmx->guest_msrs[i], &guest_msr_count);
-
-    error = vmcs_set_msr_save(vmcs, vtophys(vmx->guest_msrs[i]),
-        guest_msr_count);
-    if (error != 0)
-        panic("vmcs_set_msr_save error %d", error);
-
     /*
      * Set up the CR0/4 shadows, and init the read shadow
      * to the power-on register value from the Intel Sys Arch.
@@ -2078,6 +2010,46 @@ vmx_task_switch_reason(uint64_t qual)
 }
 
 static int
+emulate_wrmsr(struct vmx *vmx, int vcpuid, u_int num, uint64_t val, bool *retu)
+{
+    int error;
+
+    if (lapic_msr(num))
+        error = lapic_wrmsr(vmx->vm, vcpuid, num, val, retu);
+    else
+        error = vmx_wrmsr(vmx, vcpuid, num, val, retu);
+
+    return (error);
+}
+
+static int
+emulate_rdmsr(struct vmx *vmx, int vcpuid, u_int num, bool *retu)
+{
+    struct vmxctx *vmxctx;
+    uint64_t result;
+    uint32_t eax, edx;
+    int error;
+
+    if (lapic_msr(num))
+        error = lapic_rdmsr(vmx->vm, vcpuid, num, &result, retu);
+    else
+        error = vmx_rdmsr(vmx, vcpuid, num, &result, retu);
+
+    if (error == 0) {
+        eax = result;
+        vmxctx = &vmx->ctx[vcpuid];
+        error = vmxctx_setreg(vmxctx, VM_REG_GUEST_RAX, eax);
+        KASSERT(error == 0, ("vmxctx_setreg(rax) error %d", error));
+
+        edx = result >> 32;
+        error = vmxctx_setreg(vmxctx, VM_REG_GUEST_RDX, edx);
+        KASSERT(error == 0, ("vmxctx_setreg(rdx) error %d", error));
+    }
+
+    return (error);
+}
+
+static int
 vmx_exit_process(struct vmx *vmx, int vcpu, struct vm_exit *vmexit)
 {
     int error, handled, in;

@@ -2215,7 +2187,7 @@ vmx_exit_process(struct vmx *vmx, int vc
         retu = false;
         ecx = vmxctx->guest_rcx;
         VCPU_CTR1(vmx->vm, vcpu, "rdmsr 0x%08x", ecx);
-        error = emulate_rdmsr(vmx->vm, vcpu, ecx, &retu);
+        error = emulate_rdmsr(vmx, vcpu, ecx, &retu);
         if (error) {
             vmexit->exitcode = VM_EXITCODE_RDMSR;
             vmexit->u.msr.code = ecx;
         } else {
             /* Return to userspace with a valid exitcode */
             KASSERT(vmexit->exitcode != VM_EXITCODE_BOGUS,
-                ("emulate_wrmsr retu with bogus exitcode"));
+                ("emulate_rdmsr retu with bogus exitcode"));
         }
         break;
     case EXIT_REASON_WRMSR:

@@ -2235,7 +2207,7 @@ vmx_exit_process(struct vmx *vmx, int vc
         edx = vmxctx->guest_rdx;
         VCPU_CTR2(vmx->vm, vcpu, "wrmsr 0x%08x value 0x%016lx",
             ecx, (uint64_t)edx << 32 | eax);
-        error = emulate_wrmsr(vmx->vm, vcpu, ecx,
+        error = emulate_wrmsr(vmx, vcpu, ecx,
             (uint64_t)edx << 32 | eax, &retu);
         if (error) {
             vmexit->exitcode = VM_EXITCODE_WRMSR;

@@ -2403,6 +2375,12 @@ vmx_exit_process(struct vmx *vmx, int vc
     case EXIT_REASON_XSETBV:
         handled = vmx_emulate_xsetbv(vmx, vcpu, vmexit);
         break;
+    case EXIT_REASON_MONITOR:
+        vmexit->exitcode = VM_EXITCODE_MONITOR;
+        break;
+    case EXIT_REASON_MWAIT:
+        vmexit->exitcode = VM_EXITCODE_MWAIT;
+        break;
     default:
         vmm_stat_incr(vmx->vm, vcpu, VMEXIT_UNKNOWN, 1);
         break;

@@ -2523,6 +2501,8 @@ vmx_run(void *arg, int vcpu, register_t
     KASSERT(vmxctx->pmap == pmap,
         ("pmap %p different than ctx pmap %p", pmap, vmxctx->pmap));
 
+    vmx_msr_guest_enter(vmx, vcpu);
+
     VMPTRLD(vmcs);
 
     /*

@@ -2624,6 +2604,8 @@ vmx_run(void *arg, int vcpu, register_t
         vmexit->exitcode);
 
     VMCLEAR(vmcs);
+
+    vmx_msr_guest_exit(vmx, vcpu);
+
     return (0);
 }

@@ -2712,6 +2694,46 @@ vmxctx_setreg(struct vmxctx *vmxctx, int
 }
 
 static int
+vmx_get_intr_shadow(struct vmx *vmx, int vcpu, int running, uint64_t *retval)
+{
+    uint64_t gi;
+    int error;
+
+    error = vmcs_getreg(&vmx->vmcs[vcpu], running,
+        VMCS_IDENT(VMCS_GUEST_INTERRUPTIBILITY), &gi);
+    *retval = (gi & HWINTR_BLOCKING) ? 1 : 0;
+    return (error);
+}
+
+static int
+vmx_modify_intr_shadow(struct vmx *vmx, int vcpu, int running, uint64_t val)
+{
+    struct vmcs *vmcs;
+    uint64_t gi;
+    int error, ident;
+
+    /*
+     * Forcing the vcpu into an interrupt shadow is not supported.
+     */
+    if (val) {
+        error = EINVAL;
+        goto done;
+    }
+
+    vmcs = &vmx->vmcs[vcpu];
+    ident = VMCS_IDENT(VMCS_GUEST_INTERRUPTIBILITY);
+    error = vmcs_getreg(vmcs, running, ident, &gi);
+    if (error == 0) {
+        gi &= ~HWINTR_BLOCKING;
+        error = vmcs_setreg(vmcs, running, ident, gi);
+    }
+done:
+    VCPU_CTR2(vmx->vm, vcpu, "Setting intr_shadow to %#lx %s", val,
+        error ? "failed" : "succeeded");
+    return (error);
+}
+
+static int
 vmx_shadow_reg(int reg)
 {
     int shreg;

@@ -2742,6 +2764,9 @@ vmx_getreg(void *arg, int vcpu, int reg,
     if (running && hostcpu != curcpu)
         panic("vmx_getreg: %s%d is running", vm_name(vmx->vm), vcpu);
 
+    if (reg == VM_REG_GUEST_INTR_SHADOW)
+        return (vmx_get_intr_shadow(vmx, vcpu, running, retval));
+
     if (vmxctx_getreg(&vmx->ctx[vcpu], reg, retval) == 0)
         return (0);

@@ -2760,6 +2785,9 @@ vmx_setreg(void *arg, int vcpu, int reg,
     if (running && hostcpu != curcpu)
         panic("vmx_setreg: %s%d is running", vm_name(vmx->vm), vcpu);
 
+    if (reg == VM_REG_GUEST_INTR_SHADOW)
+        return (vmx_modify_intr_shadow(vmx, vcpu, running, val));
+
     if (vmxctx_setreg(&vmx->ctx[vcpu], reg, val) == 0)
         return (0);

Modified: stable/10/sys/amd64/vmm/intel/vmx.h
==============================================================================
--- stable/10/sys/amd64/vmm/intel/vmx.h Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/intel/vmx.h Sun Dec 28 21:27:13 2014  (r276349)
@@ -33,8 +33,6 @@
 
 struct pmap;
 
-#define GUEST_MSR_MAX_ENTRIES   64      /* arbitrary */
-
 struct vmxctx {
     register_t  guest_rdi;      /* Guest state */
     register_t  guest_rsi;

@@ -97,13 +95,23 @@ struct pir_desc {
 } __aligned(64);
 CTASSERT(sizeof(struct pir_desc) == 64);
 
+/* Index into the 'guest_msrs[]' array */
+enum {
+    IDX_MSR_LSTAR,
+    IDX_MSR_CSTAR,
+    IDX_MSR_STAR,
+    IDX_MSR_SF_MASK,
+    IDX_MSR_KGSBASE,
+    GUEST_MSR_NUM       /* must be the last enumeration */
+};
+
 /* virtual machine softc */
 struct vmx {
     struct vmcs     vmcs[VM_MAXCPU];        /* one vmcs per virtual cpu */
     struct apic_page apic_page[VM_MAXCPU];  /* one apic page per vcpu */
     char            msr_bitmap[PAGE_SIZE];
    struct pir_desc  pir_desc[VM_MAXCPU];
-    struct msr_entry guest_msrs[VM_MAXCPU][GUEST_MSR_MAX_ENTRIES];
+    uint64_t        guest_msrs[VM_MAXCPU][GUEST_MSR_NUM];
     struct vmxctx   ctx[VM_MAXCPU];
     struct vmxcap   cap[VM_MAXCPU];
     struct vmxstate state[VM_MAXCPU];
@@ -113,7 +121,6 @@ struct vmx {
 };
 CTASSERT((offsetof(struct vmx, vmcs) & PAGE_MASK) == 0);
 CTASSERT((offsetof(struct vmx, msr_bitmap) & PAGE_MASK) == 0);
-CTASSERT((offsetof(struct vmx, guest_msrs) & 15) == 0);
 CTASSERT((offsetof(struct vmx, pir_desc[0]) & 63) == 0);
 
 #define VMX_GUEST_VMEXIT    0

Modified: stable/10/sys/amd64/vmm/intel/vmx_msr.c
==============================================================================
--- stable/10/sys/amd64/vmm/intel/vmx_msr.c Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/intel/vmx_msr.c Sun Dec 28 21:27:13 2014  (r276349)
@@ -31,10 +31,15 @@ __FBSDID("$FreeBSD$");
 
 #include
 #include
+#include
+#include
 #include
+#include
 
 #include
+#include
 
+#include "vmx.h"
 #include "vmx_msr.h"
 
 static boolean_t

@@ -171,3 +176,213 @@ msr_bitmap_change_access(char *bitmap, u
 
     return (0);
 }
+
+static uint64_t misc_enable;
+static uint64_t platform_info;
+static uint64_t turbo_ratio_limit;
+static uint64_t host_msrs[GUEST_MSR_NUM];
+
+static bool
+nehalem_cpu(void)
+{
+    u_int family, model;
+
+    /*
+     * The family:model numbers belonging to the Nehalem microarchitecture
+     * are documented in Section 35.5, Intel SDM dated Feb 2014.
+     */
+    family = CPUID_TO_FAMILY(cpu_id);
+    model = CPUID_TO_MODEL(cpu_id);
+    if (family == 0x6) {
+        switch (model) {
+        case 0x1A:
+        case 0x1E:
+        case 0x1F:
+        case 0x2E:
+            return (true);
+        default:
+            break;
+        }
+    }
+    return (false);
+}
+
+static bool
+westmere_cpu(void)
+{
+    u_int family, model;
+
+    /*
+     * The family:model numbers belonging to the Westmere microarchitecture
+     * are documented in Section 35.6, Intel SDM dated Feb 2014.
+     */
+    family = CPUID_TO_FAMILY(cpu_id);
+    model = CPUID_TO_MODEL(cpu_id);
+    if (family == 0x6) {
+        switch (model) {
+        case 0x25:
+        case 0x2C:
+            return (true);
+        default:
+            break;
+        }
+    }
+    return (false);
+}
+
+void
+vmx_msr_init(void)
+{
+    uint64_t bus_freq, ratio;
+    int i;
+
+    /*
+     * It is safe to cache the values of the following MSRs because
+     * they don't change based on curcpu, curproc or curthread.
+     */
+    host_msrs[IDX_MSR_LSTAR] = rdmsr(MSR_LSTAR);
+    host_msrs[IDX_MSR_CSTAR] = rdmsr(MSR_CSTAR);
+    host_msrs[IDX_MSR_STAR] = rdmsr(MSR_STAR);
+    host_msrs[IDX_MSR_SF_MASK] = rdmsr(MSR_SF_MASK);
+
+    /*
+     * Initialize emulated MSRs
+     */
+    misc_enable = rdmsr(MSR_IA32_MISC_ENABLE);
+    /*
+     * Set mandatory bits
+     *  11:   branch trace disabled
+     *  12:   PEBS unavailable
+     * Clear unsupported features
+     *  16:   SpeedStep enable
+     *  18:   enable MONITOR FSM
+     */
+    misc_enable |= (1 << 12) | (1 << 11);
+    misc_enable &= ~((1 << 18) | (1 << 16));
+
+    if (nehalem_cpu() || westmere_cpu())
+        bus_freq = 133330000;       /* 133Mhz */
+    else
+        bus_freq = 100000000;       /* 100Mhz */
+
+    /*
+     * XXXtime
+     * The ratio should really be based on the virtual TSC frequency as
+     * opposed to the host TSC.
+     */
+    ratio = (tsc_freq / bus_freq) & 0xff;
+
+    /*
+     * The register definition is based on the micro-architecture
+     * but the following bits are always the same:
+     * [15:8]  Maximum Non-Turbo Ratio
+     * [28]    Programmable Ratio Limit for Turbo Mode
+     * [29]    Programmable TDC-TDP Limit for Turbo Mode
+     * [47:40] Maximum Efficiency Ratio
+     *
+     * The other bits can be safely set to 0 on all
+     * micro-architectures up to Haswell.
+     */
+    platform_info = (ratio << 8) | (ratio << 40);
+
+    /*
+     * The number of valid bits in the MSR_TURBO_RATIO_LIMITx register is
+     * dependent on the maximum cores per package supported by the micro-
+     * architecture. For e.g., Westmere supports 6 cores per package and
+     * uses the low 48 bits. Sandybridge support 8 cores per package and
+     * uses up all 64 bits.
+     *
+     * However, the unused bits are reserved so we pretend that all bits
+     * in this MSR are valid.
+     */
+    for (i = 0; i < 8; i++)
+        turbo_ratio_limit = (turbo_ratio_limit << 8) | ratio;
+}
+
+void
+vmx_msr_guest_init(struct vmx *vmx, int vcpuid)
+{
+    /*
+     * The permissions bitmap is shared between all vcpus so initialize it
+     * once when initializing the vBSP.
+     */
+    if (vcpuid == 0) {
+        guest_msr_rw(vmx, MSR_LSTAR);
+        guest_msr_rw(vmx, MSR_CSTAR);
+        guest_msr_rw(vmx, MSR_STAR);
+        guest_msr_rw(vmx, MSR_SF_MASK);
+        guest_msr_rw(vmx, MSR_KGSBASE);
+    }
+    return;
+}
+
+void
+vmx_msr_guest_enter(struct vmx *vmx, int vcpuid)
+{
+    uint64_t *guest_msrs = vmx->guest_msrs[vcpuid];
+
+    /* Save host MSRs (if any) and restore guest MSRs */
+    wrmsr(MSR_LSTAR, guest_msrs[IDX_MSR_LSTAR]);
+    wrmsr(MSR_CSTAR, guest_msrs[IDX_MSR_CSTAR]);
+    wrmsr(MSR_STAR, guest_msrs[IDX_MSR_STAR]);
+    wrmsr(MSR_SF_MASK, guest_msrs[IDX_MSR_SF_MASK]);
+    wrmsr(MSR_KGSBASE, guest_msrs[IDX_MSR_KGSBASE]);
+}
+
+void
+vmx_msr_guest_exit(struct vmx *vmx, int vcpuid)
+{
+    uint64_t *guest_msrs = vmx->guest_msrs[vcpuid];
+
+    /* Save guest MSRs */
+    guest_msrs[IDX_MSR_LSTAR] = rdmsr(MSR_LSTAR);
+    guest_msrs[IDX_MSR_CSTAR] = rdmsr(MSR_CSTAR);
+    guest_msrs[IDX_MSR_STAR] = rdmsr(MSR_STAR);
+    guest_msrs[IDX_MSR_SF_MASK] = rdmsr(MSR_SF_MASK);
+    guest_msrs[IDX_MSR_KGSBASE] = rdmsr(MSR_KGSBASE);
+
+    /* Restore host MSRs */
+    wrmsr(MSR_LSTAR, host_msrs[IDX_MSR_LSTAR]);
+    wrmsr(MSR_CSTAR, host_msrs[IDX_MSR_CSTAR]);
+    wrmsr(MSR_STAR, host_msrs[IDX_MSR_STAR]);
+    wrmsr(MSR_SF_MASK, host_msrs[IDX_MSR_SF_MASK]);
+
+    /* MSR_KGSBASE will be restored on the way back to userspace */
+}
+
+int
+vmx_rdmsr(struct vmx *vmx, int vcpuid, u_int num, uint64_t *val, bool *retu)
+{
+    int error = 0;
+
+    switch (num) {
+    case MSR_IA32_MISC_ENABLE:
+        *val = misc_enable;
+        break;
+    case MSR_PLATFORM_INFO:
+        *val = platform_info;
+        break;
+    case MSR_TURBO_RATIO_LIMIT:
+    case MSR_TURBO_RATIO_LIMIT1:
+        *val = turbo_ratio_limit;
+        break;
+    default:
+        error = EINVAL;
+        break;
+    }
+    return (error);
+}
+
+int
+vmx_wrmsr(struct vmx *vmx, int vcpuid, u_int num, uint64_t val, bool *retu)
+{
+    int error = 0;
+
+    switch (num) {
+    default:
+        error = EINVAL;
+        break;
+    }
+
+    return (error);
+}
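A worked example of the vmx_msr_init() arithmetic above, assuming a
hypothetical host with a 2.0 GHz TSC on a 100 MHz bus (i.e. neither
Nehalem nor Westmere):

    ratio             = (2000000000 / 100000000) & 0xff = 20 (0x14)
    platform_info     = (0x14 << 8) | (0x14 << 40)      = 0x140000001400
    turbo_ratio_limit = 0x1414141414141414  (after the 8-pass loop)

so the guest reads a maximum non-turbo ratio of 20 from MSR_PLATFORM_INFO
and the same limit for every core count from MSR_TURBO_RATIO_LIMIT.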
Modified: stable/10/sys/amd64/vmm/intel/vmx_msr.h
==============================================================================
--- stable/10/sys/amd64/vmm/intel/vmx_msr.h Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/intel/vmx_msr.h Sun Dec 28 21:27:13 2014  (r276349)
@@ -29,6 +29,15 @@
 #ifndef _VMX_MSR_H_
 #define _VMX_MSR_H_
 
+struct vmx;
+
+void vmx_msr_init(void);
+void vmx_msr_guest_init(struct vmx *vmx, int vcpuid);
+void vmx_msr_guest_enter(struct vmx *vmx, int vcpuid);
+void vmx_msr_guest_exit(struct vmx *vmx, int vcpuid);
+int vmx_rdmsr(struct vmx *, int vcpuid, u_int num, uint64_t *val, bool *retu);
+int vmx_wrmsr(struct vmx *, int vcpuid, u_int num, uint64_t val, bool *retu);
+
 uint32_t vmx_revision(void);
 
 int vmx_set_ctlreg(int ctl_reg, int true_ctl_reg, uint32_t ones_mask,

@@ -52,4 +61,10 @@ int vmx_set_ctlreg(int true
 void    msr_bitmap_initialize(char *bitmap);
 int     msr_bitmap_change_access(char *bitmap, u_int msr, int access);
 
+#define guest_msr_rw(vmx, msr) \
+    msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_RW)
+
+#define guest_msr_ro(vmx, msr) \
+    msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_READ)
+
 #endif

Modified: stable/10/sys/amd64/vmm/io/vatpic.c
==============================================================================
--- stable/10/sys/amd64/vmm/io/vatpic.c Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/io/vatpic.c Sun Dec 28 21:27:13 2014  (r276349)
@@ -500,13 +500,19 @@ vatpic_pending_intr(struct vm *vm, int *
     VATPIC_LOCK(vatpic);
 
     pin = vatpic_get_highest_irrpin(atpic);
-    if (pin == -1)
-        pin = 7;
     if (pin == 2) {
         atpic = &vatpic->atpic[1];
         pin = vatpic_get_highest_irrpin(atpic);
     }
 
+    /*
+     * If there are no pins active at this moment then return the spurious
+     * interrupt vector instead.
+     */
+    if (pin == -1)
+        pin = 7;
+
+    KASSERT(pin >= 0 && pin <= 7, ("%s: invalid pin %d", __func__, pin));
     *vecptr = atpic->irq_base + pin;
 
     VATPIC_UNLOCK(vatpic);

Modified: stable/10/sys/amd64/vmm/io/vlapic.c
==============================================================================
--- stable/10/sys/amd64/vmm/io/vlapic.c Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/io/vlapic.c Sun Dec 28 21:27:13 2014  (r276349)
@@ -633,6 +633,7 @@ vlapic_fire_timer(struct vlapic *vlapic)
     // The timer LVT always uses the fixed delivery mode.
     lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_TIMER_LVT);
     if (vlapic_fire_lvt(vlapic, lvt | APIC_LVT_DM_FIXED)) {
+        VLAPIC_CTR0(vlapic, "vlapic timer fired");
         vmm_stat_incr(vlapic->vm, vlapic->vcpuid, VLAPIC_INTR_TIMER, 1);
     }
 }

Modified: stable/10/sys/amd64/vmm/vmm.c
==============================================================================
--- stable/10/sys/amd64/vmm/vmm.c   Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/vmm.c   Sun Dec 28 21:27:13 2014  (r276349)
@@ -74,7 +74,6 @@ __FBSDID("$FreeBSD$");
 #include "vhpet.h"
 #include "vioapic.h"
 #include "vlapic.h"
-#include "vmm_msr.h"
 #include "vmm_ipi.h"
 #include "vmm_stat.h"
 #include "vmm_lapic.h"

@@ -105,7 +104,6 @@ struct vcpu {
     struct savefpu  *guestfpu;      /* (a,i) guest fpu state */
     uint64_t        guest_xcr0;     /* (i) guest %xcr0 register */
     void            *stats;         /* (a,i) statistics */
-    uint64_t guest_msrs[VMM_MSR_NUM]; /* (i) emulated MSRs */
     struct vm_exit  exitinfo;       /* (x) exit reason and collateral */
 };

@@ -188,7 +186,6 @@ static struct vmm_ops *ops;
 #define fpu_stop_emulating()    clts()
 
 static MALLOC_DEFINE(M_VM, "vm", "vm");
-CTASSERT(VMM_MSR_NUM <= 64);    /* msr_mask can keep track of up to 64 msrs */
 
 /* statistics */
 static VMM_STAT(VCPU_TOTAL_RUNTIME, "vcpu total runtime");

@@ -250,7 +247,6 @@ vcpu_init(struct vm *vm, int vcpu_id, bo
     vcpu->guest_xcr0 = XFEATURE_ENABLED_X87;
     fpu_save_area_reset(vcpu->guestfpu);
     vmm_stat_init(vcpu->stats);
-    guest_msrs_init(vm, vcpu_id);
 }
 
 struct vm_exit *

@@ -294,7 +290,6 @@ vmm_init(void)
     else
         return (ENXIO);
 
-    vmm_msr_init();
     vmm_resume_p = vmm_resume;
 
     return (VMM_INIT(vmm_ipinum));

@@ -1091,7 +1086,7 @@ vm_handle_hlt(struct vm *vm, int vcpuid,
 {
     struct vcpu *vcpu;
     const char *wmesg;
-    int t, vcpu_halted, vm_halted;
+    int error, t, vcpu_halted, vm_halted;
 
     KASSERT(!CPU_ISSET(vcpuid, &vm->halted_cpus), ("vcpu already halted"));

@@ -1099,6 +1094,22 @@ vm_handle_hlt(struct vm *vm, int vcpuid,
     vcpu_halted = 0;
     vm_halted = 0;
 
+    /*
+     * The typical way to halt a cpu is to execute: "sti; hlt"
+     *
+     * STI sets RFLAGS.IF to enable interrupts. However, the processor
+     * remains in an "interrupt shadow" for an additional instruction
+     * following the STI. This guarantees that "sti; hlt" sequence is
+     * atomic and a pending interrupt will be recognized after the HLT.
+     *
+     * After the HLT emulation is done the vcpu is no longer in an
+     * interrupt shadow and a pending interrupt can be injected on
+     * the next entry into the guest.
+     */
+    error = vm_set_register(vm, vcpuid, VM_REG_GUEST_INTR_SHADOW, 0);
+    KASSERT(error == 0, ("%s: error %d clearing interrupt shadow",
+        __func__, error));
+
     vcpu_lock(vcpu);
     while (1) {
         /*

@@ -1187,8 +1198,12 @@ vm_handle_paging(struct vm *vm, int vcpu
     if (ftype == VM_PROT_READ || ftype == VM_PROT_WRITE) {
         rv = pmap_emulate_accessed_dirty(vmspace_pmap(vm->vmspace),
             vme->u.paging.gpa, ftype);
-        if (rv == 0)
+        if (rv == 0) {
+            VCPU_CTR2(vm, vcpuid, "%s bit emulation for gpa %#lx",
+                ftype == VM_PROT_READ ? "accessed" : "dirty",
+                vme->u.paging.gpa);
             goto done;
+        }
     }
 
     map = &vm->vmspace->vm_map;

@@ -1229,6 +1244,8 @@ vm_handle_inst_emul(struct vm *vm, int v
     paging = &vme->u.inst_emul.paging;
     cpu_mode = paging->cpu_mode;
 
+    VCPU_CTR1(vm, vcpuid, "inst_emul fault accessing gpa %#lx", gpa);
+
     vie_init(vie);
 
     /* Fetch, decode and emulate the faulting instruction */

@@ -1425,7 +1442,6 @@ restart:
     pcb = PCPU_GET(curpcb);
     set_pcb_flags(pcb, PCB_FULL_IRET);
 
-    restore_guest_msrs(vm, vcpuid);
     restore_guest_fpustate(vcpu);
 
     vcpu_require_state(vm, vcpuid, VCPU_RUNNING);

@@ -1433,7 +1449,6 @@ restart:
     vcpu_require_state(vm, vcpuid, VCPU_FROZEN);
 
     save_guest_fpustate(vcpu);
-    restore_host_msrs(vm, vcpuid);
 
     vmm_stat_incr(vm, vcpuid, VCPU_TOTAL_RUNTIME, rdtsc() - tscval);

@@ -1467,6 +1482,10 @@ restart:
         case VM_EXITCODE_INOUT_STR:
             error = vm_handle_inout(vm, vcpuid, vme, &retu);
             break;
+        case VM_EXITCODE_MONITOR:
+        case VM_EXITCODE_MWAIT:
+            vm_inject_ud(vm, vcpuid);
+            break;
         default:
             retu = true;    /* handled in userland */
             break;

@@ -1875,12 +1894,6 @@ vm_set_capability(struct vm *vm, int vcp
     return (VMSETCAP(vm->cookie, vcpu, type, val));
 }
 
-uint64_t *
-vm_guest_msrs(struct vm *vm, int cpu)
-{
-    return (vm->vcpu[cpu].guest_msrs);
-}
-
 struct vlapic *
 vm_lapic(struct vm *vm, int cpu)
 {

Modified: stable/10/sys/amd64/vmm/vmm_instruction_emul.c
==============================================================================
--- stable/10/sys/amd64/vmm/vmm_instruction_emul.c  Sun Dec 28 21:13:55 2014  (r276348)
+++ stable/10/sys/amd64/vmm/vmm_instruction_emul.c  Sun Dec 28 21:27:13 2014  (r276349)
@@ -69,6 +69,7 @@ enum {
     VIE_OP_TYPE_TWO_BYTE,
     VIE_OP_TYPE_PUSH,
     VIE_OP_TYPE_CMP,
+    VIE_OP_TYPE_POP,
     VIE_OP_TYPE_LAST
 };

@@ -159,6 +160,11 @@ static const struct vie_op one_byte_opco
         .op_type = VIE_OP_TYPE_OR,
         .op_flags = VIE_OP_F_IMM8,
     },
+    [0x8F] = {
+        /* XXX Group 1A extended opcode - not just POP */
+        .op_byte = 0x8F,
+        .op_type = VIE_OP_TYPE_POP,
+    },
     [0xFF] = {
         /* XXX Group 5 extended opcode - not just PUSH */
         .op_byte = 0xFF,

@@ -316,46 +322,36 @@ vie_update_register(void *vm, int vcpuid
     return (error);
 }
 
+#define RFLAGS_STATUS_BITS    (PSL_C | PSL_PF | PSL_AF | PSL_Z | PSL_N | PSL_V)
+
 /*
  * Return the status flags that would result from doing (x - y).
  */
-static u_long
-getcc16(uint16_t x, uint16_t y)
-{
-    u_long rflags;
-

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
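The diff output is truncated just as the getcc16()/getcc32()/getcc64()
helpers are folded into the GETCC(sz) macro described in the r270857 log
entry above. A minimal sketch of that macro, reconstructed from the log
message (the committed text is not shown here, so details may differ):

    /*
     * "sub %2,%1" computes x -= y, so 'x' is both an input and an output
     * operand and therefore gets the "+r" constraint; the resulting
     * status flags are captured via pushfq/popq.
     */
    #define GETCC(sz)                                           \
    static u_long                                               \
    getcc##sz(uint##sz##_t x, uint##sz##_t y)                   \
    {                                                           \
            u_long rflags;                                      \
                                                                \
            __asm __volatile("sub %2,%1; pushfq; popq %0" :     \
                "=r" (rflags), "+r" (x) : "m" (y));             \
            return (rflags);                                    \
    }

    GETCC(16);
    GETCC(32);
    GETCC(64);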