From owner-svn-src-stable@freebsd.org Sun Jun 28 03:22:32 2015 Return-Path: Delivered-To: svn-src-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5837D98DB61; Sun, 28 Jun 2015 03:22:32 +0000 (UTC) (envelope-from neel@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 458091026; Sun, 28 Jun 2015 03:22:32 +0000 (UTC) (envelope-from neel@FreeBSD.org) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.9/8.14.9) with ESMTP id t5S3MWDk090168; Sun, 28 Jun 2015 03:22:32 GMT (envelope-from neel@FreeBSD.org) Received: (from neel@localhost) by svn.freebsd.org (8.14.9/8.14.9/Submit) id t5S3MRaN090136; Sun, 28 Jun 2015 03:22:27 GMT (envelope-from neel@FreeBSD.org) Message-Id: <201506280322.t5S3MRaN090136@svn.freebsd.org> X-Authentication-Warning: svn.freebsd.org: neel set sender to neel@FreeBSD.org using -f From: Neel Natu Date: Sun, 28 Jun 2015 03:22:27 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r284900 - in stable/10: lib/libvmmapi sys/amd64/include sys/amd64/vmm sys/amd64/vmm/amd sys/amd64/vmm/intel sys/amd64/vmm/io sys/x86/include usr.sbin/bhyve usr.sbin/bhyvectl usr.sbin/bh... X-SVN-Group: stable-10 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: SVN commit messages for all the -stable branches of the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Jun 2015 03:22:32 -0000 Author: neel Date: Sun Jun 28 03:22:26 2015 New Revision: 284900 URL: https://svnweb.freebsd.org/changeset/base/284900 Log: MFC r282209: Emulate the 'bit test' instruction. MFC r282259: Re-implement RTC current time calculation to eliminate the possibility of losing time. MFC r282281: Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs. MFC r282284: When an instruction cannot be decoded just return to userspace so bhyve(8) can dump the instruction bytes. MFC r282287: Don't require to be always included before . MFC r282296: Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are enabled. MFC r282301: Relax limits when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI. MFC r282335: Advertise an additional memory BAR in the "dummy" device emulation. MFC r282336: Emulate machine check related MSRs to allow guest OSes like Windows to boot. MFC r282351: Don't advertise the Intel SMX capability to the guest. MFC r282407: Emulate the 'CMP r/m8, imm8' instruction. MFC r282519: Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE. MFC r282520: Emulate guest writes to EFER_MSR properly. MFC r282558: Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). MFC r282571: Check 'td_owepreempt' and yield the vcpu thread if it is set. MFC r282595: Allow byte reads of AHCI registers. MFC r282784: Handling indirect descriptors is a capability of the host and not one that needs to be negotiated. Use the host capabilities field and not the negotiated field when verifying that indirect descriptors are supported. MFC r282788: Allow configuration of the sector size advertised to the guest. MFC r282865: Set the subvendor field in config space to the vendor ID. This is required by the Windows virtio drivers to correctly match a device. MFC r282922: Bump the size of the blockif scatter-gather list to 67. MFC r283075: Fix off-by-one in array index bounds check. bhyveload would allow you to create 33 entries on an array that only has 32 slots MFC r283168: Temporarily revert r282922 which bumped the max descriptors. MFC r283255: Emulate the "CMP r/m, reg" instruction (opcode 39H). MFC r283256: Add an option "--get-vmcs-exit-inst-length" to display the instruction length of the instruction that caused the VM-exit. MFC r283264: Change the header type of the emulated host-bridge from type 1 to type 0. MFC r283293: Don't rely on the 'VM-exit instruction length' field in the VMCS to always have an accurate length on an EPT violation. MFC r283299: Remove bogus verification of instruction length after instruction decode. MFC r283308: Exceptions don't deliver an error code in real mode. MFC r283657: Fix non-deterministic delays when accessing a vcpu that was in "running" or "sleeping" state. MFC r283973: Use tunable 'hw.vmm.svm.features' to disable specific SVM features even though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor. MFC r284046: Fix regression in 'verify_gla()' with the RIP-relative addressing mode. MFC r284174: Support guest writes to the TSC by enabling the "use TSC offsetting" execution control. Modified: stable/10/lib/libvmmapi/vmmapi.c stable/10/lib/libvmmapi/vmmapi.h stable/10/sys/amd64/include/vmm.h stable/10/sys/amd64/include/vmm_instruction_emul.h stable/10/sys/amd64/vmm/amd/amdv.c stable/10/sys/amd64/vmm/amd/svm.c stable/10/sys/amd64/vmm/amd/svm_msr.c stable/10/sys/amd64/vmm/amd/vmcb.c stable/10/sys/amd64/vmm/intel/vmx.c stable/10/sys/amd64/vmm/intel/vmx.h stable/10/sys/amd64/vmm/intel/vmx_msr.c stable/10/sys/amd64/vmm/io/vatpic.c stable/10/sys/amd64/vmm/io/vatpit.c stable/10/sys/amd64/vmm/io/vhpet.c stable/10/sys/amd64/vmm/io/vioapic.c stable/10/sys/amd64/vmm/io/vlapic.c stable/10/sys/amd64/vmm/io/vpmtmr.c stable/10/sys/amd64/vmm/io/vrtc.c stable/10/sys/amd64/vmm/vmm.c stable/10/sys/amd64/vmm/vmm_dev.c stable/10/sys/amd64/vmm/vmm_instruction_emul.c stable/10/sys/amd64/vmm/vmm_ioport.c stable/10/sys/amd64/vmm/vmm_stat.c stable/10/sys/amd64/vmm/vmm_stat.h stable/10/sys/amd64/vmm/x86.c stable/10/sys/amd64/vmm/x86.h stable/10/sys/x86/include/specialreg.h stable/10/usr.sbin/bhyve/bhyve.8 stable/10/usr.sbin/bhyve/bhyverun.c stable/10/usr.sbin/bhyve/block_if.c stable/10/usr.sbin/bhyve/inout.c stable/10/usr.sbin/bhyve/pci_ahci.c stable/10/usr.sbin/bhyve/pci_emul.c stable/10/usr.sbin/bhyve/pci_hostbridge.c stable/10/usr.sbin/bhyve/pci_virtio_block.c stable/10/usr.sbin/bhyve/pci_virtio_net.c stable/10/usr.sbin/bhyve/pci_virtio_rnd.c stable/10/usr.sbin/bhyve/task_switch.c stable/10/usr.sbin/bhyve/virtio.c stable/10/usr.sbin/bhyvectl/bhyvectl.c stable/10/usr.sbin/bhyveload/bhyveload.c Directory Properties: stable/10/ (props changed) Modified: stable/10/lib/libvmmapi/vmmapi.c ============================================================================== --- stable/10/lib/libvmmapi/vmmapi.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/lib/libvmmapi/vmmapi.c Sun Jun 28 03:22:26 2015 (r284900) @@ -40,6 +40,7 @@ __FBSDID("$FreeBSD$"); #include #include +#include #include #include #include @@ -958,9 +959,9 @@ vm_get_hpet_capabilities(struct vmctx *c return (error); } -static int -gla2gpa(struct vmctx *ctx, int vcpu, struct vm_guest_paging *paging, - uint64_t gla, int prot, int *fault, uint64_t *gpa) +int +vm_gla2gpa(struct vmctx *ctx, int vcpu, struct vm_guest_paging *paging, + uint64_t gla, int prot, uint64_t *gpa, int *fault) { struct vm_gla2gpa gg; int error; @@ -979,29 +980,18 @@ gla2gpa(struct vmctx *ctx, int vcpu, str return (error); } -int -vm_gla2gpa(struct vmctx *ctx, int vcpu, struct vm_guest_paging *paging, - uint64_t gla, int prot, uint64_t *gpa) -{ - int error, fault; - - error = gla2gpa(ctx, vcpu, paging, gla, prot, &fault, gpa); - if (fault) - error = fault; - return (error); -} - #ifndef min #define min(a,b) (((a) < (b)) ? (a) : (b)) #endif int vm_copy_setup(struct vmctx *ctx, int vcpu, struct vm_guest_paging *paging, - uint64_t gla, size_t len, int prot, struct iovec *iov, int iovcnt) + uint64_t gla, size_t len, int prot, struct iovec *iov, int iovcnt, + int *fault) { void *va; uint64_t gpa; - int error, fault, i, n, off; + int error, i, n, off; for (i = 0; i < iovcnt; i++) { iov[i].iov_base = 0; @@ -1010,18 +1000,16 @@ vm_copy_setup(struct vmctx *ctx, int vcp while (len) { assert(iovcnt > 0); - error = gla2gpa(ctx, vcpu, paging, gla, prot, &fault, &gpa); - if (error) - return (-1); - if (fault) - return (1); + error = vm_gla2gpa(ctx, vcpu, paging, gla, prot, &gpa, fault); + if (error || *fault) + return (error); off = gpa & PAGE_MASK; n = min(len, PAGE_SIZE - off); va = vm_map_gpa(ctx, gpa, n); if (va == NULL) - return (-1); + return (EFAULT); iov->iov_base = va; iov->iov_len = n; Modified: stable/10/lib/libvmmapi/vmmapi.h ============================================================================== --- stable/10/lib/libvmmapi/vmmapi.h Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/lib/libvmmapi/vmmapi.h Sun Jun 28 03:22:26 2015 (r284900) @@ -64,7 +64,7 @@ int vm_setup_memory(struct vmctx *ctx, s void *vm_map_gpa(struct vmctx *ctx, vm_paddr_t gaddr, size_t len); int vm_get_gpa_pmap(struct vmctx *, uint64_t gpa, uint64_t *pte, int *num); int vm_gla2gpa(struct vmctx *, int vcpuid, struct vm_guest_paging *paging, - uint64_t gla, int prot, uint64_t *gpa); + uint64_t gla, int prot, uint64_t *gpa, int *fault); uint32_t vm_get_lowmem_limit(struct vmctx *ctx); void vm_set_lowmem_limit(struct vmctx *ctx, uint32_t limit); void vm_set_memflags(struct vmctx *ctx, int flags); @@ -131,10 +131,15 @@ int vm_get_hpet_capabilities(struct vmct /* * Translate the GLA range [gla,gla+len) into GPA segments in 'iov'. * The 'iovcnt' should be big enough to accomodate all GPA segments. - * Returns 0 on success, 1 on a guest fault condition and -1 otherwise. + * + * retval fault Interpretation + * 0 0 Success + * 0 1 An exception was injected into the guest + * EFAULT N/A Error */ int vm_copy_setup(struct vmctx *ctx, int vcpu, struct vm_guest_paging *pg, - uint64_t gla, size_t len, int prot, struct iovec *iov, int iovcnt); + uint64_t gla, size_t len, int prot, struct iovec *iov, int iovcnt, + int *fault); void vm_copyin(struct vmctx *ctx, int vcpu, struct iovec *guest_iov, void *host_dst, size_t len); void vm_copyout(struct vmctx *ctx, int vcpu, const void *host_src, Modified: stable/10/sys/amd64/include/vmm.h ============================================================================== --- stable/10/sys/amd64/include/vmm.h Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/include/vmm.h Sun Jun 28 03:22:26 2015 (r284900) @@ -120,13 +120,18 @@ struct vm_object; struct vm_guest_paging; struct pmap; +struct vm_eventinfo { + void *rptr; /* rendezvous cookie */ + int *sptr; /* suspend cookie */ + int *iptr; /* reqidle cookie */ +}; + typedef int (*vmm_init_func_t)(int ipinum); typedef int (*vmm_cleanup_func_t)(void); typedef void (*vmm_resume_func_t)(void); typedef void * (*vmi_init_func_t)(struct vm *vm, struct pmap *pmap); typedef int (*vmi_run_func_t)(void *vmi, int vcpu, register_t rip, - struct pmap *pmap, void *rendezvous_cookie, - void *suspend_cookie); + struct pmap *pmap, struct vm_eventinfo *info); typedef void (*vmi_cleanup_func_t)(void *vmi); typedef int (*vmi_get_register_t)(void *vmi, int vcpu, int num, uint64_t *retval); @@ -204,13 +209,13 @@ int vm_get_x2apic_state(struct vm *vm, i int vm_set_x2apic_state(struct vm *vm, int vcpu, enum x2apic_state state); int vm_apicid2vcpuid(struct vm *vm, int apicid); int vm_activate_cpu(struct vm *vm, int vcpu); -cpuset_t vm_active_cpus(struct vm *vm); -cpuset_t vm_suspended_cpus(struct vm *vm); struct vm_exit *vm_exitinfo(struct vm *vm, int vcpuid); void vm_exit_suspended(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_rendezvous(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_astpending(struct vm *vm, int vcpuid, uint64_t rip); +void vm_exit_reqidle(struct vm *vm, int vcpuid, uint64_t rip); +#ifdef _SYS__CPUSET_H_ /* * Rendezvous all vcpus specified in 'dest' and execute 'func(arg)'. * The rendezvous 'func(arg)' is not allowed to do anything that will @@ -228,19 +233,29 @@ void vm_exit_astpending(struct vm *vm, i typedef void (*vm_rendezvous_func_t)(struct vm *vm, int vcpuid, void *arg); void vm_smp_rendezvous(struct vm *vm, int vcpuid, cpuset_t dest, vm_rendezvous_func_t func, void *arg); +cpuset_t vm_active_cpus(struct vm *vm); +cpuset_t vm_suspended_cpus(struct vm *vm); +#endif /* _SYS__CPUSET_H_ */ static __inline int -vcpu_rendezvous_pending(void *rendezvous_cookie) +vcpu_rendezvous_pending(struct vm_eventinfo *info) { - return (*(uintptr_t *)rendezvous_cookie != 0); + return (*((uintptr_t *)(info->rptr)) != 0); } static __inline int -vcpu_suspended(void *suspend_cookie) +vcpu_suspended(struct vm_eventinfo *info) { - return (*(int *)suspend_cookie); + return (*info->sptr); +} + +static __inline int +vcpu_reqidle(struct vm_eventinfo *info) +{ + + return (*info->iptr); } /* @@ -274,7 +289,13 @@ vcpu_is_running(struct vm *vm, int vcpu, static int __inline vcpu_should_yield(struct vm *vm, int vcpu) { - return (curthread->td_flags & (TDF_ASTPENDING | TDF_NEEDRESCHED)); + + if (curthread->td_flags & (TDF_ASTPENDING | TDF_NEEDRESCHED)) + return (1); + else if (curthread->td_owepreempt) + return (1); + else + return (0); } #endif @@ -343,9 +364,10 @@ struct vm_copyinfo { * at 'gla' and 'len' bytes long. The 'prot' should be set to PROT_READ for * a copyin or PROT_WRITE for a copyout. * - * Returns 0 on success. - * Returns 1 if an exception was injected into the guest. - * Returns -1 otherwise. + * retval is_fault Intepretation + * 0 0 Success + * 0 1 An exception was injected into the guest + * EFAULT N/A Unrecoverable error * * The 'copyinfo[]' can be passed to 'vm_copyin()' or 'vm_copyout()' only if * the return value is 0. The 'copyinfo[]' resources should be freed by calling @@ -353,7 +375,7 @@ struct vm_copyinfo { */ int vm_copy_setup(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, uint64_t gla, size_t len, int prot, struct vm_copyinfo *copyinfo, - int num_copyinfo); + int num_copyinfo, int *is_fault); void vm_copy_teardown(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, int num_copyinfo); void vm_copyin(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, @@ -497,6 +519,7 @@ enum vm_exitcode { VM_EXITCODE_MONITOR, VM_EXITCODE_MWAIT, VM_EXITCODE_SVM, + VM_EXITCODE_REQIDLE, VM_EXITCODE_MAX }; Modified: stable/10/sys/amd64/include/vmm_instruction_emul.h ============================================================================== --- stable/10/sys/amd64/include/vmm_instruction_emul.h Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/include/vmm_instruction_emul.h Sun Jun 28 03:22:26 2015 (r284900) @@ -81,17 +81,19 @@ int vie_calculate_gla(enum vm_cpu_mode c */ int vmm_fetch_instruction(struct vm *vm, int cpuid, struct vm_guest_paging *guest_paging, - uint64_t rip, int inst_length, struct vie *vie); + uint64_t rip, int inst_length, struct vie *vie, + int *is_fault); /* * Translate the guest linear address 'gla' to a guest physical address. * - * Returns 0 on success and '*gpa' contains the result of the translation. - * Returns 1 if an exception was injected into the guest. - * Returns -1 otherwise. + * retval is_fault Interpretation + * 0 0 'gpa' contains result of the translation + * 0 1 An exception was injected into the guest + * EFAULT N/A An unrecoverable hypervisor error occurred */ int vm_gla2gpa(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, - uint64_t gla, int prot, uint64_t *gpa); + uint64_t gla, int prot, uint64_t *gpa, int *is_fault); void vie_init(struct vie *vie, const char *inst_bytes, int inst_length); Modified: stable/10/sys/amd64/vmm/amd/amdv.c ============================================================================== --- stable/10/sys/amd64/vmm/amd/amdv.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/amd/amdv.c Sun Jun 28 03:22:26 2015 (r284900) @@ -32,7 +32,6 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include #include #include "io/iommu.h" Modified: stable/10/sys/amd64/vmm/amd/svm.c ============================================================================== --- stable/10/sys/amd64/vmm/amd/svm.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/amd/svm.c Sun Jun 28 03:22:26 2015 (r284900) @@ -102,8 +102,8 @@ static MALLOC_DEFINE(M_SVM_VLAPIC, "svm- /* Per-CPU context area. */ extern struct pcpu __pcpu[]; -static uint32_t svm_feature; /* AMD SVM features. */ -SYSCTL_UINT(_hw_vmm_svm, OID_AUTO, features, CTLFLAG_RD, &svm_feature, 0, +static uint32_t svm_feature = ~0U; /* AMD SVM features. */ +SYSCTL_UINT(_hw_vmm_svm, OID_AUTO, features, CTLFLAG_RDTUN, &svm_feature, 0, "SVM features advertised by CPUID.8000000AH:EDX"); static int disable_npf_assist; @@ -112,7 +112,7 @@ SYSCTL_INT(_hw_vmm_svm, OID_AUTO, disabl /* Maximum ASIDs supported by the processor */ static uint32_t nasid; -SYSCTL_UINT(_hw_vmm_svm, OID_AUTO, num_asids, CTLFLAG_RD, &nasid, 0, +SYSCTL_UINT(_hw_vmm_svm, OID_AUTO, num_asids, CTLFLAG_RDTUN, &nasid, 0, "Number of ASIDs supported by this processor"); /* Current ASID generation for each host cpu */ @@ -174,9 +174,14 @@ check_svm_features(void) /* CPUID Fn8000_000A is for SVM */ do_cpuid(0x8000000A, regs); - svm_feature = regs[3]; + svm_feature &= regs[3]; - nasid = regs[1]; + /* + * The number of ASIDs can be configured to be less than what is + * supported by the hardware but not more. + */ + if (nasid == 0 || nasid > regs[1]) + nasid = regs[1]; KASSERT(nasid > 1, ("Insufficient ASIDs for guests: %#x", nasid)); /* bhyve requires the Nested Paging feature */ @@ -564,6 +569,19 @@ svm_vminit(struct vm *vm, pmap_t pmap) return (svm_sc); } +/* + * Collateral for a generic SVM VM-exit. + */ +static void +vm_exit_svm(struct vm_exit *vme, uint64_t code, uint64_t info1, uint64_t info2) +{ + + vme->exitcode = VM_EXITCODE_SVM; + vme->u.svm.exitcode = code; + vme->u.svm.exitinfo1 = info1; + vme->u.svm.exitinfo2 = info2; +} + static int svm_cpl(struct vmcb_state *state) { @@ -1080,6 +1098,76 @@ clear_nmi_blocking(struct svm_softc *sc, KASSERT(!error, ("%s: error %d setting intr_shadow", __func__, error)); } +#define EFER_MBZ_BITS 0xFFFFFFFFFFFF0200UL + +static int +svm_write_efer(struct svm_softc *sc, int vcpu, uint64_t newval, bool *retu) +{ + struct vm_exit *vme; + struct vmcb_state *state; + uint64_t changed, lma, oldval; + int error; + + state = svm_get_vmcb_state(sc, vcpu); + + oldval = state->efer; + VCPU_CTR2(sc->vm, vcpu, "wrmsr(efer) %#lx/%#lx", oldval, newval); + + newval &= ~0xFE; /* clear the Read-As-Zero (RAZ) bits */ + changed = oldval ^ newval; + + if (newval & EFER_MBZ_BITS) + goto gpf; + + /* APMv2 Table 14-5 "Long-Mode Consistency Checks" */ + if (changed & EFER_LME) { + if (state->cr0 & CR0_PG) + goto gpf; + } + + /* EFER.LMA = EFER.LME & CR0.PG */ + if ((newval & EFER_LME) != 0 && (state->cr0 & CR0_PG) != 0) + lma = EFER_LMA; + else + lma = 0; + + if ((newval & EFER_LMA) != lma) + goto gpf; + + if (newval & EFER_NXE) { + if (!vm_cpuid_capability(sc->vm, vcpu, VCC_NO_EXECUTE)) + goto gpf; + } + + /* + * XXX bhyve does not enforce segment limits in 64-bit mode. Until + * this is fixed flag guest attempt to set EFER_LMSLE as an error. + */ + if (newval & EFER_LMSLE) { + vme = vm_exitinfo(sc->vm, vcpu); + vm_exit_svm(vme, VMCB_EXIT_MSR, 1, 0); + *retu = true; + return (0); + } + + if (newval & EFER_FFXSR) { + if (!vm_cpuid_capability(sc->vm, vcpu, VCC_FFXSR)) + goto gpf; + } + + if (newval & EFER_TCE) { + if (!vm_cpuid_capability(sc->vm, vcpu, VCC_TCE)) + goto gpf; + } + + error = svm_setreg(sc, vcpu, VM_REG_GUEST_EFER, newval); + KASSERT(error == 0, ("%s: error %d updating efer", __func__, error)); + return (0); +gpf: + vm_inject_gp(sc->vm, vcpu); + return (0); +} + static int emulate_wrmsr(struct svm_softc *sc, int vcpu, u_int num, uint64_t val, bool *retu) @@ -1089,7 +1177,7 @@ emulate_wrmsr(struct svm_softc *sc, int if (lapic_msr(num)) error = lapic_wrmsr(sc->vm, vcpu, num, val, retu); else if (num == MSR_EFER) - error = svm_setreg(sc, vcpu, VM_REG_GUEST_EFER, val); + error = svm_write_efer(sc, vcpu, val, retu); else error = svm_wrmsr(sc, vcpu, num, val, retu); @@ -1189,19 +1277,6 @@ nrip_valid(uint64_t exitcode) } } -/* - * Collateral for a generic SVM VM-exit. - */ -static void -vm_exit_svm(struct vm_exit *vme, uint64_t code, uint64_t info1, uint64_t info2) -{ - - vme->exitcode = VM_EXITCODE_SVM; - vme->u.svm.exitcode = code; - vme->u.svm.exitinfo1 = info1; - vme->u.svm.exitinfo2 = info2; -} - static int svm_vmexit(struct svm_softc *svm_sc, int vcpu, struct vm_exit *vmexit) { @@ -1830,7 +1905,7 @@ enable_gintr(void) */ static int svm_vmrun(void *arg, int vcpu, register_t rip, pmap_t pmap, - void *rend_cookie, void *suspended_cookie) + struct vm_eventinfo *evinfo) { struct svm_regctx *gctx; struct svm_softc *svm_sc; @@ -1905,18 +1980,24 @@ svm_vmrun(void *arg, int vcpu, register_ */ disable_gintr(); - if (vcpu_suspended(suspended_cookie)) { + if (vcpu_suspended(evinfo)) { enable_gintr(); vm_exit_suspended(vm, vcpu, state->rip); break; } - if (vcpu_rendezvous_pending(rend_cookie)) { + if (vcpu_rendezvous_pending(evinfo)) { enable_gintr(); vm_exit_rendezvous(vm, vcpu, state->rip); break; } + if (vcpu_reqidle(evinfo)) { + enable_gintr(); + vm_exit_reqidle(vm, vcpu, state->rip); + break; + } + /* We are asked to give the cpu by scheduler. */ if (vcpu_should_yield(vm, vcpu)) { enable_gintr(); Modified: stable/10/sys/amd64/vmm/amd/svm_msr.c ============================================================================== --- stable/10/sys/amd64/vmm/amd/svm_msr.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/amd/svm_msr.c Sun Jun 28 03:22:26 2015 (r284900) @@ -27,12 +27,17 @@ #include __FBSDID("$FreeBSD$"); -#include +#include #include +#include #include #include +#include +#include "svm.h" +#include "vmcb.h" +#include "svm_softc.h" #include "svm_msr.h" #ifndef MSR_AMDK8_IPM @@ -105,6 +110,18 @@ svm_rdmsr(struct svm_softc *sc, int vcpu int error = 0; switch (num) { + case MSR_MCG_CAP: + case MSR_MCG_STATUS: + *result = 0; + break; + case MSR_MTRRcap: + case MSR_MTRRdefType: + case MSR_MTRR4kBase ... MSR_MTRR4kBase + 8: + case MSR_MTRR16kBase ... MSR_MTRR16kBase + 1: + case MSR_MTRR64kBase: + case MSR_SYSCFG: + *result = 0; + break; case MSR_AMDK8_IPM: *result = 0; break; @@ -122,6 +139,18 @@ svm_wrmsr(struct svm_softc *sc, int vcpu int error = 0; switch (num) { + case MSR_MCG_CAP: + case MSR_MCG_STATUS: + break; /* ignore writes */ + case MSR_MTRRcap: + vm_inject_gp(sc->vm, vcpu); + break; + case MSR_MTRRdefType: + case MSR_MTRR4kBase ... MSR_MTRR4kBase + 8: + case MSR_MTRR16kBase ... MSR_MTRR16kBase + 1: + case MSR_MTRR64kBase: + case MSR_SYSCFG: + break; /* Ignore writes */ case MSR_AMDK8_IPM: /* * Ignore writes to the "Interrupt Pending Message" MSR. Modified: stable/10/sys/amd64/vmm/amd/vmcb.c ============================================================================== --- stable/10/sys/amd64/vmm/amd/vmcb.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/amd/vmcb.c Sun Jun 28 03:22:26 2015 (r284900) @@ -29,7 +29,6 @@ __FBSDID("$FreeBSD$"); #include #include -#include #include #include Modified: stable/10/sys/amd64/vmm/intel/vmx.c ============================================================================== --- stable/10/sys/amd64/vmm/intel/vmx.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/intel/vmx.c Sun Jun 28 03:22:26 2015 (r284900) @@ -857,10 +857,11 @@ vmx_vminit(struct vm *vm, pmap_t pmap) * VM exit and entry respectively. It is also restored from the * host VMCS area on a VM exit. * - * The TSC MSR is exposed read-only. Writes are disallowed as that - * will impact the host TSC. - * XXX Writes would be implemented with a wrmsr trap, and - * then modifying the TSC offset in the VMCS. + * The TSC MSR is exposed read-only. Writes are disallowed as + * that will impact the host TSC. If the guest does a write + * the "use TSC offsetting" execution control is enabled and the + * difference between the host TSC and the guest TSC is written + * into the TSC offset in the VMCS. */ if (guest_msr_rw(vmx, MSR_GSBASE) || guest_msr_rw(vmx, MSR_FSBASE) || @@ -1131,6 +1132,22 @@ vmx_clear_nmi_window_exiting(struct vmx VCPU_CTR0(vmx->vm, vcpu, "Disabling NMI window exiting"); } +int +vmx_set_tsc_offset(struct vmx *vmx, int vcpu, uint64_t offset) +{ + int error; + + if ((vmx->cap[vcpu].proc_ctls & PROCBASED_TSC_OFFSET) == 0) { + vmx->cap[vcpu].proc_ctls |= PROCBASED_TSC_OFFSET; + vmcs_write(VMCS_PRI_PROC_BASED_CTLS, vmx->cap[vcpu].proc_ctls); + VCPU_CTR0(vmx->vm, vcpu, "Enabling TSC offsetting"); + } + + error = vmwrite(VMCS_TSC_OFFSET, offset); + + return (error); +} + #define NMI_BLOCKING (VMCS_INTERRUPTIBILITY_NMI_BLOCKING | \ VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING) #define HWINTR_BLOCKING (VMCS_INTERRUPTIBILITY_STI_BLOCKING | \ @@ -1781,6 +1798,7 @@ vmexit_inst_emul(struct vm_exit *vmexit, paging = &vmexit->u.inst_emul.paging; vmexit->exitcode = VM_EXITCODE_INST_EMUL; + vmexit->inst_length = 0; vmexit->u.inst_emul.gpa = gpa; vmexit->u.inst_emul.gla = gla; vmx_paging_info(paging); @@ -2554,7 +2572,7 @@ vmx_exit_handle_nmi(struct vmx *vmx, int static int vmx_run(void *arg, int vcpu, register_t rip, pmap_t pmap, - void *rendezvous_cookie, void *suspend_cookie) + struct vm_eventinfo *evinfo) { int rc, handled, launched; struct vmx *vmx; @@ -2623,18 +2641,24 @@ vmx_run(void *arg, int vcpu, register_t * vmx_inject_interrupts() can suspend the vcpu due to a * triple fault. */ - if (vcpu_suspended(suspend_cookie)) { + if (vcpu_suspended(evinfo)) { enable_intr(); vm_exit_suspended(vmx->vm, vcpu, rip); break; } - if (vcpu_rendezvous_pending(rendezvous_cookie)) { + if (vcpu_rendezvous_pending(evinfo)) { enable_intr(); vm_exit_rendezvous(vmx->vm, vcpu, rip); break; } + if (vcpu_reqidle(evinfo)) { + enable_intr(); + vm_exit_reqidle(vmx->vm, vcpu, rip); + break; + } + if (vcpu_should_yield(vm, vcpu)) { enable_intr(); vm_exit_astpending(vmx->vm, vcpu, rip); Modified: stable/10/sys/amd64/vmm/intel/vmx.h ============================================================================== --- stable/10/sys/amd64/vmm/intel/vmx.h Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/intel/vmx.h Sun Jun 28 03:22:26 2015 (r284900) @@ -135,6 +135,8 @@ void vmx_call_isr(uintptr_t entry); u_long vmx_fix_cr0(u_long cr0); u_long vmx_fix_cr4(u_long cr4); +int vmx_set_tsc_offset(struct vmx *vmx, int vcpu, uint64_t offset); + extern char vmx_exit_guest[]; #endif Modified: stable/10/sys/amd64/vmm/intel/vmx_msr.c ============================================================================== --- stable/10/sys/amd64/vmm/intel/vmx_msr.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/intel/vmx_msr.c Sun Jun 28 03:22:26 2015 (r284900) @@ -31,7 +31,6 @@ __FBSDID("$FreeBSD$"); #include #include -#include #include #include @@ -396,6 +395,17 @@ vmx_rdmsr(struct vmx *vmx, int vcpuid, u error = 0; switch (num) { + case MSR_MCG_CAP: + case MSR_MCG_STATUS: + *val = 0; + break; + case MSR_MTRRcap: + case MSR_MTRRdefType: + case MSR_MTRR4kBase ... MSR_MTRR4kBase + 8: + case MSR_MTRR16kBase ... MSR_MTRR16kBase + 1: + case MSR_MTRR64kBase: + *val = 0; + break; case MSR_IA32_MISC_ENABLE: *val = misc_enable; break; @@ -427,6 +437,17 @@ vmx_wrmsr(struct vmx *vmx, int vcpuid, u error = 0; switch (num) { + case MSR_MCG_CAP: + case MSR_MCG_STATUS: + break; /* ignore writes */ + case MSR_MTRRcap: + vm_inject_gp(vmx->vm, vcpuid); + break; + case MSR_MTRRdefType: + case MSR_MTRR4kBase ... MSR_MTRR4kBase + 8: + case MSR_MTRR16kBase ... MSR_MTRR16kBase + 1: + case MSR_MTRR64kBase: + break; /* Ignore writes */ case MSR_IA32_MISC_ENABLE: changed = val ^ misc_enable; /* @@ -453,6 +474,9 @@ vmx_wrmsr(struct vmx *vmx, int vcpuid, u else vm_inject_gp(vmx->vm, vcpuid); break; + case MSR_TSC: + error = vmx_set_tsc_offset(vmx, vcpuid, val - rdtsc()); + break; default: error = EINVAL; break; Modified: stable/10/sys/amd64/vmm/io/vatpic.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vatpic.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vatpic.c Sun Jun 28 03:22:26 2015 (r284900) @@ -30,7 +30,6 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include #include #include #include Modified: stable/10/sys/amd64/vmm/io/vatpit.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vatpit.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vatpit.c Sun Jun 28 03:22:26 2015 (r284900) @@ -31,7 +31,6 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include #include #include #include Modified: stable/10/sys/amd64/vmm/io/vhpet.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vhpet.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vhpet.c Sun Jun 28 03:22:26 2015 (r284900) @@ -36,7 +36,6 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include #include Modified: stable/10/sys/amd64/vmm/io/vioapic.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vioapic.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vioapic.c Sun Jun 28 03:22:26 2015 (r284900) @@ -32,7 +32,6 @@ __FBSDID("$FreeBSD$"); #include #include -#include #include #include #include Modified: stable/10/sys/amd64/vmm/io/vlapic.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vlapic.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vlapic.c Sun Jun 28 03:22:26 2015 (r284900) @@ -548,6 +548,8 @@ vlapic_update_ppr(struct vlapic *vlapic) VLAPIC_CTR1(vlapic, "vlapic_update_ppr 0x%02x", ppr); } +static VMM_STAT(VLAPIC_GRATUITOUS_EOI, "EOI without any in-service interrupt"); + static void vlapic_process_eoi(struct vlapic *vlapic) { @@ -558,11 +560,7 @@ vlapic_process_eoi(struct vlapic *vlapic isrptr = &lapic->isr0; tmrptr = &lapic->tmr0; - /* - * The x86 architecture reserves the the first 32 vectors for use - * by the processor. - */ - for (i = 7; i > 0; i--) { + for (i = 7; i >= 0; i--) { idx = i * 4; bitpos = fls(isrptr[idx]); if (bitpos-- != 0) { @@ -571,17 +569,21 @@ vlapic_process_eoi(struct vlapic *vlapic vlapic->isrvec_stk_top); } isrptr[idx] &= ~(1 << bitpos); + vector = i * 32 + bitpos; + VCPU_CTR1(vlapic->vm, vlapic->vcpuid, "EOI vector %d", + vector); VLAPIC_CTR_ISR(vlapic, "vlapic_process_eoi"); vlapic->isrvec_stk_top--; vlapic_update_ppr(vlapic); if ((tmrptr[idx] & (1 << bitpos)) != 0) { - vector = i * 32 + bitpos; vioapic_process_eoi(vlapic->vm, vlapic->vcpuid, vector); } return; } } + VCPU_CTR0(vlapic->vm, vlapic->vcpuid, "Gratuitous EOI"); + vmm_stat_incr(vlapic->vm, vlapic->vcpuid, VLAPIC_GRATUITOUS_EOI, 1); } static __inline int @@ -1093,11 +1095,7 @@ vlapic_pending_intr(struct vlapic *vlapi irrptr = &lapic->irr0; - /* - * The x86 architecture reserves the the first 32 vectors for use - * by the processor. - */ - for (i = 7; i > 0; i--) { + for (i = 7; i >= 0; i--) { idx = i * 4; val = atomic_load_acq_int(&irrptr[idx]); bitpos = fls(val); Modified: stable/10/sys/amd64/vmm/io/vpmtmr.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vpmtmr.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vpmtmr.c Sun Jun 28 03:22:26 2015 (r284900) @@ -29,7 +29,6 @@ __FBSDID("$FreeBSD$"); #include #include -#include #include #include #include Modified: stable/10/sys/amd64/vmm/io/vrtc.c ============================================================================== --- stable/10/sys/amd64/vmm/io/vrtc.c Sun Jun 28 01:21:55 2015 (r284899) +++ stable/10/sys/amd64/vmm/io/vrtc.c Sun Jun 28 03:22:26 2015 (r284900) @@ -30,7 +30,6 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include #include #include #include @@ -142,20 +141,23 @@ update_enabled(struct vrtc *vrtc) } static time_t -vrtc_curtime(struct vrtc *vrtc) +vrtc_curtime(struct vrtc *vrtc, sbintime_t *basetime) { sbintime_t now, delta; - time_t t; + time_t t, secs; KASSERT(VRTC_LOCKED(vrtc), ("%s: vrtc not locked", __func__)); t = vrtc->base_rtctime; + *basetime = vrtc->base_uptime; if (update_enabled(vrtc)) { now = sbinuptime(); delta = now - vrtc->base_uptime; KASSERT(delta >= 0, ("vrtc_curtime: uptime went backwards: " "%#lx to %#lx", vrtc->base_uptime, now)); - t += delta / SBT_1S; + secs = delta / SBT_1S; + t += secs; + *basetime += secs * SBT_1S; } return (t); } @@ -390,9 +392,10 @@ fail: } static int -vrtc_time_update(struct vrtc *vrtc, time_t newtime) +vrtc_time_update(struct vrtc *vrtc, time_t newtime, sbintime_t newbase) { struct rtcdev *rtc; + sbintime_t oldbase; time_t oldtime; uint8_t alarm_sec, alarm_min, alarm_hour; @@ -404,16 +407,21 @@ vrtc_time_update(struct vrtc *vrtc, time alarm_hour = rtc->alarm_hour; oldtime = vrtc->base_rtctime; - VM_CTR2(vrtc->vm, "Updating RTC time from %#lx to %#lx", + VM_CTR2(vrtc->vm, "Updating RTC secs from %#lx to %#lx", oldtime, newtime); + oldbase = vrtc->base_uptime; + VM_CTR2(vrtc->vm, "Updating RTC base uptime from %#lx to %#lx", + oldbase, newbase); + vrtc->base_uptime = newbase; + if (newtime == oldtime) return (0); /* * If 'newtime' indicates that RTC updates are disabled then just * record that and return. There is no need to do alarm interrupt - * processing or update 'base_uptime' in this case. + * processing in this case. */ if (newtime == VRTC_BROKEN_TIME) { vrtc->base_rtctime = VRTC_BROKEN_TIME; @@ -459,8 +467,6 @@ vrtc_time_update(struct vrtc *vrtc, time if (uintr_enabled(vrtc)) vrtc_set_reg_c(vrtc, rtc->reg_c | RTCIR_UPDATE); - vrtc->base_uptime = sbinuptime(); - return (0); } @@ -531,7 +537,7 @@ static void vrtc_callout_handler(void *arg) { struct vrtc *vrtc = arg; - sbintime_t freqsbt; + sbintime_t freqsbt, basetime; time_t rtctime; int error; @@ -553,8 +559,8 @@ vrtc_callout_handler(void *arg) vrtc_set_reg_c(vrtc, vrtc->rtcdev.reg_c | RTCIR_PERIOD); if (aintr_enabled(vrtc) || uintr_enabled(vrtc)) { - rtctime = vrtc_curtime(vrtc); - error = vrtc_time_update(vrtc, rtctime); + rtctime = vrtc_curtime(vrtc, &basetime); + error = vrtc_time_update(vrtc, rtctime, basetime); KASSERT(error == 0, ("%s: vrtc_time_update error %d", __func__, error)); } @@ -619,7 +625,7 @@ static int vrtc_set_reg_b(struct vrtc *vrtc, uint8_t newval) { struct rtcdev *rtc; - sbintime_t oldfreq, newfreq; + sbintime_t oldfreq, newfreq, basetime; time_t curtime, rtctime; int error; uint8_t oldval, changed; @@ -640,12 +646,13 @@ vrtc_set_reg_b(struct vrtc *vrtc, uint8_ if (changed & RTCSB_HALT) { if ((newval & RTCSB_HALT) == 0) { rtctime = rtc_to_secs(vrtc); + basetime = sbinuptime(); if (rtctime == VRTC_BROKEN_TIME) { if (rtc_flag_broken_time) return (-1); } } else { - curtime = vrtc_curtime(vrtc); + curtime = vrtc_curtime(vrtc, &basetime); KASSERT(curtime == vrtc->base_rtctime, ("%s: mismatch " "between vrtc basetime (%#lx) and curtime (%#lx)", __func__, vrtc->base_rtctime, curtime)); @@ -664,7 +671,7 @@ vrtc_set_reg_b(struct vrtc *vrtc, uint8_ rtctime = VRTC_BROKEN_TIME; rtc->reg_b &= ~RTCSB_UINTR; } - error = vrtc_time_update(vrtc, rtctime); + error = vrtc_time_update(vrtc, rtctime, basetime); KASSERT(error == 0, ("vrtc_time_update error %d", error)); } @@ -744,7 +751,7 @@ vrtc_set_time(struct vm *vm, time_t secs vrtc = vm_rtc(vm); VRTC_LOCK(vrtc); - error = vrtc_time_update(vrtc, secs); + error = vrtc_time_update(vrtc, secs, sbinuptime()); VRTC_UNLOCK(vrtc); if (error) { @@ -761,11 +768,12 @@ time_t vrtc_get_time(struct vm *vm) { struct vrtc *vrtc; + sbintime_t basetime; time_t t; vrtc = vm_rtc(vm); VRTC_LOCK(vrtc); - t = vrtc_curtime(vrtc); + t = vrtc_curtime(vrtc, &basetime); VRTC_UNLOCK(vrtc); return (t); @@ -802,6 +810,7 @@ int vrtc_nvram_read(struct vm *vm, int offset, uint8_t *retval) { struct vrtc *vrtc; + sbintime_t basetime; time_t curtime; uint8_t *ptr; @@ -818,7 +827,7 @@ vrtc_nvram_read(struct vm *vm, int offse * Update RTC date/time fields if necessary. */ if (offset < 10 || offset == RTC_CENTURY) { - curtime = vrtc_curtime(vrtc); + curtime = vrtc_curtime(vrtc, &basetime); secs_to_rtc(curtime, vrtc, 0); } @@ -858,6 +867,7 @@ vrtc_data_handler(struct vm *vm, int vcp { struct vrtc *vrtc; struct rtcdev *rtc; + sbintime_t basetime; time_t curtime; int error, offset; @@ -875,8 +885,8 @@ vrtc_data_handler(struct vm *vm, int vcp } error = 0; - curtime = vrtc_curtime(vrtc); - vrtc_time_update(vrtc, curtime); + curtime = vrtc_curtime(vrtc, &basetime); + vrtc_time_update(vrtc, curtime, basetime); /* * Update RTC date/time fields if necessary. @@ -939,7 +949,7 @@ vrtc_data_handler(struct vm *vm, int vcp */ if (offset == RTC_CENTURY && !rtc_halted(vrtc)) { curtime = rtc_to_secs(vrtc); - error = vrtc_time_update(vrtc, curtime); + error = vrtc_time_update(vrtc, curtime, sbinuptime()); KASSERT(!error, ("vrtc_time_update error %d", error)); if (curtime == VRTC_BROKEN_TIME && rtc_flag_broken_time) error = -1; @@ -993,7 +1003,7 @@ vrtc_init(struct vm *vm) *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***