From: Michael Reifenberger
To: Yamagi
Cc: freebsd-virtualization@freebsd.org
Date: Sat, 28 Dec 2019 10:03:46 +0000
Subject: Re: [PATCH] Untangle TPR shadowing and APIC virtualization / Make Win guests on Bhyve _fast_
Message-ID: <20191228100346.Horde.4Vi4FztkxxFnoeK7oGlIZe-@app.eeeit.de>
In-Reply-To: <20191221202546.caca1f242a907cf50b5562e3@yamagi.org>

Hi,

did you already get a reply from a developer for review?
Could you open a PR/DR for this patch?
I would like to review and commit your patch after further tests.

Thanks!
---
mike
(also mr@freebsd.org)

Quoting Yamagi:

> Hi,
> a long-known problem with Bhyve is that Windows guests are rather slow.
> With Windows 10 1903 this became much worse, to the point that the
> guest is unusable. I have found the reason for this: Windows hammers on
> the %cr8 control register. For example, Windows 10 1909 on an i7-2620M
> has about 68,000 %cr8 accesses per second. Each of them triggers a vm
> exit.
>
> The most common solution is TPR shadowing. Many thanks to royger in
> #bhyve for getting me on the right track. Bhyve already implements TPR
> shadowing. On AMD SVM it just works, but the implementation for Intel
> VT-x is bound to APIC virtualization. And APIC virtualization is a Xeon
> feature that is missing on most (all?) desktop CPUs.
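
For readers following along, here is a rough sketch of why TPR shadowing removes those exits. It is a toy model, not bhyve code: the struct and function names are hypothetical, and the rule it encodes (a %cr8 write lands in the virtual-APIC page and exits only when the new priority class falls below the TPR threshold) is my reading of the Intel SDM for the case without virtual interrupt delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one vCPU's TPR state; not a bhyve structure. */
struct tpr_model {
	bool	 use_tpr_shadow;	/* "use TPR shadow" control is set */
	uint8_t	 vtpr;			/* virtual TPR byte in the APIC page */
	uint8_t	 tpr_threshold;		/* bits 3:0 of the TPR threshold */
	uint64_t exits;			/* VM exits counted by the model */
};

/* Model a guest write to %cr8; returns true if it would cause a VM exit. */
static bool
guest_write_cr8(struct tpr_model *m, uint64_t val)
{
	/* CR8[3:0] maps to TPR[7:4]. */
	m->vtpr = (uint8_t)((val & 0xf) << 4);

	if (!m->use_tpr_shadow) {
		/* CR8-load exiting: the hypervisor emulates every write. */
		m->exits++;
		return (true);
	}
	/* With the shadow, only a drop below the threshold exits. */
	if ((m->vtpr >> 4) < (m->tpr_threshold & 0xf)) {
		m->exits++;
		return (true);
	}
	return (false);
}

/* Model a guest read of %cr8; served from the shadow when it is enabled. */
static uint64_t
guest_read_cr8(struct tpr_model *m)
{
	if (!m->use_tpr_shadow)
		m->exits++;	/* CR8-store exiting */
	return (m->vtpr >> 4);
}
```

With the threshold left at 0, as in this model's default, no %cr8 access exits once the shadow is enabled, which is the effect the numbers above suggest.
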
>
> The patch - further down inline or under [0] - separates TPR shadowing
> from APIC virtualization, so TPR shadowing can be used on desktop CPUs
> as well. The patch doesn't just give a small speed boost, it's a
> difference like night and day. As an example, without the patch, the
> installation of Windows 10 1909 takes about 2280 seconds from start to
> first reboot. With the patch, it takes only 370 seconds. On an old
> Thinkpad X220, Windows 10 guests were previously unusable; now they are
> reasonably fast.
>
> The patch does the following:
>
> * Add a new tunable 'hw.vmm.vmx.use_tpr_shadowing' to disable TPR
>   shadowing. Also add 'hw.vmm.vmx.cap.tpr_shadowing' to be able to query
>   if TPR shadowing is used.
>
> * Detach the initialization of TPR shadowing from the initialization of
>   APIC virtualization. APIC virtualization still needs TPR shadowing,
>   but not vice versa. Any CPU that supports APIC virtualization should
>   also support TPR shadowing.
>
> * When TPR shadowing is used, the APIC page of each vCPU is written to
>   the VMCS_VIRTUAL_APIC field of the VMCS so that the CPU can write
>   directly to the page without intercept.
>
> * On vm exit, vlapic_update_ppr() is called to update the PPR.
>
> The patch was tested on an i7-2620M, an i7-6700k and a Xeon Silver
> 4110. Both Windows and FreeBSD guests work correctly.
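
On the last point in the list: since the guest can now change the TPR without the hypervisor seeing it, the cached PPR has to be recomputed on exit before deciding whether an interrupt can be injected. A minimal standalone sketch of that computation follows, based on the rule in the Intel manual section the vlapic.c comment cites. The function and macro names are hypothetical and this is not the bhyve implementation, which works on struct vlapic.

```c
#include <assert.h>
#include <stdint.h>

/* Priority class of a vector or priority register: bits 7:4. */
#define	PRIO_CLASS(x)	(((x) >> 4) & 0xf)

/*
 * Compute the processor priority (PPR) from the task priority (TPR)
 * and the highest in-service vector (0 if nothing is in service).
 */
static uint8_t
compute_ppr(uint8_t tpr, uint8_t isrvec)
{
	if (PRIO_CLASS(tpr) >= PRIO_CLASS(isrvec))
		return (tpr);
	return (isrvec & 0xf0);
}

/* A pending vector is deliverable only if its class exceeds the PPR's. */
static int
vector_deliverable(uint8_t vector, uint8_t ppr)
{
	return (PRIO_CLASS(vector) > PRIO_CLASS(ppr));
}
```

For example, a guest that raised its TPR to 0x50 while no interrupt is in service ends up with a PPR of 0x50, so vector 0x51 stays pending while vector 0x61 can be delivered.
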
>
> Regards,
> Yamagi
>
> 0: https://gist.github.com/Yamagi/de70c08eadeeef14eec4cb42aeb5957f
>
> ----
>
> diff --git a/sys/amd64/vmm/intel/vmx.c b/sys/amd64/vmm/intel/vmx.c
> index 605fd0bda766..324a1e9d0c3c 100644
> --- a/sys/amd64/vmm/intel/vmx.c
> +++ b/sys/amd64/vmm/intel/vmx.c
> @@ -172,6 +172,10 @@ static int cap_invpcid;
>  SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, invpcid, CTLFLAG_RD, &cap_invpcid,
>      0, "Guests are allowed to use INVPCID");
>
> +static int tpr_shadowing;
> +SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, tpr_shadowing, CTLFLAG_RD,
> +    &tpr_shadowing, 0, "TPR shadowing support");
> +
>  static int virtual_interrupt_delivery;
>  SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, virtual_interrupt_delivery, CTLFLAG_RD,
>      &virtual_interrupt_delivery, 0, "APICv virtual interrupt delivery support");
> @@ -627,7 +631,7 @@ vmx_restore(void)
>  static int
>  vmx_init(int ipinum)
>  {
> -	int error, use_tpr_shadow;
> +	int error;
>  	uint64_t basic, fixed0, fixed1, feature_control;
>  	uint32_t tmp, procbased2_vid_bits;
>
> @@ -750,6 +754,24 @@ vmx_init(int ipinum)
>  	    MSR_VMX_PROCBASED_CTLS2, PROCBASED2_ENABLE_INVPCID, 0,
>  	    &tmp) == 0);
>
> +	/*
> +	 * Check support for TPR shadow.
> +	 */
> +	error = vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS,
> +	    MSR_VMX_TRUE_PROCBASED_CTLS, PROCBASED_USE_TPR_SHADOW, 0,
> +	    &tmp);
> +	if (error == 0) {
> +		tpr_shadowing = 1;
> +		TUNABLE_INT_FETCH("hw.vmm.vmx.use_tpr_shadowing",
> +		    &tpr_shadowing);
> +	}
> +
> +	if (tpr_shadowing) {
> +		procbased_ctls |= PROCBASED_USE_TPR_SHADOW;
> +		procbased_ctls &= ~PROCBASED_CR8_LOAD_EXITING;
> +		procbased_ctls &= ~PROCBASED_CR8_STORE_EXITING;
> +	}
> +
>  	/*
>  	 * Check support for virtual interrupt delivery.
>  	 */
> @@ -758,13 +780,9 @@ vmx_init(int ipinum)
>  	    PROCBASED2_APIC_REGISTER_VIRTUALIZATION |
>  	    PROCBASED2_VIRTUAL_INTERRUPT_DELIVERY);
>
> -	use_tpr_shadow = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS,
> -	    MSR_VMX_TRUE_PROCBASED_CTLS, PROCBASED_USE_TPR_SHADOW, 0,
> -	    &tmp) == 0);
> -
>  	error = vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS2, MSR_VMX_PROCBASED_CTLS2,
>  	    procbased2_vid_bits, 0, &tmp);
> -	if (error == 0 && use_tpr_shadow) {
> +	if (error == 0 && tpr_shadowing) {
>  		virtual_interrupt_delivery = 1;
>  		TUNABLE_INT_FETCH("hw.vmm.vmx.use_apic_vid",
>  		    &virtual_interrupt_delivery);
> @@ -775,13 +793,6 @@ vmx_init(int ipinum)
>  		procbased_ctls2 |= procbased2_vid_bits;
>  		procbased_ctls2 &= ~PROCBASED2_VIRTUALIZE_X2APIC_MODE;
>
> -		/*
> -		 * No need to emulate accesses to %CR8 if virtual
> -		 * interrupt delivery is enabled.
> -		 */
> -		procbased_ctls &= ~PROCBASED_CR8_LOAD_EXITING;
> -		procbased_ctls &= ~PROCBASED_CR8_STORE_EXITING;
> -
>  		/*
>  		 * Check for Posted Interrupts only if Virtual Interrupt
>  		 * Delivery is enabled.
> @@ -1051,10 +1062,13 @@ vmx_vminit(struct vm *vm, pmap_t pmap)
>  		vmx->ctx[i].guest_dr6 = DBREG_DR6_RESERVED1;
>  		error += vmwrite(VMCS_GUEST_DR7, DBREG_DR7_RESERVED1);
>
> -		if (virtual_interrupt_delivery) {
> -			error += vmwrite(VMCS_APIC_ACCESS, APIC_ACCESS_ADDRESS);
> +		if (tpr_shadowing) {
>  			error += vmwrite(VMCS_VIRTUAL_APIC,
>  			    vtophys(&vmx->apic_page[i]));
> +		}
> +
> +		if (virtual_interrupt_delivery) {
> +			error += vmwrite(VMCS_APIC_ACCESS, APIC_ACCESS_ADDRESS);
>  			error += vmwrite(VMCS_EOI_EXIT0, 0);
>  			error += vmwrite(VMCS_EOI_EXIT1, 0);
>  			error += vmwrite(VMCS_EOI_EXIT2, 0);
> @@ -2313,6 +2327,14 @@ vmx_exit_process(struct vmx *vmx, int vcpu, struct vm_exit *vmexit)
>  		}
>  	}
>
> +	/*
> +	 * If 'TPR shadowing' is used, update the local APIC's PPR.
> +	 */
> +	if (tpr_shadowing) {
> +		vlapic = vm_lapic(vmx->vm, vcpu);
> +		vlapic_update_ppr(vlapic);
> +	}
> +
>  	switch (reason) {
>  	case EXIT_REASON_TASK_SWITCH:
>  		ts = &vmexit->u.task_switch;
> diff --git a/sys/amd64/vmm/io/vlapic.c b/sys/amd64/vmm/io/vlapic.c
> index 74e6cd967396..289fdb7e077d 100644
> --- a/sys/amd64/vmm/io/vlapic.c
> +++ b/sys/amd64/vmm/io/vlapic.c
> @@ -490,7 +490,7 @@ dump_isrvec_stk(struct vlapic *vlapic)
>   * Algorithm adopted from section "Interrupt, Task and Processor Priority"
>   * in Intel Architecture Manual Vol 3a.
>   */
> -static void
> +void
>  vlapic_update_ppr(struct vlapic *vlapic)
>  {
>  	int isrvec, tpr, ppr;
> diff --git a/sys/amd64/vmm/io/vlapic.h b/sys/amd64/vmm/io/vlapic.h
> index 2a5f54003253..71b97feab6bc 100644
> --- a/sys/amd64/vmm/io/vlapic.h
> +++ b/sys/amd64/vmm/io/vlapic.h
> @@ -74,6 +74,8 @@ void vlapic_post_intr(struct vlapic *vlapic, int hostcpu, int ipinum);
>  void vlapic_fire_cmci(struct vlapic *vlapic);
>  int vlapic_trigger_lvt(struct vlapic *vlapic, int vector);
>
> +void vlapic_update_ppr(struct vlapic *vlapic);
> +
>  uint64_t vlapic_get_apicbase(struct vlapic *vlapic);
>  int vlapic_set_apicbase(struct vlapic *vlapic, uint64_t val);
>  void vlapic_set_x2apic_state(struct vm *vm, int vcpuid, enum x2apic_state s);
>
> --
> Homepage: https://www.yamagi.org
> Github: https://github.com/yamagi
> GPG: 0x1D502515

Regards
---
Michael Reifenberger