Date:      Tue, 20 May 2014 22:11:43 +0200
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        Anish <akgupt3@gmail.com>, Nils Beyer <nbe@renzel.net>
Cc:        FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject:   Re: bhyve: svm (amd-v) update
Message-ID:  <537BB6FF.5080909@digiware.nl>
In-Reply-To: <CALnRwMRYgimm5Yr7HgGMjw1NF3kwGcexw4i+Ws_LnpQAH81NAg@mail.gmail.com>
References:  <045ce77ed17da4bd515bcc3cafe9c7f8@webmail.renzel.net.local> <CALnRwMRYgimm5Yr7HgGMjw1NF3kwGcexw4i+Ws_LnpQAH81NAg@mail.gmail.com>

This is a multi-part message in MIME format.
--------------030600060806040702030601
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 18-5-2014 16:44, Anish wrote:
> Thanks for testing it.
>
>> Your patch applied cleanly to the working copy of the "bhyve_svm"-project.
>> I was then able to merge with HEAD
>> (using "theirs-full" on one file) and compile the kernel. So, to me it
>> looks OK to commit.
>
> Yes, that's correct. You have to retain the changes in
> sys/amd64/vmm/amd/amdv.c from the bhyve_svm branch.
>
>> Unfortunately, I am still not able to boot CentOS 6.5 using my Phenom
>> 1055T. It produces 200% load on the
>> host CPU, and the emulated machine generates endlessly:
>
> It's 200% load because the guest has 2 vcpus. It gets stuck in a loop even
> with a single processor (1 vcpu) after PCI probing [debug messages with linux
> .....earlyprintk=serial debug]
> 
> [    3.505631] pnp: PnP ACPI: found 5 devices
> [    3.506417] ACPI: bus type PNP unregistered
> [    3.635781] pci 0000:00:06.0: no compatible bridge window for [mem 0xfe440000-0xfe45ffff pref]
> [    3.637555] pci 0000:00:06.0: BAR 6: assigned [mem 0x80000000-0x8001ffff pref]
> [    3.638986] pci 0000:00:01.0: BAR 6: assigned [mem 0x80020000-0x800207ff pref]
> [    3.640416] pci 0000:00:04.0: BAR 6: assigned [mem 0x80020800-0x80020fff pref]
> [    3.641864] pci 0000:00:05.0: BAR 6: assigned [mem 0x80021000-0x800217ff pref]
> [    3.643259] pci 0000:00:00.0: not setting up bridge for bus 0000:01
> [    3.644550] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
> [    3.645670] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff]
> [    3.646795] pci_bus 0000:00: resource 6 [mem 0x80000000-0xdfffffff]
> [    3.648031] pci_bus 0000:00: resource 7 [mem 0xd000000000-0xfcffffffff]
> [    3.650970] NET: Registered protocol family 2
> [    3.661491] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
> [    3.671854] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
> [    3.681116] TCP: Hash tables configured (established 16384 bind 16384)
> [    3.683335] TCP: reno registered
> [    3.684243] UDP hash table entries: 1024 (order: 3, 32768 bytes)
> [    3.686484] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
> [    3.691987] NET: Registered protocol family 1
> [    3.693382] pci 0000:00:01.0: Activating ISA DMA hang workarounds
> [    3.695214] PCI: CLS 64 bytes, default 64
> [    3.698176] Trying to unpack rootfs image as initramfs...
> [   30.595279] BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
> [   30.596366] Modules linked in:
>
>> Additionally, it produces a lot of MSR requests:
>
> Yes, on AMD, Linux touches more MSRs (AMD-specific, addresses 0xC00XXXX)
> than on Intel.
> 
> Thanks and regards,
> Anish
> 
> 
> On Fri, May 16, 2014 at 2:17 PM, Nils Beyer <nbe@renzel.net> wrote:
> 
>> Hi Anish,
>>
>> Anish wrote:
>>> If patches looks good to you, we can submit it. I have been testing it on
>>> Phenom box which lacks some of newer SVM features.
>>
>> Your patch applied cleanly to the working copy of the "bhyve_svm"-project.
>> I was then able to merge with HEAD
>> (using "theirs-full" on one file) and compile the kernel. So, to me it
>> looks OK to commit.
>>
>> Unfortunately, I am still not able to boot CentOS 6.5 using my Phenom
>> 1055T. It produces 200% load on the
>> host CPU, and the emulated machine generates endlessly:
>>
>> =======================================================================================
>> BUG: soft lockup - CPU#0 stuck for 67s! [swapper:1]
>> Modules linked in:
>> CPU 0
>> Modules linked in:
>>
>> Pid: 1, comm: swapper Not tainted 2.6.32-431.el6.x86_64 #1   BHYVE

And more...


>> I'd love to see CentOS perfectly running on my Phenom as it runs perfectly
>> on an Intel i3.
>>
>> If you need any further information/debug, please let me know...

I've been trying to get Ubuntu, CentOS and the like to run on AMD CPUs.
Currently I'm compiling a kernel in a guest, but it is dirt slow.

Attached is a patch I use to debug more of the MSRs; it also contains what
I do to get the TSC running. It helps, but things are still slow as molasses.
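
For anyone reading along without the SVM manual at hand, here is a minimal
standalone sketch of the idea behind the svm_msr_ro_ok() part of the patch.
It assumes the MSR permission map layout as I understand it from the AMD
manual: two bits per MSR (lower bit = intercept reads, upper bit = intercept
writes) and three 2 KB ranges. The names msrpm_index() and msrpm_allow() are
made up for this illustration; they are not the functions in svm.c.

```c
#include <stdint.h>
#include <string.h>

#define MSRPM_SIZE	(8 * 1024)	/* permission map: two 4 KB pages */
#define MSR_RO		1		/* clear only the read-intercept bit */
#define MSR_RW		3		/* clear read- and write-intercept bits */

/*
 * Map an MSR number to a byte index and bit offset in the permission map.
 * Returns 0 on success, -1 if the MSR is outside the three covered ranges
 * (such MSRs are always intercepted).
 */
static int
msrpm_index(uint32_t msr, int *index, int *bit)
{
	uint32_t base, off;

	if (msr <= 0x1fff) {
		base = 0x0000;
		off = msr;
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		base = 0x0800;
		off = msr - 0xc0000000;
	} else if (msr >= 0xc0010000 && msr <= 0xc0011fff) {
		base = 0x1000;
		off = msr - 0xc0010000;
	} else
		return (-1);

	*index = base + off / 4;	/* four MSRs per byte */
	*bit = (off % 4) * 2;		/* two bits per MSR */
	return (0);
}

/* Stop intercepting the accesses selected by mask (MSR_RO or MSR_RW). */
static int
msrpm_allow(uint8_t *bitmap, uint32_t msr, int mask)
{
	int index, bit;

	if (msrpm_index(msr, &index, &bit) != 0)
		return (-1);
	bitmap[index] &= (uint8_t)~(mask << bit);
	return (0);
}
```

With the map initialised to all ones (intercept everything),
msrpm_allow(bitmap, 0x10, MSR_RO) clears only bit 0 of byte 4, so guest
reads of the TSC MSR no longer cause an exit while writes still do.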

For Ubuntu I also needed to fix part of the AHCI code since it bails out
on ATA FLUSH.

I'm going to take a look at the recently posted diff, which should bring
bhyve_svm in line with HEAD, and see if that speeds up my Ubuntu kernels.

--WjW


--------------030600060806040702030601
Content-Type: text/plain; charset=windows-1252;
 name="msr-tsc.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="msr-tsc.patch"

Index: sys/amd64/vmm/amd/svm.c
===================================================================
--- sys/amd64/vmm/amd/svm.c	(revision 264582)
+++ sys/amd64/vmm/amd/svm.c	(working copy)
@@ -82,6 +82,8 @@
 static bool svm_vmexit(struct svm_softc *svm_sc, int vcpu,
 			struct vm_exit *vmexit);
 static int svm_msr_rw_ok(uint8_t *btmap, uint64_t msr);
+static int svm_msr_ro_ok(uint8_t *btmap, uint64_t msr);
+static int svm_msr_rw_ro_ok(uint8_t *btmap, uint64_t msr, int mask);
 static int svm_msr_index(uint64_t msr, int *index, int *bit);
 
 static uint32_t svm_feature; /* AMD SVM features. */
@@ -315,9 +317,24 @@
 /*
  * Give virtual cpu the complete access to MSR(read & write).
  */
+#define MSR_RO 1
+#define MSR_RW 3
+
 static int
 svm_msr_rw_ok(uint8_t *perm_bitmap, uint64_t msr)
 {
+	return svm_msr_rw_ro_ok(perm_bitmap, msr, MSR_RW);
+}
+
+static int
+svm_msr_ro_ok(uint8_t *perm_bitmap, uint64_t msr)
+{
+	return svm_msr_rw_ro_ok(perm_bitmap, msr, MSR_RO);
+}
+
+static int
+svm_msr_rw_ro_ok(uint8_t *perm_bitmap, uint64_t msr, int mask)
+{
 	int index, bit, err;
 
 	err = svm_msr_index(msr, &index, &bit);
@@ -336,8 +353,12 @@
 	}
 
 	/* Disable intercept for read and write. */
-	perm_bitmap[index] &= ~(3 << bit);
-	CTR1(KTR_VMM, "Guest has full control on SVM:MSR(0x%lx).\n", msr);
+	perm_bitmap[index] &= ~(mask << bit);
+	if (mask == MSR_RW) {
+		CTR1(KTR_VMM, "Guest has Read/Write control on SVM:MSR(0x%lx).\n", msr);
+	} else {
+		CTR1(KTR_VMM, "Guest has Read-only control on SVM:MSR(0x%lx).\n", msr);
+	}
 	
 	return (0);
 }
@@ -415,10 +436,26 @@
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_CS_MSR);
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_ESP_MSR);
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_EIP_MSR);
-	
+
+#define AMD_MSR_TSEG_BASE   	0xc0010112
+#define AMD_MSR_OSVW_ID_LENGTH  0xc0010140      /* read */
+#define AMD_MSR_OSVW_STATUS     0xc0010141      /* read */
+#define AMD_MSR_MC4_CTL_MASK    0xc0010048
+
 	/* For Nested Paging/RVI only. */
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_PAT);
+	svm_msr_rw_ok(svm_sc->msr_bitmap, AMD_MSR_OSVW_ID_LENGTH);
+	svm_msr_rw_ok(svm_sc->msr_bitmap, AMD_MSR_OSVW_STATUS);
 
+	/*
+	 * MSRs that are allowed to be read.
+	 * most obvious one is the TSC read which could be time critical
+	 */
+	svm_msr_ro_ok(svm_sc->msr_bitmap, MSR_TSC);
+	svm_msr_ro_ok(svm_sc->msr_bitmap, MSR_HWCR);
+	svm_msr_ro_ok(svm_sc->msr_bitmap, AMD_MSR_TSEG_BASE);
+	svm_msr_ro_ok(svm_sc->msr_bitmap, AMD_MSR_MC4_CTL_MASK);
+	
 	 /* Intercept access to all I/O ports. */
 	memset(svm_sc->iopm_bitmap, 0xFF, sizeof(svm_sc->iopm_bitmap));
 
@@ -566,6 +603,13 @@
 				svm_efer(svm_sc, vcpu, info1);
 				break;
 			}
+			if (ecx == MSR_TSC) {
+				uint64_t tscval = rdtsc();
+				VCPU_CTR0(svm_sc->vm, vcpu,"VMEXIT TSC MSR\n");
+				state->rax = tscval & 0xffffffff;
+				ctx->e.g.sctx_rdx = tscval >> 32;
+				break;
+			} 
 		
 			retu = false;	
 			if (info1) {
Index: sys/amd64/vmm/intel/vmx.c
===================================================================
--- sys/amd64/vmm/intel/vmx.c	(revision 264582)
+++ sys/amd64/vmm/intel/vmx.c	(working copy)
@@ -109,6 +109,9 @@
 #define	guest_msr_rw(vmx, msr) \
 	msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_RW)
 
+#define guest_msr_ro(vmx, msr) \
+    msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_READ)
+
 #define	HANDLED		1
 #define	UNHANDLED	0
 
@@ -786,6 +789,11 @@
 	 * MSR_EFER is saved and restored in the guest VMCS area on a
 	 * VM exit and entry respectively. It is also restored from the
 	 * host VMCS area on a VM exit.
+	 *
+	 * The TSC MSR is exposed read-only. Writes are disallowed as that
+	 * will impact the host TSC.
+	 * XXX Writes would be implemented with a wrmsr trap, and
+	 * then modifying the TSC offset in the VMCS.
 	 */
 	if (guest_msr_rw(vmx, MSR_GSBASE) ||
 	    guest_msr_rw(vmx, MSR_FSBASE) ||
@@ -793,7 +801,8 @@
 	    guest_msr_rw(vmx, MSR_SYSENTER_ESP_MSR) ||
 	    guest_msr_rw(vmx, MSR_SYSENTER_EIP_MSR) ||
 	    guest_msr_rw(vmx, MSR_KGSBASE) ||
-	    guest_msr_rw(vmx, MSR_EFER))
+	    guest_msr_rw(vmx, MSR_EFER) ||
+	    guest_msr_ro(vmx, MSR_TSC))
 		panic("vmx_vminit: error setting guest msr access");
 
 	/*
Index: sys/amd64/vmm/io/vlapic.c
===================================================================
--- sys/amd64/vmm/io/vlapic.c	(revision 264582)
+++ sys/amd64/vmm/io/vlapic.c	(working copy)
@@ -143,7 +143,7 @@
 #define	VLAPIC_TIMER_UNLOCK(vlapic)	mtx_unlock_spin(&((vlapic)->timer_mtx))
 #define	VLAPIC_TIMER_LOCKED(vlapic)	mtx_owned(&((vlapic)->timer_mtx))
 
-#define VLAPIC_BUS_FREQ	tsc_freq
+#define VLAPIC_BUS_FREQ	(128*1024*1024)
 
 static __inline uint32_t
 vlapic_get_id(struct vlapic *vlapic)
Index: sys/amd64/vmm/vmm_msr.c
===================================================================
--- sys/amd64/vmm/vmm_msr.c	(revision 264582)
+++ sys/amd64/vmm/vmm_msr.c	(working copy)
@@ -113,6 +113,9 @@
 		case MSR_MCG_CAP:
 			guest_msrs[i] = 0;
 			break;
+		case MSR_TSC:
+			guest_msrs[i] = rdtsc();
+			break;
 		case MSR_PAT:
 			guest_msrs[i] = PAT_VALUE(0, PAT_WRITE_BACK)      |
 				PAT_VALUE(1, PAT_WRITE_THROUGH)   |
Index: sys/amd64/vmm/vmm_msr.h
===================================================================
--- sys/amd64/vmm/vmm_msr.h	(revision 264582)
+++ sys/amd64/vmm/vmm_msr.h	(working copy)
@@ -29,7 +29,7 @@
 #ifndef	_VMM_MSR_H_
 #define	_VMM_MSR_H_
 
-#define	VMM_MSR_NUM	16
+#define	VMM_MSR_NUM	17
 struct vm;
 
 void	vmm_msr_init(void);
Index: usr.sbin/bhyve/bhyverun.c
===================================================================
--- usr.sbin/bhyve/bhyverun.c	(revision 264582)
+++ usr.sbin/bhyve/bhyverun.c	(working copy)
@@ -52,6 +52,7 @@
 #include <vmmapi.h>
 
 #include "bhyverun.h"
+#include "compiledate.h"
 #include "acpi.h"
 #include "inout.h"
 #include "dbgport.h"
@@ -75,6 +76,8 @@
 
 #define MB		(1024UL * 1024)
 #define GB		(1024UL * MB)
+#define FALSE		0
+#define	TRUE		(!FALSE)
 
 typedef int (*vmexit_handler_t)(struct vmctx *, struct vm_exit *, int *vcpu);
 
@@ -139,8 +142,8 @@
 		"       -S: <slot,driver,configinfo> legacy PCI slot config\n"
 		"       -l: LPC device configuration\n"
 		"       -m: memory size in MB\n"
-		"       -w: ignore unimplemented MSRs\n",
-		progname, (int)strlen(progname), "");
+		"       -w: ignore unimplemented MSRs\n"
+		,progname, (int)strlen(progname), "");
 
 	exit(code);
 }
@@ -287,10 +290,6 @@
 	if (vme->u.inout.string || vme->u.inout.rep)
 		return (VMEXIT_ABORT);
 
-	/* Special case of guest reset */
-	if (out && port == 0x64 && (uint8_t)eax == 0xFE)
-		return (vmexit_catch_reset());
-
         /* Extra-special case of host notifications */
         if (out && port == GUEST_NIO_PORT)
                 return (vmexit_handle_notify(ctx, vme, pvcpu, eax));
@@ -315,16 +314,16 @@
 	uint64_t val;
 	uint32_t eax, edx;
 	int error;
+	val = 0;
 
-	val = 0;
 	error = emulate_rdmsr(ctx, *pvcpu, vme->u.msr.code, &val);
+
 	if (error != 0) {
-		fprintf(stderr, "rdmsr to register %#x on vcpu %d\n",
+		fprintf(stderr, "rdmsr to register %#x ignored on vcpu %d\n\r",
 		    vme->u.msr.code, *pvcpu);
 		if (strictmsr)
 			return (VMEXIT_ABORT);
 	}
-
 	eax = val;
 	error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RAX, eax);
 	assert(error == 0);
@@ -332,7 +331,6 @@
 	edx = val >> 32;
 	error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RDX, edx);
 	assert(error == 0);
-
 	return (VMEXIT_CONTINUE);
 }
 
@@ -343,7 +341,7 @@
 
 	error = emulate_wrmsr(ctx, *pvcpu, vme->u.msr.code, vme->u.msr.wval);
 	if (error != 0) {
-		fprintf(stderr, "wrmsr to register %#x(%#lx) on vcpu %d\n",
+		fprintf(stderr, "wrmsr to register %#x(%#lx) ignored on vcpu %d\n\r",
 		    vme->u.msr.code, vme->u.msr.wval, *pvcpu);
 		if (strictmsr)
 			return (VMEXIT_ABORT);
@@ -676,6 +674,7 @@
 	argc -= optind;
 	argv += optind;
 
+	printf("BHyve compiled: %s \n\r\n\r", compiledate );
 	if (argc != 1)
 		usage(1);
 
Index: usr.sbin/bhyve/xmsr.c
===================================================================
--- usr.sbin/bhyve/xmsr.c	(revision 264582)
+++ usr.sbin/bhyve/xmsr.c	(working copy)
@@ -38,24 +38,72 @@
 #include <stdlib.h>
 
 #include "xmsr.h"
+#include "xmsr-info.h"
 
+#define BIT(b)	(1 << (b))
+#define FALSE	0
+#define	TRUE	(!FALSE)
+
 int
 emulate_wrmsr(struct vmctx *ctx, int vcpu, uint32_t code, uint64_t val)
 {
+	int retval = -1;
 
-	switch (code) {
+	switch (code) {	
 	case 0xd04:			/* Sandy Bridge uncore PMC MSRs */
 	case 0xc24:
-		return (0);
+		/* simulate that these registers are written */
+		retval=(0);
+		break;
 	default:
 		break;
 	}
-	return (-1);
+	fprintf(stderr,"wrmsr: %#x, %s, val: %li(%#lx).\n\r", 
+		code, xmsr_info_mnemonic(code), val, val); 
+	return retval;
 }
 
+/*
+ *	Return: error value
+ *		0 = instruction emulated
+ *		!0 = instruction ignored
+ */
 int
 emulate_rdmsr(struct vmctx *ctx, int vcpu, uint32_t code, uint64_t *val)
 {
+	int retval = 0;
 
-	return (-1);
+	switch (code) {
+	case 0xd04:			/* Sandy Bridge uncore PMC MSRs */
+//		*val = (0);
+		break;
+	case 0xc24:
+//		*val = (0);
+		break;
+	case AMD_MSR_TSEG_BASE:
+//		*val = 0xcfe00000;
+		break;
+	case AMD_MSR_HWCR:
+		*val = (BIT(24)|BIT(4));
+		break;
+	case AMD_MSR_OSVW_ID_LENGTH:
+		*val = (4);
+		break;
+	case AMD_MSR_OSVW_STATUS:
+		*val = (BIT(3)|BIT(2));
+		break;
+//	case AMD_MSR_IBSCTL:
+//		*val = BIT(8);
+//		break;
+	default:
+		retval = 1;
+		break;
+	}
+	fprintf(stderr,"rdmsr(%i:%s): %#x, %s, val: %li(%#lx).\n\r",
+		retval, (retval == 0 ? "ok" : "err"),
+		code, xmsr_info_mnemonic(code), *val, *val); 
+	return retval;
+
 }
+
+

--------------030600060806040702030601--


