Date: Thu, 7 Nov 2013 19:10:27 +0100
From: Roger Pau Monné <roger.pau@citrix.com>
To: "freebsd-xen@freebsd.org" <freebsd-xen@freebsd.org>
Cc: peter@FreeBSD.org, alc@FreeBSD.org, xen-devel <xen-devel@lists.xen.org>, freebsd-current@freebsd.org, Konstantin Belousov <kib@FreeBSD.org>, "Justin T. Gibbs" <gibbs@freebsd.org>
Subject: Re: FreeBSD PVH guest support
Message-ID: <527BD793.8010606@citrix.com>
In-Reply-To: <526E6807.9030005@citrix.com>
References: <526E6807.9030005@citrix.com>

--------------010605090609060304010908
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit

On 28/10/13 14:35, Roger Pau Monné wrote:
> Hello,
>
> The Xen community is working on a new virtualization mode (or maybe I
> should say an extension of HVM) to be able to run PV guests inside HVM
> containers without requiring a device-model (Qemu). One of the
> advantages of this new virtualization mode is that it is now much
> easier to port guests to run under it (as compared to pure PV guests).
>
> Given that FreeBSD already supports PVHVM, adding PVH support is quite
> easy: we only need some glue for the PV entry point and then support
> for diverging some early init functions (like fetching the e820 map or
> starting the APs).
>
> The attached patch contains all these changes, and allows an SMP
> FreeBSD guest to fully boot (and AFAIK work) under this new PVH mode.
> The patch can also be found on my git repo:
>
> git://xenbits.xen.org/people/royger/freebsd.git pvh_v2
>
> The patch touches quite a lot of the early init, so I've Cc'ed the
> people who maintain those areas, so they can review it.
>
> In order to test it, and since the PVH changes are not yet merged into
> upstream Xen, the use of a patched Xen is necessary. I've collected the
> patches for PVH guest support from George Dunlap (v13) and fixed some
> bugs on top of them; the tree can be found at:
>
> git://xenbits.xen.org/people/royger/xen.git fix_pvh

I've updated the patch (as suggested by John Baldwin) and added a Xen
nexus that attaches all the Xen top-level devices; this gets rid of the
legacy bus. The new patch can be found at:

git://xenbits.xen.org/people/royger/freebsd.git pvh_v2

It is also attached to this email.

Thanks for the review, Roger.

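For anyone skimming before opening the attachment: the "diverging early
init functions" idea boils down to a table of function pointers that
defaults to the native implementations and is overridden when booting
as a PVH guest. What follows is only a minimal userland sketch of that
pattern, not the kernel code from the patch. The names init_ops_sketch,
the native_* and pv_* stubs, pv_set_init_ops, sketch_delay and
reserve_ap_region are all hypothetical, and leaving the mp_bootaddress
hook NULL in the PV case is an assumption made for illustration (the
patch only shows that callers check the hook before using it).

```c
/*
 * Userland sketch of a run-time selectable "init ops" table.
 * Hypothetical names throughout; this is NOT the code from the patch.
 */
#include <assert.h>
#include <stddef.h>

struct init_ops_sketch {
	void		(*early_delay_init)(void);
	void		(*early_delay)(int usec);
	/* Optional hook: callers must check for NULL before calling. */
	unsigned int	(*mp_bootaddress)(unsigned int basemem_kb);
};

/* Counters so a caller can observe which implementation ran. */
static int native_delay_calls;
static int pv_delay_calls;

/* Stand-ins for the bare metal implementations. */
static void native_delay_init(void) { }
static void native_delay(int usec) { (void)usec; native_delay_calls++; }
static unsigned int
native_mp_bootaddress(unsigned int basemem_kb)
{
	/* Pretend 4KB of base memory is carved out for AP bootstrap. */
	return (basemem_kb - 4);
}

/* Stand-ins for the PV implementations. */
static void pv_delay_init(void) { }
static void pv_delay(int usec) { (void)usec; pv_delay_calls++; }

/* Default table: bare metal behaviour. */
static struct init_ops_sketch ops = {
	.early_delay_init	= native_delay_init,
	.early_delay		= native_delay,
	.mp_bootaddress		= native_mp_bootaddress,
};

/* Called once, early, when the PV entry point was used (assumption). */
static void
pv_set_init_ops(void)
{
	ops.early_delay_init = pv_delay_init;
	ops.early_delay = pv_delay;
	ops.mp_bootaddress = NULL;	/* assumed: nothing to reserve */
}

/* Generic delay routine that dispatches through the table. */
static void
sketch_delay(int usec)
{
	assert(ops.early_delay != NULL);
	ops.early_delay(usec);
}

/* Generic caller of the optional hook. */
static unsigned int
reserve_ap_region(unsigned int basemem_kb)
{
	if (ops.mp_bootaddress != NULL)
		return (ops.mp_bootaddress(basemem_kb));
	return (basemem_kb);
}
```

The point of the table is that the generic callers (sketch_delay and
reserve_ap_region here) stay free of hypervisor-specific conditionals;
only the one-time setup routine knows which environment it is in.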
--------------010605090609060304010908
Content-Type: text/plain; charset="UTF-8"; x-mac-type=0; x-mac-creator=0; name="0001-Xen-x86-PVH-support.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0001-Xen-x86-PVH-support.patch"

From 325c95ccd941bdb3101e9b6dd6c6a66274865fa9 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 7 Nov 2013 17:07:50 +0100
Subject: [PATCH] Xen x86 PVH support

This is still very experimental, and PVH support has not yet been
merged into upstream Xen.

PVH mode is basically a PV guest inside an HVM container, and shares
a great amount of code with PVHVM. The main difference is the way the
guest is started: PVH uses the PV start sequence, jumping directly
into the kernel entry point in long mode and with page tables set.
The main work of this patch consists in setting up an environment as
similar as possible to what native FreeBSD expects, and then adding
hooks to the PV ops when necessary.

sys/amd64/amd64/locore.S:
 * Add PV entry point, hypercall_page and the necessary elfnotes.
sys/amd64/amd64/machdep.c:
 * Add hooks to replace bare metal operations that should use a PV
   helper, this includes:
    - Preload metadata
    - i8254_init and i8254_delay
    - Fetching the e820 memory map
    - Reservation of the MP bootstrap region
 * Create a DELAY function that uses the PV hooks.
 * Introduce a new hammer_time_xen that sets the necessary stuff
   when running in PVH mode.
sys/amd64/amd64/mp_machdep.c:
 * Introduce a hook to replace start_all_aps.
 * Introduce a lapic_disabled variable to prevent polluting the code
   with xen specific gates.
sys/amd64/include/asmacros.h:
 * Copy the ELFNOTE macro from the i386 Xen PV port.
sys/amd64/include/clock.h:
sys/i386/include/clock.h:
 * Prototypes for the xen early delay initialization and usage.
sys/amd64/include/cpu.h:
 * Introduce a new cpu hook to init APs.
sys/amd64/include/sysarch.h:
 * Declare the init_ops structure.
sys/amd64/include/xen/hypercall.h:
sys/i386/include/xen/hypercall.h:
 * Switch to the PV style hypercall mechanism for HVM also.
sys/conf/files:
 * Make the PV console available on XENHVM also.
sys/conf/files.amd64:
 * Include the new files for the PVH port.
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c:
 * Remove the identify method and instead add the device from
   nexus_xen.
 * Use HYPERVISOR_start_info instead of xen_start_info.
 * Use HYPERVISOR_event_channel_op to kick the event channel before
   xen interrupts are set up.
sys/dev/xen/control/control.c:
 * Use the PV shutdown on PVH.
sys/dev/xen/timer/timer.c:
 * Pass a vcpu_info to xen_fetch_vcpu_time, this allows using this
   function at very early init, before per-cpu vcpu_info is set.
 * Remove critical_{enter/exit} from xen_fetch_vcpu_time so it can
   be used at early boot, instead place them on the callers.
 * Introduce two new functions, xen_delay_init and xen_delay that
   can be used at early boot to implement the generic DELAY function.
 * Remove the identify method that used to add the device, now it is
   manually added from either xenpci (HVM) or nexus_xen (PV).
sys/i386/i386/locore.s:
 * Reserve space for the hypercall page.
sys/i386/i386/machdep.c:
 * Create a generic DELAY function.
sys/i386/xen/xen_machdep.c:
 * Set HYPERVISOR_start_info.
sys/x86/isa/clock.c:
 * Rename the generic DELAY function to i8254_delay.
sys/x86/x86/delay.c:
 * Put generic delay helpers here, get_tsc and delay_tc.
sys/x86/x86/local_apic.c:
 * Prevent the local apic from attaching when running on PVH mode.
sys/x86/xen/hvm.c:
 * Set the start_all_aps hook.
 * Fix the setting of the hypercall page now that we are using the
   same mechanism as the PV port.
 * Initialize Xen CPU hooks for the PVH port.
 * Introduce the xen_early_printf debug function, which prints
   directly to the hypervisor console.
 * Initialize APs before SI_SUB_SMP (SI_SUB_SMP-1).
sys/x86/xen/mptable.c:
 * Create a dummy PV CPU enumerator for the PVH port.
sys/x86/xen/pv.c:
 * Implement the PV functions for the early boot hooks,
   parse_preload_data and fetch_e820_map.
 * Implement the PV function for the start_all_aps hook.
sys/x86/xen/pvcpu.c:
 * Dummy Xen PV CPU device, that we use to set the per-cpu pc_device.
sys/xen/gnttab.c:
 * Allocate resume_frames for the PVH port.
sys/xen/interface/arch-x86/xen.h:
 * Interface change for the PVH port (not used on FreeBSD).
sys/xen/pv.h:
 * Header that exports the specific PV functions.
sys/xen/xen-os.h:
 * Declare prototypes for the newly added functions.
sys/xen/xenstore/xenstore.c:
 * Make the xenstore driver hang from both xenpci and the nexus when
   running XENHVM, this is because we don't have a xenpci device on
   the PVH port.
 * Remove the identify routine that added the device, instead add it
   from either xenpci (HVM) or nexus_xen (PV).
sys/dev/xen/xenpci/xenpci.c:
 * Add the xenstore and xen_et devices on successful attach.
sys/i386/xen/mp_machdep.c:
 * Modify cpu_initialize_context to match the changes in the Xen
   interface.
sys/x86/xen/xen_nexus.c:
 * Create a specific nexus for Xen PV guests that takes care of
   adding the top level Xen PV devices.
---
 sys/amd64/amd64/locore.S           |   53 ++++++++
 sys/amd64/amd64/machdep.c          |  179 ++++++++++++++++++++++----
 sys/amd64/amd64/mp_machdep.c       |   27 +++--
 sys/amd64/include/asmacros.h       |   26 ++++
 sys/amd64/include/clock.h          |    6 +
 sys/amd64/include/cpu.h            |    1 +
 sys/amd64/include/sysarch.h        |   19 +++
 sys/amd64/include/xen/hypercall.h  |    7 -
 sys/conf/files                     |    4 +-
 sys/conf/files.amd64               |    5 +
 sys/conf/files.i386                |    2 +
 sys/dev/xen/console/console.c      |   29 ++---
 sys/dev/xen/console/xencons_ring.c |   15 ++-
 sys/dev/xen/control/control.c      |   37 +++---
 sys/dev/xen/timer/timer.c          |   73 +++++++----
 sys/dev/xen/xenpci/xenpci.c        |    8 +
 sys/i386/i386/locore.s             |    9 ++
 sys/i386/i386/machdep.c            |   11 ++
 sys/i386/include/clock.h           |    6 +
 sys/i386/include/xen/hypercall.h   |    7 -
 sys/i386/xen/mp_machdep.c          |    6 +-
 sys/i386/xen/xen_machdep.c         |    4 +-
 sys/x86/isa/clock.c                |   53 +--------
 sys/x86/isa/isa.c                  |    3 +
 sys/x86/x86/delay.c                |   95 ++++++++++++++
 sys/x86/x86/local_apic.c           |    8 +-
 sys/x86/xen/hvm.c                  |   98 +++++++++++----
 sys/x86/xen/mptable.c              |  136 ++++++++++++++++++++
 sys/x86/xen/pv.c                   |  247 ++++++++++++++++++++++++++++++++++++
 sys/x86/xen/pvcpu.c                |   77 +++++++++++
 sys/x86/xen/xen_nexus.c            |   99 ++++++++++++++
 sys/xen/gnttab.c                   |   21 +++-
 sys/xen/interface/arch-x86/xen.h   |   11 ++-
 sys/xen/pv.h                       |   29 ++++
 sys/xen/xen-os.h                   |    8 +
 sys/xen/xenstore/xenstore.c        |   24 ++--
 36 files changed, 1225 insertions(+), 218 deletions(-)
 create mode 100644 sys/x86/x86/delay.c
 create mode 100644 sys/x86/xen/mptable.c
 create mode 100644 sys/x86/xen/pv.c
 create mode 100644 sys/x86/xen/pvcpu.c
 create mode 100644 sys/x86/xen/xen_nexus.c
 create mode 100644 sys/xen/pv.h

diff --git a/sys/amd64/amd64/locore.S b/sys/amd64/amd64/locore.S
index 55cda3a..e04cc48 100644
--- a/sys/amd64/amd64/locore.S
+++ b/sys/amd64/amd64/locore.S
@@ -31,6 +31,12 @@
 #include <machine/pmap.h>
 #include <machine/specialreg.h>
 
+#ifdef XENHVM
+#include <xen/xen-os.h>
+#define __ASSEMBLY__
+#include <xen/interface/elfnote.h>
+#endif
+
 #include "assym.s"
 
 /*
@@ -86,3 +92,50 @@
NON_GPROF_ENTRY(btext) ALIGN_DATA /* just to be sure */ .space 0x1000 /* space for bootstack - temporary stack */ bootstack: + +#ifdef XENHVM +/* Xen */ +.section __xen_guest + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD") + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "HEAD") + ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0") + ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE) + ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE) /* Xen honours elf->p_paddr; compensate for this */ + ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start) + ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page) + ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START) + ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector") + ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes") + ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V) + ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic") + ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0) + ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes") + + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ + +NON_GPROF_ENTRY(xen_start) + /* Don't trust what the loader gives for rflags. 
*/ + pushq $PSL_KERNEL + popfq + + /* Parameters for the xen init function */ + movq %rsi, %rdi /* shared_info (arg 1) */ + movq %rsp, %rsi /* xenstack (arg 2) */ + + /* Use our own stack */ + movq $bootstack,%rsp + xorl %ebp, %ebp + + /* u_int64_t hammer_time_xen(start_info_t *si, u_int64_t xenstack); */ + call hammer_time_xen + movq %rax, %rsp /* set up kstack for mi_startup() */ + call mi_startup /* autoconfiguration, mountroot etc */ + + /* NOTREACHED */ +0: hlt + jmp 0b +#endif diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c index 2b2e47f..b649def 100644 --- a/sys/amd64/amd64/machdep.c +++ b/sys/amd64/amd64/machdep.c @@ -127,6 +127,7 @@ __FBSDID("$FreeBSD$"); #include <machine/reg.h> #include <machine/sigframe.h> #include <machine/specialreg.h> +#include <machine/sysarch.h> #ifdef PERFMON #include <machine/perfmon.h> #endif @@ -147,10 +148,20 @@ __FBSDID("$FreeBSD$"); #include <isa/isareg.h> #include <isa/rtc.h> +#ifdef XENHVM +/* Xen */ +#include <xen/xen-os.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#endif + /* Sanity check for __curthread() */ CTASSERT(offsetof(struct pcpu, pc_curthread) == 0); extern u_int64_t hammer_time(u_int64_t, u_int64_t); +#ifdef XENHVM +extern u_int64_t hammer_time_xen(start_info_t *, u_int64_t); +#endif extern void printcpuinfo(void); /* XXX header file */ extern void identify_cpu(void); @@ -166,6 +177,23 @@ static int set_fpcontext(struct thread *td, const mcontext_t *mcp, char *xfpustate, size_t xfpustate_len); SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL); +/* Preload data parse function */ +static caddr_t native_parse_preload_data(u_int64_t); + +/* Native function to fetch the e820 map */ +static void native_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/* Default init_ops implementation. 
*/ +struct init_ops init_ops = { + .parse_preload_data = native_parse_preload_data, + .early_delay_init = i8254_init, + .early_delay = i8254_delay, + .fetch_e820_map = native_fetch_e820_map, +#ifdef SMP + .mp_bootaddress = mp_bootaddress, +#endif +}; + /* * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is * the physical address at which the kernel is loaded. @@ -216,6 +244,15 @@ struct mem_range_softc mem_range_softc; struct mtx dt_lock; /* lock for GDT and LDT */ +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + init_ops.early_delay(n); +} + static void cpu_startup(dummy) void *dummy; @@ -1408,6 +1445,24 @@ add_smap_entry(struct bios_smap *smap, vm_paddr_t *physmap, int *physmap_idxp) return (1); } +static void +native_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + /* + * get memory map from INT 15:E820, kindly supplied by the + * loader. + * + * subr_module.c says: + * "Consumer may safely assume that size value precedes data." + * ie: an int32_t immediately precedes smap. + */ + *smap = (struct bios_smap *)preload_search_info(kmdp, + MODINFO_METADATA | MODINFOMD_SMAP); + if (*smap == NULL) + panic("No BIOS smap info from loader!"); + *size = *((u_int32_t *)*smap - 1); +} + /* * Populate the (physmap) array with base/bound pairs describing the * available physical memory in the system, then test this memory and @@ -1433,19 +1488,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) basemem = 0; physmap_idx = 0; - /* - * get memory map from INT 15:E820, kindly supplied by the loader. - * - * subr_module.c says: - * "Consumer may safely assume that size value precedes data." - * ie: an int32_t immediately precedes smap. 
- */ - smapbase = (struct bios_smap *)preload_search_info(kmdp, - MODINFO_METADATA | MODINFOMD_SMAP); - if (smapbase == NULL) - panic("No BIOS smap info from loader!"); + init_ops.fetch_e820_map(kmdp, &smapbase, &smapsize); - smapsize = *((u_int32_t *)smapbase - 1); smapend = (struct bios_smap *)((uintptr_t)smapbase + smapsize); for (smap = smapbase; smap < smapend; smap++) @@ -1467,7 +1511,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) #ifdef SMP /* make hole for AP bootstrap code */ - physmap[1] = mp_bootaddress(physmap[1] / 1024); + if (init_ops.mp_bootaddress) + physmap[1] = init_ops.mp_bootaddress(physmap[1] / 1024); #endif /* @@ -1681,6 +1726,98 @@ do_next: msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]); } +static caddr_t +native_parse_preload_data(u_int64_t modulep) +{ + caddr_t kmdp; + + preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); + preload_bootstrap_relocate(KERNBASE); + kmdp = preload_search_by_type("elf kernel"); + if (kmdp == NULL) + kmdp = preload_search_by_type("elf64 kernel"); + boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); + kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; +#ifdef DDB + ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); + ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); +#endif + + return (kmdp); +} + +#ifdef XENHVM +/* + * First function called by the Xen PVH boot sequence. + * + * Set some Xen global variables and prepare the environment so it is + * as similar as possible to what native FreeBSD init function expects. 
+ */ +u_int64_t +hammer_time_xen(start_info_t *si, u_int64_t xenstack) +{ + u_int64_t physfree; + u_int64_t *PT4 = (u_int64_t *)xenstack; + u_int64_t *PT3 = (u_int64_t *)(xenstack + PAGE_SIZE); + u_int64_t *PT2 = (u_int64_t *)(xenstack + 2 * PAGE_SIZE); + int i; + + KASSERT((si != NULL && xenstack != 0), + ("invalid start_info or xenstack")); + + xen_early_printf("FreeBSD PVH running on %s\n", si->magic); + + /* We use 3 pages of xen stack for the boot pagetables */ + physfree = xenstack + 3 * PAGE_SIZE - KERNBASE; + + /* Setup Xen global variables */ + HYPERVISOR_start_info = si; + HYPERVISOR_shared_info = + (shared_info_t *)(si->shared_info + KERNBASE); + + /* + * Setup some misc global variables for Xen devices + * + * XXX: devices that need this specific variables should + * be rewritten to fetch this info by themselves from the + * start_info page. + */ + console_page = + (char *)(ptoa(si->console.domU.mfn) + KERNBASE); + xen_store = (struct xenstore_domain_interface *) + (ptoa(si->store_mfn) + KERNBASE); + + xen_domain_type = XEN_PV_DOMAIN; + vm_guest = VM_GUEST_XEN; + + /* + * Use the stack Xen gives us to build the page tables + * as native FreeBSD expects to find them (created + * by the boot trampoline). + */ + for (i = 0; i < 512; i++) { + /* Each slot of the level 4 pages points to the same level 3 page */ + PT4[i] = ((u_int64_t)&PT3[0]) - KERNBASE; + PT4[i] |= PG_V | PG_RW | PG_U; + + /* Each slot of the level 3 pages points to the same level 2 page */ + PT3[i] = ((u_int64_t)&PT2[0]) - KERNBASE; + PT3[i] |= PG_V | PG_RW | PG_U; + + /* The level 2 page slots are mapped with 2MB pages for 1GB. 
*/ + PT2[i] = i * (2 * 1024 * 1024); + PT2[i] |= PG_V | PG_RW | PG_PS | PG_U; + } + load_cr3(((u_int64_t)&PT4[0]) - KERNBASE); + + /* Set the hooks for early functions that diverge from bare metal */ + xen_pv_set_init_ops(); + + /* Now we can jump into the native init function */ + return hammer_time(0, physfree); +} +#endif + u_int64_t hammer_time(u_int64_t modulep, u_int64_t physfree) { @@ -1705,17 +1842,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) */ proc_linkup0(&proc0, &thread0); - preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); - preload_bootstrap_relocate(KERNBASE); - kmdp = preload_search_by_type("elf kernel"); - if (kmdp == NULL) - kmdp = preload_search_by_type("elf64 kernel"); - boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); - kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; -#ifdef DDB - ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); - ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); -#endif + kmdp = init_ops.parse_preload_data(modulep); /* Init basic tunables, hz etc */ init_param1(); @@ -1799,10 +1926,10 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) lidt(&r_idt); /* - * Initialize the i8254 before the console so that console + * Initialize the early delay before the console so that console * initialization can use DELAY(). */ - i8254_init(); + init_ops.early_delay_init(); /* * Initialize the console before we print anything out. diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c index 4ef4b3d..44c2a45 100644 --- a/sys/amd64/amd64/mp_machdep.c +++ b/sys/amd64/amd64/mp_machdep.c @@ -90,7 +90,8 @@ extern struct pcpu __pcpu[]; /* AP uses this during bootstrap. Do not staticize. 
*/ char *bootSTK; -static int bootAP; +int bootAP; +bool lapic_disabled = false; /* Free these after use */ void *bootstacks[MAXCPU]; @@ -122,9 +123,12 @@ u_long *ipi_rendezvous_counts[MAXCPU]; static u_long *ipi_hardclock_counts[MAXCPU]; #endif +int native_start_all_aps(void); + /* Default cpu_ops implementation. */ struct cpu_ops cpu_ops = { - .ipi_vectored = lapic_ipi_vectored + .ipi_vectored = lapic_ipi_vectored, + .start_all_aps = native_start_all_aps, }; extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32); @@ -138,7 +142,7 @@ extern int pmap_pcid_enabled; static volatile cpuset_t ipi_nmi_pending; /* used to hold the AP's until we are ready to release them */ -static struct mtx ap_boot_mtx; +struct mtx ap_boot_mtx; /* Set to 1 once we're ready to let the APs out of the pen. */ static volatile int aps_ready = 0; @@ -165,7 +169,6 @@ static int cpu_cores; /* cores per package */ static void assign_cpu_ids(void); static void set_interrupt_apic_ids(void); -static int start_all_aps(void); static int start_ap(int apic_id); static void release_aps(void *dummy); @@ -569,7 +572,7 @@ cpu_mp_start(void) assign_cpu_ids(); /* Start each Application Processor */ - start_all_aps(); + cpu_ops.start_all_aps(); set_interrupt_apic_ids(); } @@ -707,7 +710,8 @@ init_secondary(void) wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D); /* Disable local APIC just to be sure. */ - lapic_disable(); + if (!lapic_disabled) + lapic_disable(); /* signal our startup to the BSP. 
*/ mp_naps++; @@ -733,7 +737,7 @@ init_secondary(void) /* A quick check from sanity claus */ cpuid = PCPU_GET(cpuid); - if (PCPU_GET(apic_id) != lapic_id()) { + if (!lapic_disabled && PCPU_GET(apic_id) != lapic_id()) { printf("SMP: cpuid = %d\n", cpuid); printf("SMP: actual apic_id = %d\n", lapic_id()); printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id)); @@ -749,7 +753,8 @@ init_secondary(void) mtx_lock_spin(&ap_boot_mtx); /* Init local apic for irq's */ - lapic_setup(1); + if (!lapic_disabled) + lapic_setup(1); /* Set memory range attributes for this CPU to match the BSP */ mem_range_AP_init(); @@ -764,7 +769,7 @@ init_secondary(void) if (cpu_logical > 1 && PCPU_GET(apic_id) % cpu_logical != 0) CPU_SET(cpuid, &logical_cpus_mask); - if (bootverbose) + if (!lapic_disabled && bootverbose) lapic_dump("AP"); if (smp_cpus == mp_ncpus) { @@ -908,8 +913,8 @@ assign_cpu_ids(void) /* * start each AP in our list */ -static int -start_all_aps(void) +int +native_start_all_aps(void) { vm_offset_t va = boot_address + KERNBASE; u_int64_t *pt4, *pt3, *pt2; diff --git a/sys/amd64/include/asmacros.h b/sys/amd64/include/asmacros.h index 1fb592a..ce8dce4 100644 --- a/sys/amd64/include/asmacros.h +++ b/sys/amd64/include/asmacros.h @@ -201,4 +201,30 @@ #endif /* LOCORE */ +#ifdef __STDC__ +#define ELFNOTE(name, type, desctype, descdata...) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz #name ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#else /* !__STDC__, i.e. 
-traditional */ +#define ELFNOTE(name, type, desctype, descdata) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz "name" ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#endif /* __STDC__ */ + #endif /* !_MACHINE_ASMACROS_H_ */ diff --git a/sys/amd64/include/clock.h b/sys/amd64/include/clock.h index d7f7d82..e7817ab 100644 --- a/sys/amd64/include/clock.h +++ b/sys/amd64/include/clock.h @@ -25,6 +25,12 @@ extern int smp_tsc; #endif void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface. diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h index 3d9ff531..ed9f1db 100644 --- a/sys/amd64/include/cpu.h +++ b/sys/amd64/include/cpu.h @@ -64,6 +64,7 @@ struct cpu_ops { void (*cpu_init)(void); void (*cpu_resume)(void); void (*ipi_vectored)(u_int, int); + int (*start_all_aps)(void); }; extern struct cpu_ops cpu_ops; diff --git a/sys/amd64/include/sysarch.h b/sys/amd64/include/sysarch.h index cd380d4..27fd3ba 100644 --- a/sys/amd64/include/sysarch.h +++ b/sys/amd64/include/sysarch.h @@ -4,3 +4,22 @@ /* $FreeBSD$ */ #include <x86/sysarch.h> + +#include <machine/pc/bios.h> +/* + * Struct containing pointers to init functions whose + * implementation is run time selectable. Selection can be made, + * for example, based on detection of a BIOS variant or + * hypervisor environment. 
+ */ +struct init_ops { + caddr_t (*parse_preload_data)(u_int64_t); + void (*early_delay_init)(void); + void (*early_delay)(int); + void (*fetch_e820_map)(caddr_t, struct bios_smap **, u_int32_t *); +#ifdef SMP + u_int (*mp_bootaddress)(u_int); +#endif +}; + +extern struct init_ops init_ops; diff --git a/sys/amd64/include/xen/hypercall.h b/sys/amd64/include/xen/hypercall.h index a1b2a5c..499fb4d 100644 --- a/sys/amd64/include/xen/hypercall.h +++ b/sys/amd64/include/xen/hypercall.h @@ -51,15 +51,8 @@ #define CONFIG_XEN_COMPAT 0x030002 #define __must_check -#ifdef XEN #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov $("STR(__HYPERVISOR_##name)" * 32),%%eax; "\ - "add hypercall_stubs(%%rip),%%rax; " \ - "call *%%rax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/conf/files b/sys/conf/files index 3c20141..e711ddf 100644 --- a/sys/conf/files +++ b/sys/conf/files @@ -2512,8 +2512,8 @@ dev/xe/if_xe_pccard.c optional xe pccard dev/xen/balloon/balloon.c optional xen | xenhvm dev/xen/blkfront/blkfront.c optional xen | xenhvm dev/xen/blkback/blkback.c optional xen | xenhvm -dev/xen/console/console.c optional xen -dev/xen/console/xencons_ring.c optional xen +dev/xen/console/console.c optional xen | xenhvm +dev/xen/console/xencons_ring.c optional xen | xenhvm dev/xen/control/control.c optional xen | xenhvm dev/xen/netback/netback.c optional xen | xenhvm dev/xen/netfront/netfront.c optional xen | xenhvm diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 index 33c4297..d736d84 100644 --- a/sys/conf/files.amd64 +++ b/sys/conf/files.amd64 @@ -564,5 +564,10 @@ x86/x86/mptable_pci.c optional mptable pci x86/x86/msi.c optional pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/mptable.c optional xenhvm +x86/xen/pvcpu.c optional xenhvm +x86/xen/pv.c optional xenhvm 
+x86/xen/xen_nexus.c optional xenhvm diff --git a/sys/conf/files.i386 b/sys/conf/files.i386 index 696d4e7..10a4da8 100644 --- a/sys/conf/files.i386 +++ b/sys/conf/files.i386 @@ -587,5 +587,7 @@ x86/x86/mptable_pci.c optional apic native pci x86/x86/msi.c optional apic pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/xen_nexus.c optional xen | xenhvm diff --git a/sys/dev/xen/console/console.c b/sys/dev/xen/console/console.c index 23eaee2..33d7cce 100644 --- a/sys/dev/xen/console/console.c +++ b/sys/dev/xen/console/console.c @@ -69,11 +69,14 @@ struct mtx cn_mtx; static char wbuf[WBUF_SIZE]; static char rbuf[RBUF_SIZE]; static int rc, rp; -static unsigned int cnsl_evt_reg; +unsigned int cnsl_evt_reg; static unsigned int wc, wp; /* write_cons, write_prod */ xen_intr_handle_t xen_intr_handle; device_t xencons_dev; +/* Virt address of the shared console page */ +char *console_page; + #ifdef KDB static int xc_altbrk; #endif @@ -113,6 +116,9 @@ static struct ttydevsw xc_ttydevsw = { static void xc_cnprobe(struct consdev *cp) { + if (!xen_pv_domain()) + return; + cp->cn_pri = CN_REMOTE; sprintf(cp->cn_name, "%s0", driver_name); } @@ -175,7 +181,7 @@ static void xc_cnputc(struct consdev *dev, int c) { - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) xc_cnputc_dom0(dev, c); else xc_cnputc_domu(dev, c); @@ -206,22 +212,12 @@ xcons_putc(int c) xcons_force_flush(); #endif } - if (cnsl_evt_reg) - __xencons_tx_flush(); + __xencons_tx_flush(); /* inform start path that we're pretty full */ return ((wp - wc) >= WBUF_SIZE - 100) ? 
TRUE : FALSE; } -static void -xc_identify(driver_t *driver, device_t parent) -{ - device_t child; - child = BUS_ADD_CHILD(parent, 0, driver_name, 0); - device_set_driver(child, driver); - device_set_desc(child, "Xen Console"); -} - static int xc_probe(device_t dev) { @@ -245,7 +241,7 @@ xc_attach(device_t dev) cnsl_evt_reg = 1; callout_reset(&xc_callout, XC_POLLTIME, xc_timeout, xccons); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { error = xen_intr_bind_virq(dev, VIRQ_CONSOLE, 0, NULL, xencons_priv_interrupt, NULL, INTR_TYPE_TTY, &xen_intr_handle); @@ -309,7 +305,7 @@ __xencons_tx_flush(void) sz = wp - wc; if (sz > (WBUF_SIZE - WBUF_MASK(wc))) sz = WBUF_SIZE - WBUF_MASK(wc); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { HYPERVISOR_console_io(CONSOLEIO_write, sz, &wbuf[WBUF_MASK(wc)]); wc += sz; } else { @@ -405,7 +401,6 @@ xc_timeout(void *v) } static device_method_t xc_methods[] = { - DEVMETHOD(device_identify, xc_identify), DEVMETHOD(device_probe, xc_probe), DEVMETHOD(device_attach, xc_attach), @@ -424,7 +419,7 @@ xcons_force_flush(void) { int sz; - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) return; /* Spin until console data is flushed through to the domain controller. 
*/ diff --git a/sys/dev/xen/console/xencons_ring.c b/sys/dev/xen/console/xencons_ring.c index 3701551..3046498 100644 --- a/sys/dev/xen/console/xencons_ring.c +++ b/sys/dev/xen/console/xencons_ring.c @@ -32,9 +32,9 @@ __FBSDID("$FreeBSD$"); #define console_evtchn console.domU.evtchn xen_intr_handle_t console_handle; -extern char *console_page; extern struct mtx cn_mtx; extern device_t xencons_dev; +extern int cnsl_evt_reg; static inline struct xencons_interface * xencons_interface(void) @@ -60,6 +60,7 @@ xencons_ring_send(const char *data, unsigned len) struct xencons_interface *intf; XENCONS_RING_IDX cons, prod; int sent; + struct evtchn_send send = { .port = HYPERVISOR_start_info->console.domU.evtchn }; intf = xencons_interface(); cons = intf->out_cons; @@ -76,7 +77,11 @@ xencons_ring_send(const char *data, unsigned len) wmb(); intf->out_prod = prod; - xen_intr_signal(console_handle); + if (cnsl_evt_reg) + xen_intr_signal(console_handle); + else + HYPERVISOR_event_channel_op(EVTCHNOP_send, &send); + return sent; @@ -125,11 +130,11 @@ xencons_ring_init(void) { int err; - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return 0; err = xen_intr_bind_local_port(xencons_dev, - xen_start_info->console_evtchn, NULL, xencons_handle_input, NULL, + HYPERVISOR_start_info->console_evtchn, NULL, xencons_handle_input, NULL, INTR_TYPE_MISC | INTR_MPSAFE, &console_handle); if (err) { return err; @@ -145,7 +150,7 @@ void xencons_suspend(void) { - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return; xen_intr_unbind(&console_handle); diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c index a9f8d1b..35c923d 100644 --- a/sys/dev/xen/control/control.c +++ b/sys/dev/xen/control/control.c @@ -317,21 +317,6 @@ xctrl_suspend() EVENTHANDLER_INVOKE(power_resume); } -static void -xen_pv_shutdown_final(void *arg, int howto) -{ - /* - * Inform the hypervisor that shutdown is complete. 
- * This is not necessary in HVM domains since Xen - * emulates ACPI in that mode and FreeBSD's ACPI - * support will request this transition. - */ - if (howto & (RB_HALT | RB_POWEROFF)) - HYPERVISOR_shutdown(SHUTDOWN_poweroff); - else - HYPERVISOR_shutdown(SHUTDOWN_reboot); -} - #else /* HVM mode suspension. */ @@ -447,6 +432,21 @@ xctrl_halt() shutdown_nice(RB_HALT); } +static void +xen_pv_shutdown_final(void *arg, int howto) +{ + /* + * Inform the hypervisor that shutdown is complete. + * This is not necessary in HVM domains since Xen + * emulates ACPI in that mode and FreeBSD's ACPI + * support will request this transition. + */ + if (howto & (RB_HALT | RB_POWEROFF)) + HYPERVISOR_shutdown(SHUTDOWN_poweroff); + else + HYPERVISOR_shutdown(SHUTDOWN_reboot); +} + /*------------------------------ Event Reception -----------------------------*/ static void xctrl_on_watch_event(struct xs_watch *watch, const char **vec, unsigned int len) @@ -529,10 +529,9 @@ xctrl_attach(device_t dev) xctrl->xctrl_watch.callback_data = (uintptr_t)xctrl; xs_register_watch(&xctrl->xctrl_watch); -#ifndef XENHVM - EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, - SHUTDOWN_PRI_LAST); -#endif + if (xen_pv_domain()) + EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, + SHUTDOWN_PRI_LAST); return (0); } diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c index 354085b..333f1b0 100644 --- a/sys/dev/xen/timer/timer.c +++ b/sys/dev/xen/timer/timer.c @@ -59,6 +59,9 @@ __FBSDID("$FreeBSD$"); #include <machine/_inttypes.h> #include <machine/smp.h> +/* For the declaration of clock_lock */ +#include <isa/rtc.h> + #include "clock_if.h" static devclass_t xentimer_devclass; @@ -95,19 +98,6 @@ struct xentimer_softc { /* Last time; this guarantees a monotonically increasing clock. 
*/ volatile uint64_t xen_timer_last_time = 0; -static void -xentimer_identify(driver_t *driver, device_t parent) -{ - if (!xen_domain()) - return; - - /* Handle all Xen PV timers in one device instance. */ - if (devclass_get_device(xentimer_devclass, 0)) - return; - - BUS_ADD_CHILD(parent, 0, "xen_et", 0); -} - static int xentimer_probe(device_t dev) { @@ -234,18 +224,16 @@ xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src) * it happens to be less than another CPU's previously determined value. */ static uint64_t -xen_fetch_vcpu_time(void) +xen_fetch_vcpu_time(struct vcpu_info *vcpu) { struct vcpu_time_info dst; struct vcpu_time_info *src; uint32_t pre_version; uint64_t now; volatile uint64_t last; - struct vcpu_info *vcpu = DPCPU_GET(vcpu_info); src = &vcpu->time; - critical_enter(); do { pre_version = xen_fetch_vcpu_tinfo(&dst, src); barrier(); @@ -266,16 +254,19 @@ xen_fetch_vcpu_time(void) } } while (!atomic_cmpset_64(&xen_timer_last_time, last, now)); - critical_exit(); - return (now); } static uint32_t xentimer_get_timecount(struct timecounter *tc) { + uint32_t xen_time; + + critical_enter(); + xen_time = (uint32_t)xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) & UINT_MAX; + critical_exit(); - return ((uint32_t)xen_fetch_vcpu_time() & UINT_MAX); + return xen_time; } /** @@ -305,7 +296,12 @@ xen_fetch_wallclock(struct timespec *ts) static void xen_fetch_uptime(struct timespec *ts) { - uint64_t uptime = xen_fetch_vcpu_time(); + uint64_t uptime; + + critical_enter(); + uptime = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); + critical_exit(); + ts->tv_sec = uptime / NSEC_IN_SEC; ts->tv_nsec = uptime % NSEC_IN_SEC; } @@ -354,7 +350,7 @@ xentimer_intr(void *arg) struct xentimer_softc *sc = (struct xentimer_softc *)arg; struct xentimer_pcpu_data *pcpu = DPCPU_PTR(xentimer_pcpu); - pcpu->last_processed = xen_fetch_vcpu_time(); + pcpu->last_processed = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); if (pcpu->timer != 0 && sc->et.et_active) 
sc->et.et_event_cb(&sc->et, sc->et.et_arg); @@ -415,7 +411,9 @@ xentimer_et_start(struct eventtimer *et, do { if (++i == 60) panic("can't schedule timer"); - next_time = xen_fetch_vcpu_time() + first_in_ns; + critical_enter(); + next_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) + first_in_ns; + critical_exit(); error = xentimer_vcpu_start_timer(cpu, next_time); } while (error == -ETIME); @@ -573,8 +571,37 @@ xentimer_suspend(device_t dev) return (0); } +/* + * Xen delay early init + */ +void xen_delay_init(void) +{ + /* Init the clock lock */ + mtx_init(&clock_lock, "clk", NULL, MTX_SPIN | MTX_NOPROFILE); +} +/* + * Xen PV DELAY function + * + * When running in PVH mode we don't have an emulated i8254, so + * make use of the Xen time info in order to implement a simple DELAY + * function that can be used during early boot. + */ +void xen_delay(int n) +{ + uint64_t end_ns; + uint64_t current; + + end_ns = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + end_ns += n * NSEC_IN_USEC; + + for (;;) { + current = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + if (current >= end_ns) + break; + } +} + static device_method_t xentimer_methods[] = { - DEVMETHOD(device_identify, xentimer_identify), DEVMETHOD(device_probe, xentimer_probe), DEVMETHOD(device_attach, xentimer_attach), DEVMETHOD(device_detach, xentimer_detach), diff --git a/sys/dev/xen/xenpci/xenpci.c b/sys/dev/xen/xenpci/xenpci.c index dd2ad92..a19ebcb 100644 --- a/sys/dev/xen/xenpci/xenpci.c +++ b/sys/dev/xen/xenpci/xenpci.c @@ -240,6 +240,7 @@ xenpci_attach(device_t dev) { struct xenpci_softc *scp = device_get_softc(dev); devclass_t dc; + device_t child; int error; /* @@ -270,6 +271,13 @@ xenpci_attach(device_t dev) goto errexit; } + if (BUS_ADD_CHILD(dev, 0, "xenstore", 0) == NULL) + panic("xenpci: unable to add xenstore device"); + child = BUS_ADD_CHILD(nexus, 0, "xen_et", 0); + if (child == NULL) + panic("xenpci: unable to add xen pv timer device"); + device_probe_and_attach(child); 
+ return (bus_generic_attach(dev)); errexit: diff --git a/sys/i386/i386/locore.s b/sys/i386/i386/locore.s index 68cb430..bd136b1 100644 --- a/sys/i386/i386/locore.s +++ b/sys/i386/i386/locore.s @@ -898,3 +898,12 @@ done_pde: #endif ret + +#ifdef XENHVM +/* Xen Hypercall page */ + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ +#endif diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c index c430316..af12b1d 100644 --- a/sys/i386/i386/machdep.c +++ b/sys/i386/i386/machdep.c @@ -254,6 +254,17 @@ struct mtx icu_lock; struct mem_range_softc mem_range_softc; +#ifndef XEN +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + i8254_delay(n); +} +#endif + static void cpu_startup(dummy) void *dummy; diff --git a/sys/i386/include/clock.h b/sys/i386/include/clock.h index d980ec7..287b2c8 100644 --- a/sys/i386/include/clock.h +++ b/sys/i386/include/clock.h @@ -22,6 +22,12 @@ extern int tsc_is_invariant; extern int tsc_perf_stat; void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface. 
diff --git a/sys/i386/include/xen/hypercall.h b/sys/i386/include/xen/hypercall.h index edc13f4..1c15b0f 100644 --- a/sys/i386/include/xen/hypercall.h +++ b/sys/i386/include/xen/hypercall.h @@ -40,15 +40,8 @@ #define CONFIG_XEN_COMPAT 0x030002 -#if defined(XEN) #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov hypercall_stubs,%%eax; " \ - "add $("STR(__HYPERVISOR_##name)" * 32),%%eax; " \ - "call *%%eax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/i386/xen/mp_machdep.c b/sys/i386/xen/mp_machdep.c index c48fcb2..adf7627 100644 --- a/sys/i386/xen/mp_machdep.c +++ b/sys/i386/xen/mp_machdep.c @@ -928,9 +928,9 @@ cpu_initialize_context(unsigned int cpu) smp_trap_init(ctxt.trap_ctxt); ctxt.ldt_ents = 0; - ctxt.gdt_frames[0] = + ctxt.u.pv.gdt_frames[0] = (uint32_t)((uint64_t)vtomach(bootAPgdt) >> PAGE_SHIFT); - ctxt.gdt_ents = 512; + ctxt.u.pv.gdt_ents = 512; #ifdef __i386__ ctxt.user_regs.esp = boot_stack + PAGE_SIZE; @@ -959,7 +959,7 @@ cpu_initialize_context(unsigned int cpu) #endif printf("gdtpfn=%lx pdptpfn=%lx\n", - ctxt.gdt_frames[0], + ctxt.u.pv.gdt_frames[0], ctxt.ctrlreg[3] >> PAGE_SHIFT); PANIC_IF(HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, &ctxt)); diff --git a/sys/i386/xen/xen_machdep.c b/sys/i386/xen/xen_machdep.c index 7049be6..1b1c74d 100644 --- a/sys/i386/xen/xen_machdep.c +++ b/sys/i386/xen/xen_machdep.c @@ -89,6 +89,7 @@ IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl), int xendebug_flags; start_info_t *xen_start_info; +start_info_t *HYPERVISOR_start_info; shared_info_t *HYPERVISOR_shared_info; xen_pfn_t *xen_machine_phys = machine_to_phys_mapping; xen_pfn_t *xen_phys_machine; @@ -744,7 +745,7 @@ void initvalues(start_info_t *startinfo); struct xenstore_domain_interface; extern struct xenstore_domain_interface *xen_store; -char *console_page; +extern char *console_page; void * bootmem_alloc(unsigned int size) @@ -927,6 +928,7 @@ 
initvalues(start_info_t *startinfo) HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments_notify); #endif xen_start_info = startinfo; + HYPERVISOR_start_info = startinfo; xen_phys_machine = (xen_pfn_t *)startinfo->mfn_list; IdlePTD = (pd_entry_t *)((uint8_t *)startinfo->pt_base + PAGE_SIZE); diff --git a/sys/x86/isa/clock.c b/sys/x86/isa/clock.c index a12e175..a5aed1c 100644 --- a/sys/x86/isa/clock.c +++ b/sys/x86/isa/clock.c @@ -247,61 +247,13 @@ getit(void) return ((high << 8) | low); } -#ifndef DELAYDEBUG -static u_int -get_tsc(__unused struct timecounter *tc) -{ - - return (rdtsc32()); -} - -static __inline int -delay_tc(int n) -{ - struct timecounter *tc; - timecounter_get_t *func; - uint64_t end, freq, now; - u_int last, mask, u; - - tc = timecounter; - freq = atomic_load_acq_64(&tsc_freq); - if (tsc_is_invariant && freq != 0) { - func = get_tsc; - mask = ~0u; - } else { - if (tc->tc_quality <= 0) - return (0); - func = tc->tc_get_timecount; - mask = tc->tc_counter_mask; - freq = tc->tc_frequency; - } - now = 0; - end = freq * n / 1000000; - if (func == get_tsc) - sched_pin(); - last = func(tc) & mask; - do { - cpu_spinwait(); - u = func(tc) & mask; - if (u < last) - now += mask - last + u + 1; - else - now += u - last; - last = u; - } while (now < end); - if (func == get_tsc) - sched_unpin(); - return (1); -} -#endif - /* * Wait "n" microseconds. * Relies on timer 1 counting down from (i8254_freq / hz) * Note: timer had better have been programmed before this is first used! 
*/ void -DELAY(int n) +i8254_delay(int n) { int delta, prev_tick, tick, ticks_left; #ifdef DELAYDEBUG @@ -317,9 +269,6 @@ DELAY(int n) } if (state == 1) printf("DELAY(%d)...", n); -#else - if (delay_tc(n)) - return; #endif /* * Read the counter first, so that the rest of the setup overhead is diff --git a/sys/x86/isa/isa.c b/sys/x86/isa/isa.c index 1a57137..09d1ab7 100644 --- a/sys/x86/isa/isa.c +++ b/sys/x86/isa/isa.c @@ -241,3 +241,6 @@ isa_release_resource(device_t bus, device_t child, int type, int rid, * On this platform, isa can also attach to the legacy bus. */ DRIVER_MODULE(isa, legacy, isa_driver, isa_devclass, 0, 0); +#ifdef XENHVM +DRIVER_MODULE(isa, nexus, isa_driver, isa_devclass, 0, 0); +#endif diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c new file mode 100644 index 0000000..7ea70b1 --- /dev/null +++ b/sys/x86/x86/delay.c @@ -0,0 +1,95 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * Copyright (c) 2010 Alexander Motin <mav@FreeBSD.org> + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz and Don Ahn. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)clock.c 7.2 (Berkeley) 5/12/91 + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +/* Generic x86 routines to handle delay */ + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/timetc.h> +#include <sys/proc.h> +#include <sys/kernel.h> +#include <sys/sched.h> + +#include <machine/clock.h> +#include <machine/cpu.h> + +static u_int +get_tsc(__unused struct timecounter *tc) +{ + + return (rdtsc32()); +} + +int +delay_tc(int n) +{ + struct timecounter *tc; + timecounter_get_t *func; + uint64_t end, freq, now; + u_int last, mask, u; + + tc = timecounter; + freq = atomic_load_acq_64(&tsc_freq); + if (tsc_is_invariant && freq != 0) { + func = get_tsc; + mask = ~0u; + } else { + if (tc->tc_quality <= 0) + return (0); + func = tc->tc_get_timecount; + mask = tc->tc_counter_mask; + freq = tc->tc_frequency; + } + now = 0; + end = freq * n / 1000000; + if (func == get_tsc) + sched_pin(); + last = func(tc) & mask; + do { + cpu_spinwait(); + u = func(tc) & mask; + if (u < last) + now += mask - last + u + 1; + else + now += u - last; + last = u; + } while (now < end); + if (func == get_tsc) + sched_unpin(); + return (1); +} diff --git a/sys/x86/x86/local_apic.c 
b/sys/x86/x86/local_apic.c index 8c8eef6..d8d7701 100644 --- a/sys/x86/x86/local_apic.c +++ b/sys/x86/x86/local_apic.c @@ -1368,9 +1368,13 @@ apic_setup_io(void *dummy __unused) if (retval != 0) printf("%s: Failed to setup I/O APICs: returned %d\n", best_enum->apic_name, retval); -#ifdef XEN - return; + +#if defined(XEN) || defined(XENHVM) + /* There's no lapic on PV Xen */ + if (xen_pv_domain()) + return; #endif + /* * Finish setting up the local APIC on the BSP once we know how to * properly program the LINT pins. diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c index 72811dc..dc8d9a2 100644 --- a/sys/x86/xen/hvm.c +++ b/sys/x86/xen/hvm.c @@ -35,15 +35,21 @@ __FBSDID("$FreeBSD$"); #include <sys/proc.h> #include <sys/smp.h> #include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> #include <vm/vm.h> #include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> #include <dev/pci/pcivar.h> #include <machine/cpufunc.h> #include <machine/cpu.h> #include <machine/smp.h> +#include <machine/stdarg.h> #include <x86/apicreg.h> @@ -52,6 +58,9 @@ __FBSDID("$FreeBSD$"); #include <xen/gnttab.h> #include <xen/hypervisor.h> #include <xen/hvm.h> +#ifdef __amd64__ +#include <xen/pv.h> +#endif #include <xen/xen_intr.h> #include <xen/interface/hvm/params.h> @@ -97,6 +106,11 @@ extern void pmap_lazyfix_action(void); /* Variables used by mp_machdep to perform the bitmap IPI */ extern volatile u_int cpu_ipi_pending[MAXCPU]; +#ifdef __amd64__ +/* Native AP start used on PVHVM */ +extern int native_start_all_aps(void); +#endif + /*---------------------------------- Macros ----------------------------------*/ #define IPI_TO_IDX(ipi) ((ipi) - APIC_IPI_INTS) @@ -119,7 +133,10 @@ enum xen_domain_type xen_domain_type = XEN_NATIVE; struct cpu_ops xen_hvm_cpu_ops = { .ipi_vectored = lapic_ipi_vectored, .cpu_init = xen_hvm_cpu_init, - .cpu_resume = xen_hvm_cpu_resume + .cpu_resume = xen_hvm_cpu_resume, +#ifdef __amd64__ + .start_all_aps = 
native_start_all_aps, +#endif }; static MALLOC_DEFINE(M_XENHVM, "xen_hvm", "Xen HVM PV Support"); @@ -157,8 +174,9 @@ DPCPU_DEFINE(xen_intr_handle_t, ipi_handle[nitems(xen_ipis)]); /*------------------ Hypervisor Access Shared Memory Regions -----------------*/ /** Hypercall table accessed via HYPERVISOR_*_op() methods. */ -char *hypercall_stubs; +extern char *hypercall_page; shared_info_t *HYPERVISOR_shared_info; +start_info_t *HYPERVISOR_start_info; #ifdef SMP /*---------------------------- XEN PV IPI Handlers ---------------------------*/ @@ -522,7 +540,7 @@ xen_setup_cpus(void) { int i; - if (!xen_hvm_domain() || !xen_vector_callback_enabled) + if (!xen_vector_callback_enabled) return; #ifdef __amd64__ @@ -558,7 +576,7 @@ xen_hvm_cpuid_base(void) * Allocate and fill in the hypcall page. */ static int -xen_hvm_init_hypercall_stubs(void) +xen_hvm_init_hypercall_stubs(enum xen_hvm_init_type init_type) { uint32_t base, regs[4]; int i; @@ -567,7 +585,7 @@ xen_hvm_init_hypercall_stubs(void) if (base == 0) return (ENXIO); - if (hypercall_stubs == NULL) { + if (init_type == XEN_HVM_INIT_COLD) { do_cpuid(base + 1, regs); printf("XEN: Hypervisor version %d.%d detected.\n", regs[0] >> 16, regs[0] & 0xffff); @@ -577,18 +595,9 @@ xen_hvm_init_hypercall_stubs(void) * Find the hypercall pages. 
*/ do_cpuid(base + 2, regs); - - if (hypercall_stubs == NULL) { - size_t call_region_size; - - call_region_size = regs[0] * PAGE_SIZE; - hypercall_stubs = malloc(call_region_size, M_XENHVM, M_NOWAIT); - if (hypercall_stubs == NULL) - panic("Unable to allocate Xen hypercall region"); - } for (i = 0; i < regs[0]; i++) - wrmsr(regs[1], vtophys(hypercall_stubs + i * PAGE_SIZE) + i); + wrmsr(regs[1], vtophys(&hypercall_page + i * PAGE_SIZE) + i); return (0); } @@ -677,8 +686,6 @@ xen_hvm_disable_emulated_devices(void) if (inw(XEN_MAGIC_IOPORT) != XMI_MAGIC) return; - if (bootverbose) - printf("XEN: Disabling emulated block and network devices\n"); outw(XEN_MAGIC_IOPORT, XMI_UNPLUG_IDE_DISKS|XMI_UNPLUG_NICS); } @@ -691,7 +698,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND) return; - error = xen_hvm_init_hypercall_stubs(); + if (xen_pv_domain()) { + /* hypercall page is already set in the PV case */ + error = 0; + } else { + error = xen_hvm_init_hypercall_stubs(init_type); + } switch (init_type) { case XEN_HVM_INIT_COLD: @@ -701,6 +713,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) setup_xen_features(); cpu_ops = xen_hvm_cpu_ops; vm_guest = VM_GUEST_XEN; +#ifdef __amd64__ + if (xen_pv_domain()) + cpu_ops.start_all_aps = xen_pv_start_all_aps; + else +#endif + printf("XEN: Disabling emulated block and network devices\n"); break; case XEN_HVM_INIT_RESUME: if (error != 0) @@ -715,10 +733,13 @@ xen_hvm_init(enum xen_hvm_init_type init_type) } xen_vector_callback_enabled = 0; - xen_domain_type = XEN_HVM_DOMAIN; - xen_hvm_init_shared_info_page(); xen_hvm_set_callback(NULL); - xen_hvm_disable_emulated_devices(); + + if (!xen_pv_domain()) { + xen_domain_type = XEN_HVM_DOMAIN; + xen_hvm_init_shared_info_page(); + xen_hvm_disable_emulated_devices(); + } } void @@ -749,10 +770,14 @@ xen_set_vcpu_id(void) struct pcpu *pc; int i; - /* Set vcpu_id to acpi_id */ + if (!xen_domain()) + return; + + /* Set vcpu_id to acpi_id for 
PVHVM guests */ CPU_FOREACH(i) { pc = pcpu_find(i); - pc->pc_vcpu_id = pc->pc_acpi_id; + if (xen_hvm_domain()) + pc->pc_vcpu_id = pc->pc_acpi_id; if (bootverbose) printf("XEN: CPU %u has VCPU ID %u\n", i, pc->pc_vcpu_id); @@ -790,9 +815,34 @@ xen_hvm_cpu_init(void) DPCPU_SET(vcpu_info, vcpu_info); } +/*----------------------------- Debug functions ------------------------------*/ +#define PRINTK_BUFSIZE 1024 +static int +vprintk(const char *fmt, __va_list ap) +{ + int retval, len; + static char buf[PRINTK_BUFSIZE]; + + retval = vsnprintf(buf, PRINTK_BUFSIZE - 1, fmt, ap); + buf[PRINTK_BUFSIZE - 1] = 0; /* vsnprintf() may return more than it wrote */ + len = strlen(buf); + retval = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)buf); + return retval; +} + +void +xen_early_printf(const char *fmt, ...) +{ + __va_list ap; + + va_start(ap, fmt); + vprintk(fmt, ap); + va_end(ap); +} + SYSINIT(xen_hvm_init, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, xen_hvm_sysinit, NULL); #ifdef SMP -SYSINIT(xen_setup_cpus, SI_SUB_SMP, SI_ORDER_FIRST, xen_setup_cpus, NULL); +SYSINIT(xen_setup_cpus, SI_SUB_SMP-1, SI_ORDER_ANY, xen_setup_cpus, NULL); #endif SYSINIT(xen_hvm_cpu_init, SI_SUB_INTR, SI_ORDER_FIRST, xen_hvm_cpu_init, NULL); SYSINIT(xen_set_vcpu_id, SI_SUB_CPU, SI_ORDER_ANY, xen_set_vcpu_id, NULL); diff --git a/sys/x86/xen/mptable.c b/sys/x86/xen/mptable.c new file mode 100644 index 0000000..8916314 --- /dev/null +++ b/sys/x86/xen/mptable.c @@ -0,0 +1,136 @@ +/*- + * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the author nor the names of any co-contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/smp.h> +#include <sys/pcpu.h> +#include <vm/vm.h> +#include <vm/pmap.h> + +#include <machine/intr_machdep.h> +#include <machine/apicvar.h> + +#include <machine/cpu.h> +#include <machine/smp.h> + +#include <xen/xen-os.h> +#include <xen/hypervisor.h> + +#include <xen/interface/vcpu.h> + +static int xenpv_probe(void); +static int xenpv_probe_cpus(void); +static int xenpv_setup_local(void); +static int xenpv_setup_io(void); + +static struct apic_enumerator xenpv_enumerator = { + "Xen PV", + xenpv_probe, + xenpv_probe_cpus, + xenpv_setup_local, + xenpv_setup_io +}; + +/* + * Look for an ACPI Multiple APIC Description Table ("APIC") + */ +static int +xenpv_probe(void) +{ + return (-100); +} + +/* + * Run through the MP table enumerating CPUs. + */ +static int +xenpv_probe_cpus(void) +{ + int i, ret; + + for (i = 0; i < MAXCPU; i++) { + ret = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL); + if (ret >= 0) + cpu_add((i * 2), (i == 0)); + } + + return (0); +} + +/* + * Initialize the local APIC on the BSP. + */ +static int +xenpv_setup_local(void) +{ + PCPU_SET(vcpu_id, 0); + return (0); +} + +/* + * Enumerate I/O APICs and setup interrupt sources. + */ +static int +xenpv_setup_io(void) +{ + return (0); +} + +static void +xenpv_register(void *dummy __unused) +{ + if (xen_pv_domain()) { + apic_register_enumerator(&xenpv_enumerator); + } +} +SYSINIT(xenpv_register, SI_SUB_TUNABLES - 1, SI_ORDER_FIRST, xenpv_register, NULL); + +/* + * Setup per-CPU ACPI IDs. 
+ */ +static void +xenpv_set_ids(void *dummy) +{ + struct pcpu *pc; + int i; + + CPU_FOREACH(i) { + pc = pcpu_find(i); + pc->pc_vcpu_id = i; + } + return; +} +SYSINIT(xenpv_set_ids, SI_SUB_CPU, SI_ORDER_MIDDLE, xenpv_set_ids, NULL); diff --git a/sys/x86/xen/pv.c b/sys/x86/xen/pv.c new file mode 100644 index 0000000..6756dec --- /dev/null +++ b/sys/x86/xen/pv.c @@ -0,0 +1,247 @@ +/* + * Copyright (c) 2004 Christian Limpach. + * Copyright (c) 2004-2006,2008 Kip Macy + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/malloc.h> +#include <sys/proc.h> +#include <sys/smp.h> +#include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> + +#include <vm/vm.h> +#include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> + +#include <dev/pci/pcivar.h> + +#include <machine/cpufunc.h> +#include <machine/cpu.h> +#include <machine/smp.h> +#include <machine/tss.h> +#include <machine/sysarch.h> +#include <machine/clock.h> + +#include <x86/apicreg.h> + +#include <xen/xen-os.h> +#include <xen/features.h> +#include <xen/gnttab.h> +#include <xen/hypervisor.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#include <xen/xen_intr.h> + +#include <xen/interface/hvm/params.h> +#include <xen/interface/vcpu.h> + +#define MAX_E820_ENTRIES 128 + +/*--------------------------- Forward Declarations ---------------------------*/ +static caddr_t xen_pv_parse_preload_data(u_int64_t); +static void xen_pv_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/*---------------------------- Extern Declarations ---------------------------*/ +/* Variables used by amd64 mp_machdep to start APs */ +extern struct mtx ap_boot_mtx; +extern void *bootstacks[]; +extern char *doublefault_stack; +extern char *nmi_stack; +extern void *dpcpu; +extern int bootAP; +extern char *bootSTK; +extern bool lapic_disabled; + +/*-------------------------------- Global Data -------------------------------*/ +/* Xen init_ops implementation. 
*/ +struct init_ops xen_init_ops = { + .parse_preload_data = xen_pv_parse_preload_data, + .early_delay_init = xen_delay_init, + .early_delay = xen_delay, + .fetch_e820_map = xen_pv_fetch_e820_map, +}; + +static struct +{ + const char *ev; + int mask; +} howto_names[] = { + {"boot_askname", RB_ASKNAME}, + {"boot_single", RB_SINGLE}, + {"boot_nosync", RB_NOSYNC}, + {"boot_halt", RB_HALT}, + {"boot_serial", RB_SERIAL}, + {"boot_cdrom", RB_CDROM}, + {"boot_gdb", RB_GDB}, + {"boot_gdb_pause", RB_RESERVED1}, + {"boot_verbose", RB_VERBOSE}, + {"boot_multicons", RB_MULTIPLE}, + {NULL, 0} +}; + +static struct bios_smap xen_smap[MAX_E820_ENTRIES]; + +static int +start_xen_ap(int cpu) +{ + struct vcpu_guest_context *ctxt; + int ms, cpus = mp_naps; + + ctxt = malloc(sizeof(*ctxt), M_TEMP, M_NOWAIT | M_ZERO); + if (ctxt == NULL) + panic("unable to allocate memory"); + + ctxt->flags = VGCF_IN_KERNEL; + ctxt->user_regs.rip = (unsigned long) init_secondary; + ctxt->user_regs.rsp = (unsigned long) bootSTK; + + /* Set the CPU to use the same page tables and CR4 value */ + ctxt->ctrlreg[3] = KPML4phys; + ctxt->ctrlreg[4] = rcr4(); + + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt)) + panic("unable to initialize CPU#%d\n", cpu); + + free(ctxt, M_TEMP); + + /* Launch the vCPU */ + if (HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL)) + panic("unable to start AP#%d\n", cpu); + + /* Wait up to 5 seconds for it to start. 
*/ + for (ms = 0; ms < 5000; ms++) { + if (mp_naps > cpus) + return 1; /* return SUCCESS */ + DELAY(1000); + } + + return 0; +} + +int +xen_pv_start_all_aps(void) +{ + int cpu; + + mtx_init(&ap_boot_mtx, "ap boot", NULL, MTX_SPIN); + lapic_disabled = true; + + for (cpu = 1; cpu < mp_ncpus; cpu++) { + + /* allocate and set up an idle stack data page */ + bootstacks[cpu] = (void *)kmem_malloc(kernel_arena, + KSTACK_PAGES * PAGE_SIZE, M_WAITOK | M_ZERO); + doublefault_stack = (char *)kmem_malloc(kernel_arena, + PAGE_SIZE, M_WAITOK | M_ZERO); + nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE, + M_WAITOK | M_ZERO); + dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE, + M_WAITOK | M_ZERO); + + bootSTK = (char *)bootstacks[cpu] + KSTACK_PAGES * PAGE_SIZE - 8; + bootAP = cpu; + + /* attempt to start the Application Processor */ + if (!start_xen_ap(cpu)) + panic("AP #%d failed to start!", cpu); + + CPU_SET(cpu, &all_cpus); /* record AP in CPU map */ + } + + return mp_naps; +} + +/* + * Functions to convert the "extra" parameters passed by Xen + * into FreeBSD boot options (from the i386 Xen port). 
+ */ +static char * +xen_setbootenv(char *cmd_line) +{ + char *cmd_line_next; + + /* Skip leading spaces */ + for (; *cmd_line == ' '; cmd_line++); + + for (cmd_line_next = cmd_line; strsep(&cmd_line_next, ",") != NULL;); + return (cmd_line); +} + +static int +xen_boothowto(char *envp) +{ + int i, howto = 0; + + /* get equivalents from the environment */ + for (i = 0; howto_names[i].ev != NULL; i++) + if (getenv(howto_names[i].ev) != NULL) + howto |= howto_names[i].mask; + return (howto); +} + +static caddr_t +xen_pv_parse_preload_data(u_int64_t modulep) +{ + /* Parse the extra boot information given by Xen */ + if (HYPERVISOR_start_info->cmd_line) + kern_envp = xen_setbootenv(HYPERVISOR_start_info->cmd_line); + boothowto |= xen_boothowto(kern_envp); + + return (NULL); +} + +static void +xen_pv_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + struct xen_memory_map memmap; + int rc; + + /* Fetch the E820 map from Xen */ + memmap.nr_entries = MAX_E820_ENTRIES; + set_xen_guest_handle(memmap.buffer, xen_smap); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); + if (rc) + panic("unable to fetch Xen E820 memory map"); + + *smap = xen_smap; + *size = memmap.nr_entries * sizeof(xen_smap[0]); +} + +void +xen_pv_set_init_ops(void) +{ + /* Init ops for Xen PV */ + init_ops = xen_init_ops; +} diff --git a/sys/x86/xen/pvcpu.c b/sys/x86/xen/pvcpu.c new file mode 100644 index 0000000..35d88148 --- /dev/null +++ b/sys/x86/xen/pvcpu.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/module.h> +#include <sys/pcpu.h> +#include <sys/smp.h> + +#include <xen/xen-os.h> + +static int +xenpvcpu_probe(device_t dev) +{ + if (!xen_pv_domain()) + return (ENXIO); + + device_set_desc(dev, "Xen PV CPU"); + return (0); +} + +static int +xenpvcpu_attach(device_t dev) +{ + struct pcpu *pc; + int cpu; + + cpu = device_get_unit(dev); + pc = pcpu_find(cpu); + pc->pc_device = dev; + return (0); +} + +static device_method_t xenpvcpu_methods[] = { + DEVMETHOD(device_probe, xenpvcpu_probe), + DEVMETHOD(device_attach, xenpvcpu_attach), + DEVMETHOD_END +}; + +static driver_t xenpvcpu_driver = { + "pvcpu", + xenpvcpu_methods, + 0, +}; + +devclass_t xenpvcpu_devclass; + +DRIVER_MODULE(xenpvcpu, nexus, xenpvcpu_driver, xenpvcpu_devclass, 0, 0); +MODULE_DEPEND(xenpvcpu, nexus, 1, 1, 1); diff --git a/sys/x86/xen/xen_nexus.c 
b/sys/x86/xen/xen_nexus.c new file mode 100644 index 0000000..288e6b6 --- /dev/null +++ b/sys/x86/xen/xen_nexus.c @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/bus.h>
+#include <sys/kernel.h>
+#include <sys/module.h>
+#include <sys/sysctl.h>
+#include <sys/systm.h>
+#include <sys/smp.h>
+
+#include <machine/nexusvar.h>
+
+#include <xen/xen-os.h>
+
+static const char *xen_devices[] =
+{
+	"xenstore",		/* XenStore bus */
+	"xen_et",		/* Xen PV timer (provides: tc, et, clk) */
+	"xc",			/* Xen PV console */
+	"isa",			/* Dummy ISA bus for sc to attach */
+};
+
+/*
+ * Xen nexus(4) driver.
+ */
+static int
+nexus_xen_probe(device_t dev)
+{
+	if (!xen_pv_domain())
+		return (ENXIO);
+
+	return (BUS_PROBE_DEFAULT);
+}
+
+static int
+nexus_xen_attach(device_t dev)
+{
+	int i, error = 0;
+
+	nexus_init_resources();
+	bus_generic_probe(dev);
+
+	/*
+	 * Since we have no ACPI, we need to create a dummy CPU device
+	 * in order to set pcpu->pc_device.
+	 */
+	CPU_FOREACH(i)
+		if (BUS_ADD_CHILD(dev, 0, "pvcpu", i) == NULL)
+			panic("unable to add pvcpu#%d device", i);
+
+	for (i = 0; i < nitems(xen_devices); i++) {
+		if (BUS_ADD_CHILD(dev, 0, xen_devices[i], 0) == NULL)
+			panic("%s: could not add", xen_devices[i]);
+	}
+
+	bus_generic_attach(dev);
+
+	return (error);
+}
+
+static device_method_t nexus_xen_methods[] = {
+	/* Device interface */
+	DEVMETHOD(device_probe, nexus_xen_probe),
+	DEVMETHOD(device_attach, nexus_xen_attach),
+
+	{ 0, 0 }
+};
+
+DEFINE_CLASS_1(nexus, nexus_xen_driver, nexus_xen_methods, 1, nexus_driver);
+static devclass_t nexus_devclass;
+
+DRIVER_MODULE(nexus_xen, root, nexus_xen_driver, nexus_devclass, 0, 0);
diff --git a/sys/xen/gnttab.c b/sys/xen/gnttab.c
index 03c32b7..909378a 100644
--- a/sys/xen/gnttab.c
+++ b/sys/xen/gnttab.c
@@ -25,6 +25,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/mman.h>
+#include <sys/limits.h>
 
 #include <xen/xen-os.h>
 #include <xen/hypervisor.h>
@@ -607,6 +608,7 @@ gnttab_resume(void)
 {
 	int error;
 	unsigned int max_nr_gframes, nr_gframes;
+	void *alloc_mem;
 
 	nr_gframes = nr_grant_frames;
 	max_nr_gframes = max_nr_grant_frames();
@@ -614,11 +616,20 @@ gnttab_resume(void)
 		return (ENOSYS);
 
 	if (!resume_frames) {
-		error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes,
-		    &resume_frames);
-		if (error) {
-			printf("error mapping gnttab share frames\n");
-			return (error);
+		if (xen_pv_domain()) {
+			alloc_mem = contigmalloc(max_nr_gframes * PAGE_SIZE,
+			    M_DEVBUF, M_NOWAIT, 0,
+			    ULONG_MAX, PAGE_SIZE, 0);
+			KASSERT((alloc_mem != NULL),
+			    ("unable to alloc memory for gnttab"));
+			resume_frames = vtophys(alloc_mem);
+		} else {
+			error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes,
+			    &resume_frames);
+			if (error) {
+				printf("error mapping gnttab share frames\n");
+				return (error);
+			}
 		}
 	}
 
diff --git a/sys/xen/interface/arch-x86/xen.h b/sys/xen/interface/arch-x86/xen.h
index 1c186d7..6cc15d3 100644
--- a/sys/xen/interface/arch-x86/xen.h
+++ b/sys/xen/interface/arch-x86/xen.h
@@ -147,7 +147,16 @@ struct vcpu_guest_context {
     struct cpu_user_regs user_regs; /* User-level CPU registers */
     struct trap_info trap_ctxt[256]; /* Virtual IDT */
     unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */
-    unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
+    union {
+        struct {
+            /* PV: GDT (machine frames, # ents).*/
+            unsigned long gdt_frames[16], gdt_ents;
+        } pv;
+        struct {
+            /* PVH: GDTR addr and size */
+            unsigned long gdtaddr, gdtsz;
+        } pvh;
+    } u;
     unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */
     /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
     unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */
diff --git a/sys/xen/pv.h b/sys/xen/pv.h
new file mode 100644
index 0000000..bbb1048
--- /dev/null
+++ b/sys/xen/pv.h
@@ -0,0 +1,29 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * $FreeBSD$
+ */
+
+#ifndef __XEN_PV_H__
+#define __XEN_PV_H__
+
+int xen_pv_start_all_aps(void);
+void xen_pv_set_init_ops(void);
+
+#endif /* __XEN_PV_H__ */
\ No newline at end of file
diff --git a/sys/xen/xen-os.h b/sys/xen/xen-os.h
index 87644e9..70e4719 100644
--- a/sys/xen/xen-os.h
+++ b/sys/xen/xen-os.h
@@ -51,6 +51,11 @@ void force_evtchn_callback(void);
 
 extern shared_info_t *HYPERVISOR_shared_info;
+extern start_info_t *HYPERVISOR_start_info;
+
+/* XXX: we need to get rid of this and use HYPERVISOR_start_info directly */
+extern struct xenstore_domain_interface *xen_store;
+extern char *console_page;
 
 enum xen_domain_type {
 	XEN_NATIVE, /* running on bare hardware */
@@ -78,6 +83,9 @@ xen_hvm_domain(void)
 	return (xen_domain_type == XEN_HVM_DOMAIN);
 }
 
+/* Debug function, prints directly to hypervisor console */
+void xen_early_printf(const char *, ...);
+
 #ifndef xen_mb
 #define xen_mb() mb()
 #endif
diff --git a/sys/xen/xenstore/xenstore.c b/sys/xen/xenstore/xenstore.c
index d404862..a4ef369 100644
--- a/sys/xen/xenstore/xenstore.c
+++ b/sys/xen/xenstore/xenstore.c
@@ -1079,12 +1079,6 @@ xs_init_comms(void)
 }
 
 /*------------------ Private Device Attachment Functions --------------------*/
-static void
-xs_identify(driver_t *driver, device_t parent)
-{
-
-	BUS_ADD_CHILD(parent, 0, "xenstore", 0);
-}
 
 /**
  * Probe for the existance of the XenStore.
@@ -1148,11 +1142,17 @@ xs_attach(device_t dev)
 	struct proc *p;
 
 #ifdef XENHVM
-	xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
-	xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
-	xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE);
+	if (xen_hvm_domain()) {
+		xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
+		xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
+		xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE);
+	} else if (xen_pv_domain()) {
+		xs.evtchn = HYPERVISOR_start_info->store_evtchn;
+	} else {
+		panic("Unknown domain type, cannot initialize xenstore\n");
+	}
 #else
-	xs.evtchn = xen_start_info->store_evtchn;
+	xs.evtchn = HYPERVISOR_start_info->store_evtchn;
 #endif
 
 	TAILQ_INIT(&xs.reply_list);
@@ -1240,7 +1240,6 @@ xs_resume(device_t dev __unused)
 /*-------------------- Private Device Attachment Data -----------------------*/
 static device_method_t xenstore_methods[] = {
 	/* Device interface */
-	DEVMETHOD(device_identify, xs_identify),
 	DEVMETHOD(device_probe, xs_probe),
 	DEVMETHOD(device_attach, xs_attach),
 	DEVMETHOD(device_detach, bus_generic_detach),
@@ -1263,9 +1262,8 @@ static devclass_t xenstore_devclass;
 
 #ifdef XENHVM
 DRIVER_MODULE(xenstore, xenpci, xenstore_driver, xenstore_devclass, 0, 0);
-#else
-DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0);
 #endif
+DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0);
 
 /*------------------------------- Sysctl Data --------------------------------*/
 /* XXX Shouldn't the node be somewhere else? */
-- 
1.7.7.5 (Apple Git-26)

--------------010605090609060304010908--
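For readers following the patch: xen_setbootenv() relies on strsep(3) overwriting each ',' in the Xen command line with a NUL, so the buffer ends up as a run of back-to-back NUL-terminated "name=value" strings that getenv() can later walk. Below is a minimal userland sketch of that transformation, not the kernel code itself. It assumes a libc that provides strsep() (BSD libc or glibc); split_cmdline() and count_vars() are hypothetical names used only for illustration.

```c
#define _DEFAULT_SOURCE		/* exposes strsep() on glibc; harmless on BSD */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userland analogue of xen_setbootenv(): skip leading
 * spaces, then let strsep() replace every ',' with '\0'. The buffer is
 * modified in place and the return value points at the first entry.
 */
static char *
split_cmdline(char *cmd_line)
{
	char *next;

	assert(cmd_line != NULL);

	while (*cmd_line == ' ')
		cmd_line++;

	/* Each call consumes one token; the loop ends once next is NULL. */
	for (next = cmd_line; strsep(&next, ",") != NULL;)
		;
	return (cmd_line);
}

/*
 * Hypothetical helper: count the NUL-separated entries in the len bytes
 * starting at env (len includes the final terminating NUL).
 */
static size_t
count_vars(const char *env, size_t len)
{
	const char *end = env + len;
	size_t n = 0;

	while (env < end && *env != '\0') {
		n++;
		env += strlen(env) + 1;	/* step past this entry and its NUL */
	}
	return (n);
}
```

Given a writable buffer holding "  boot_verbose=1,vfs.root.mountfrom=ufs:/dev/ada0", split_cmdline() returns a pointer two bytes in, and the two variables then sit next to each other separated by a single NUL. The sketch bounds the walk with an explicit length because, unlike a zero-filled command-line page, an ordinary C string has only one trailing NUL.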