Date: Tue, 12 Mar 2019 15:02:35 +0000 (UTC)
From: Roger Pau Monné <royger@FreeBSD.org>
To: ports-committers@freebsd.org, svn-ports-all@freebsd.org, svn-ports-head@freebsd.org
Subject: svn commit: r495458 - in head/emulators/xen-kernel: . files
Message-ID: <201903121502.x2CF2ZCM095504@repo.freebsd.org>
Author: royger (src committer)
Date: Tue Mar 12 15:02:35 2019
New Revision: 495458
URL: https://svnweb.freebsd.org/changeset/ports/495458

Log:
  emulators/xen-kernel: backport fixes and apply XSAs

  Backport a couple of fixes critical for PVH dom0 and fixes for
  XSA-{284,287,290,292-294}.

  Sponsored-by:	Citrix Systems R&D
  Reviewed by:	bapt
  Differential revision:	https://reviews.freebsd.org/D19413

Added:
  head/emulators/xen-kernel/files/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0001-x86-mm-locks-remove-trailing-whitespace.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa284.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa287-4.11.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa290-4.11-1.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa290-4.11-2.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa292.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa293-4.11-1.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa293-4.11-2.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa294-4.11.patch   (contents, props changed)
Modified:
  head/emulators/xen-kernel/Makefile

Modified: head/emulators/xen-kernel/Makefile
==============================================================================
--- head/emulators/xen-kernel/Makefile	Tue Mar 12 14:35:24 2019	(r495457)
+++ head/emulators/xen-kernel/Makefile	Tue Mar 12 15:02:35 2019	(r495458)
@@ -2,7 +2,7 @@
 
 PORTNAME= xen
 PORTVERSION= 4.11.1
-PORTREVISION= 0
+PORTREVISION= 1
 CATEGORIES= emulators
 MASTER_SITES= http://downloads.xenproject.org/release/xen/${PORTVERSION}/
 PKGNAMESUFFIX= -kernel
@@ -45,6 +45,29 @@ EXTRA_PATCHES+= ${FILESDIR}/0001-x86-mtrr-introduce-ma
 EXTRA_PATCHES+= ${FILESDIR}/0001-x86-replace-usage-in-the-linker-script.patch:-p1
 # Fix PVH Dom0 build with shadow paging
 EXTRA_PATCHES+= ${FILESDIR}/0001-x86-pvh-change-the-order-of-the-iommu-initialization.patch:-p1
+# Forward dom0 lapic EOIs to underlying hardware
+EXTRA_PATCHES+= ${FILESDIR}/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch:-p1
+# Fix deadlock in IO-APIC gsi mapping
+EXTRA_PATCHES+= ${FILESDIR}/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch:-p1
+# Fix for migration/save
+EXTRA_PATCHES+= ${FILESDIR}/0001-x86-mm-locks-remove-trailing-whitespace.patch:-p1 \
+		${FILESDIR}/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch:-p1 \
+		${FILESDIR}/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch:-p1
+
+# XSA-284
+EXTRA_PATCHES+= ${FILESDIR}/xsa284.patch:-p1
+# XSA-287
+EXTRA_PATCHES+= ${FILESDIR}/xsa287-4.11.patch:-p1
+# XSA-290
+EXTRA_PATCHES+= ${FILESDIR}/xsa290-4.11-1.patch:-p1 \
+		${FILESDIR}/xsa290-4.11-2.patch:-p1
+# XSA-292
+EXTRA_PATCHES+= ${FILESDIR}/xsa292.patch:-p1
+# XSA-293
+EXTRA_PATCHES+= ${FILESDIR}/xsa293-4.11-1.patch:-p1 \
+		${FILESDIR}/xsa293-4.11-2.patch:-p1
+# XSA-294
+EXTRA_PATCHES+= ${FILESDIR}/xsa294-4.11.patch:-p1
 
 .include <bsd.port.options.mk>

Added: head/emulators/xen-kernel/files/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch
============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,115 @@ +From 603ad88fe8a681a2c5408c3f432d7083dd1c41b1 Mon Sep 17 00:00:00 2001 +From: Roger Pau Monne <roger.pau@citrix.com> +Date: Mon, 28 Jan 2019 15:22:45 +0100 +Subject: [PATCH] pvh/dom0: fix deadlock in GSI mapping +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +The current GSI mapping code can cause the following deadlock: + +(XEN) *** Dumping CPU0 host state: *** +(XEN) ----[ Xen-4.12.0-rc x86_64 debug=y Tainted: C ]---- +[...] +(XEN) Xen call trace: +(XEN) [<ffff82d080239852>] vmac.c#_spin_lock_cb+0x32/0x70 +(XEN) [<ffff82d0802ed40f>] vmac.c#hvm_gsi_assert+0x2f/0x60 <- pick hvm.irq_lock +(XEN) [<ffff82d080255cc9>] io.c#hvm_dirq_assist+0xd9/0x130 <- pick event_lock +(XEN) [<ffff82d080255b4b>] io.c#dpci_softirq+0xdb/0x120 +(XEN) [<ffff82d080238ce6>] softirq.c#__do_softirq+0x46/0xa0 +(XEN) [<ffff82d08026f955>] domain.c#idle_loop+0x35/0x90 +(XEN) +[...] +(XEN) *** Dumping CPU3 host state: *** +(XEN) ----[ Xen-4.12.0-rc x86_64 debug=y Tainted: C ]---- +[...] +(XEN) Xen call trace: +(XEN) [<ffff82d08023985d>] vmac.c#_spin_lock_cb+0x3d/0x70 +(XEN) [<ffff82d080281fc8>] vmac.c#allocate_and_map_gsi_pirq+0xc8/0x130 <- pick event_lock +(XEN) [<ffff82d0802f44c0>] vioapic.c#vioapic_hwdom_map_gsi+0x80/0x130 +(XEN) [<ffff82d0802f4399>] vioapic.c#vioapic_write_redirent+0x119/0x1c0 <- pick hvm.irq_lock +(XEN) [<ffff82d0802f4075>] vioapic.c#vioapic_write+0x35/0x40 +(XEN) [<ffff82d0802e96a2>] vmac.c#hvm_process_io_intercept+0xd2/0x230 +(XEN) [<ffff82d0802e9842>] vmac.c#hvm_io_intercept+0x22/0x50 +(XEN) [<ffff82d0802dbe9b>] emulate.c#hvmemul_do_io+0x21b/0x3c0 +(XEN) [<ffff82d0802db302>] emulate.c#hvmemul_do_io_buffer+0x32/0x70 +(XEN) [<ffff82d0802dcd29>] emulate.c#hvmemul_do_mmio_buffer+0x29/0x30 +(XEN) [<ffff82d0802dcc19>] emulate.c#hvmemul_phys_mmio_access+0xf9/0x1b0 +(XEN) [<ffff82d0802dc6d0>] emulate.c#hvmemul_linear_mmio_access+0xf0/0x180 +(XEN) [<ffff82d0802de971>] emulate.c#hvmemul_linear_mmio_write+0x21/0x30 +(XEN) [<ffff82d0802de742>] emulate.c#linear_write+0xa2/0x100 +(XEN) [<ffff82d0802dce15>] emulate.c#hvmemul_write+0xb5/0x120 +(XEN) [<ffff82d0802babba>] vmac.c#x86_emulate+0x132aa/0x149a0 +(XEN) [<ffff82d0802c04f9>] vmac.c#x86_emulate_wrapper+0x29/0x70 +(XEN) [<ffff82d0802db570>] emulate.c#_hvm_emulate_one+0x50/0x140 +(XEN) [<ffff82d0802e9e31>] vmac.c#hvm_emulate_one_insn+0x41/0x100 +(XEN) [<ffff82d080345066>] guest_4.o#sh_page_fault__guest_4+0x976/0xd30 +(XEN) [<ffff82d08030cc69>] vmac.c#vmx_vmexit_handler+0x949/0xea0 +(XEN) [<ffff82d08031411a>] vmac.c#vmx_asm_vmexit_handler+0xfa/0x270 + +In order to solve it move the vioapic_hwdom_map_gsi outside of the +locked region in vioapic_write_redirent. vioapic_hwdom_map_gsi will +not access any of the vioapic fields, so there's no need to call the +function holding the hvm.irq_lock. 
+ +Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> +Reviewed-by: Wei Liu <wei.liu2@citrix.com> +Reviewed-by: Jan Beulich <jbeulich@suse.com> +Release-acked-by: Juergen Gross <jgross@suse.com> +--- + xen/arch/x86/hvm/vioapic.c | 32 ++++++++++++++++++-------------- + 1 file changed, 18 insertions(+), 14 deletions(-) + +diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c +index 2b74f92d51..2d71c33c1c 100644 +--- a/xen/arch/x86/hvm/vioapic.c ++++ b/xen/arch/x86/hvm/vioapic.c +@@ -236,20 +236,6 @@ static void vioapic_write_redirent( + + *pent = ent; + +- if ( is_hardware_domain(d) && unmasked ) +- { +- int ret; +- +- ret = vioapic_hwdom_map_gsi(gsi, ent.fields.trig_mode, +- ent.fields.polarity); +- if ( ret ) +- { +- /* Mask the entry again. */ +- pent->fields.mask = 1; +- unmasked = 0; +- } +- } +- + if ( gsi == 0 ) + { + vlapic_adjust_i8259_target(d); +@@ -266,6 +252,24 @@ static void vioapic_write_redirent( + + spin_unlock(&d->arch.hvm.irq_lock); + ++ if ( is_hardware_domain(d) && unmasked ) ++ { ++ /* ++ * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock ++ * since it can cause deadlocks as event_lock is taken by ++ * allocate_and_map_gsi_pirq, and that will invert the locking order ++ * used by other parts of the code. ++ */ ++ int ret = vioapic_hwdom_map_gsi(gsi, ent.fields.trig_mode, ++ ent.fields.polarity); ++ if ( ret ) ++ { ++ gprintk(XENLOG_ERR, ++ "unable to bind gsi %u to hardware domain: %d\n", gsi, ret); ++ unmasked = 0; ++ } ++ } ++ + if ( gsi == 0 || unmasked ) + pt_may_unmask_irq(d, NULL); + } +-- +2.17.2 (Apple Git-113) + Added: head/emulators/xen-kernel/files/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,39 @@ +From 19d2bce1c3cbfdc636c142cdf0ae38795f2202dd Mon Sep 17 00:00:00 2001 +From: Roger Pau Monne <roger.pau@citrix.com> +Date: Thu, 14 Feb 2019 14:41:03 +0100 +Subject: [PATCH for-4.12] x86/dom0: propagate PVH vlapic EOIs to hardware +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Current check for MSI EIO is missing a special case for PVH Dom0, +which doesn't have a hvm_irq_dpci struct but requires EIOs to be +forwarded to the physical lapic for passed-through devices. + +Add a short-circuit to allow EOIs from PVH Dom0 to be propagated. 
+ +Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> +--- +Cc: Jan Beulich <jbeulich@suse.com> +Cc: Juergen Gross <jgross@suse.com> +--- + xen/drivers/passthrough/io.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c +index a6eb8a4336..4290c7c710 100644 +--- a/xen/drivers/passthrough/io.c ++++ b/xen/drivers/passthrough/io.c +@@ -869,7 +869,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d, + + void hvm_dpci_msi_eoi(struct domain *d, int vector) + { +- if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ) ++ if ( !iommu_enabled || ++ (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) ) + return; + + spin_lock(&d->event_lock); +-- +2.17.2 (Apple Git-113) + Added: head/emulators/xen-kernel/files/0001-x86-mm-locks-remove-trailing-whitespace.patch ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/0001-x86-mm-locks-remove-trailing-whitespace.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,101 @@ +From 468937da985661e5cd1d6b2df6d6ab2d1fb1e5e4 Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com> +Date: Tue, 12 Mar 2019 12:21:03 +0100 +Subject: [PATCH 1/3] x86/mm-locks: remove trailing whitespace +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +No functional change. + +Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> +Reviewed-by: George Dunlap <george.dunlap@citrix.com> +--- + xen/arch/x86/mm/mm-locks.h | 24 ++++++++++++------------ + 1 file changed, 12 insertions(+), 12 deletions(-) + +diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h +index e5fceb2d2e..6c15b9a4cc 100644 +--- a/xen/arch/x86/mm/mm-locks.h ++++ b/xen/arch/x86/mm/mm-locks.h +@@ -3,11 +3,11 @@ + * + * Spinlocks used by the code in arch/x86/mm. + * +- * Copyright (c) 2011 Citrix Systems, inc. ++ * Copyright (c) 2011 Citrix Systems, inc. + * Copyright (c) 2007 Advanced Micro Devices (Wei Huang) + * Copyright (c) 2006-2007 XenSource Inc. + * Copyright (c) 2006 Michael A Fetterman +- * ++ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or +@@ -41,7 +41,7 @@ static inline void mm_lock_init(mm_lock_t *l) + l->unlock_level = 0; + } + +-static inline int mm_locked_by_me(mm_lock_t *l) ++static inline int mm_locked_by_me(mm_lock_t *l) + { + return (l->lock.recurse_cpu == current->processor); + } +@@ -67,7 +67,7 @@ do { \ + + static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec) + { +- if ( !((mm_locked_by_me(l)) && rec) ) ++ if ( !((mm_locked_by_me(l)) && rec) ) + __check_lock_level(level); + spin_lock_recursive(&l->lock); + if ( l->lock.recurse_cnt == 1 ) +@@ -186,7 +186,7 @@ static inline void mm_unlock(mm_lock_t *l) + spin_unlock_recursive(&l->lock); + } + +-static inline void mm_enforce_order_unlock(int unlock_level, ++static inline void mm_enforce_order_unlock(int unlock_level, + unsigned short *recurse_count) + { + if ( recurse_count ) +@@ -310,7 +310,7 @@ declare_mm_rwlock(altp2m); + #define gfn_locked_by_me(p,g) p2m_locked_by_me(p) + + /* PoD lock (per-p2m-table) +- * ++ * + * Protects private PoD data structs: entry and cache + * counts, page lists, sweep parameters. 
*/ + +@@ -322,7 +322,7 @@ declare_mm_lock(pod) + + /* Page alloc lock (per-domain) + * +- * This is an external lock, not represented by an mm_lock_t. However, ++ * This is an external lock, not represented by an mm_lock_t. However, + * pod code uses it in conjunction with the p2m lock, and expecting + * the ordering which we enforce here. + * The lock is not recursive. */ +@@ -338,13 +338,13 @@ declare_mm_order_constraint(page_alloc) + * For shadow pagetables, this lock protects + * - all changes to shadow page table pages + * - the shadow hash table +- * - the shadow page allocator ++ * - the shadow page allocator + * - all changes to guest page table pages + * - all changes to the page_info->tlbflush_timestamp +- * - the page_info->count fields on shadow pages +- * +- * For HAP, it protects the NPT/EPT tables and mode changes. +- * ++ * - the page_info->count fields on shadow pages ++ * ++ * For HAP, it protects the NPT/EPT tables and mode changes. ++ * + * It also protects the log-dirty bitmap from concurrent accesses (and + * teardowns, etc). */ + +-- +2.17.2 (Apple Git-113) + Added: head/emulators/xen-kernel/files/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,210 @@ +From 45e260afe7ee0e6b18a7e64173a081eec6e056aa Mon Sep 17 00:00:00 2001 +From: Roger Pau Monne <roger.pau@citrix.com> +Date: Tue, 12 Mar 2019 12:24:37 +0100 +Subject: [PATCH 2/3] x86/mm-locks: convert some macros to inline functions +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +And rename to have only one prefix underscore where applicable. + +No functional change. + +Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> +Reviewed-by: George Dunlap <george.dunlap@citrix.com> +--- + xen/arch/x86/mm/mm-locks.h | 98 ++++++++++++++++++++------------------ + 1 file changed, 52 insertions(+), 46 deletions(-) + +diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h +index 6c15b9a4cc..d3497713e9 100644 +--- a/xen/arch/x86/mm/mm-locks.h ++++ b/xen/arch/x86/mm/mm-locks.h +@@ -29,7 +29,6 @@ + + /* Per-CPU variable for enforcing the lock ordering */ + DECLARE_PER_CPU(int, mm_lock_level); +-#define __get_lock_level() (this_cpu(mm_lock_level)) + + DECLARE_PERCPU_RWLOCK_GLOBAL(p2m_percpu_rwlock); + +@@ -46,43 +45,47 @@ static inline int mm_locked_by_me(mm_lock_t *l) + return (l->lock.recurse_cpu == current->processor); + } + ++static inline int _get_lock_level(void) ++{ ++ return this_cpu(mm_lock_level); ++} ++ + /* + * If you see this crash, the numbers printed are order levels defined + * in this file. 
+ */ +-#define __check_lock_level(l) \ +-do { \ +- if ( unlikely(__get_lock_level() > (l)) ) \ +- { \ +- printk("mm locking order violation: %i > %i\n", \ +- __get_lock_level(), (l)); \ +- BUG(); \ +- } \ +-} while(0) +- +-#define __set_lock_level(l) \ +-do { \ +- __get_lock_level() = (l); \ +-} while(0) ++static inline void _check_lock_level(int l) ++{ ++ if ( unlikely(_get_lock_level() > l) ) ++ { ++ printk("mm locking order violation: %i > %i\n", _get_lock_level(), l); ++ BUG(); ++ } ++} ++ ++static inline void _set_lock_level(int l) ++{ ++ this_cpu(mm_lock_level) = l; ++} + + static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec) + { + if ( !((mm_locked_by_me(l)) && rec) ) +- __check_lock_level(level); ++ _check_lock_level(level); + spin_lock_recursive(&l->lock); + if ( l->lock.recurse_cnt == 1 ) + { + l->locker_function = func; +- l->unlock_level = __get_lock_level(); ++ l->unlock_level = _get_lock_level(); + } + else if ( (unlikely(!rec)) ) +- panic("mm lock already held by %s", l->locker_function); +- __set_lock_level(level); ++ panic("mm lock already held by %s\n", l->locker_function); ++ _set_lock_level(level); + } + + static inline void _mm_enforce_order_lock_pre(int level) + { +- __check_lock_level(level); ++ _check_lock_level(level); + } + + static inline void _mm_enforce_order_lock_post(int level, int *unlock_level, +@@ -92,12 +95,12 @@ static inline void _mm_enforce_order_lock_post(int level, int *unlock_level, + { + if ( (*recurse_count)++ == 0 ) + { +- *unlock_level = __get_lock_level(); ++ *unlock_level = _get_lock_level(); + } + } else { +- *unlock_level = __get_lock_level(); ++ *unlock_level = _get_lock_level(); + } +- __set_lock_level(level); ++ _set_lock_level(level); + } + + +@@ -118,12 +121,12 @@ static inline void _mm_write_lock(mm_rwlock_t *l, const char *func, int level) + { + if ( !mm_write_locked_by_me(l) ) + { +- __check_lock_level(level); ++ _check_lock_level(level); + percpu_write_lock(p2m_percpu_rwlock, &l->lock); + l->locker = get_processor_id(); + l->locker_function = func; +- l->unlock_level = __get_lock_level(); +- __set_lock_level(level); ++ l->unlock_level = _get_lock_level(); ++ _set_lock_level(level); + } + l->recurse_count++; + } +@@ -134,13 +137,13 @@ static inline void mm_write_unlock(mm_rwlock_t *l) + return; + l->locker = -1; + l->locker_function = "nobody"; +- __set_lock_level(l->unlock_level); ++ _set_lock_level(l->unlock_level); + percpu_write_unlock(p2m_percpu_rwlock, &l->lock); + } + + static inline void _mm_read_lock(mm_rwlock_t *l, int level) + { +- __check_lock_level(level); ++ _check_lock_level(level); + percpu_read_lock(p2m_percpu_rwlock, &l->lock); + /* There's nowhere to store the per-CPU unlock level so we can't + * set the lock level. 
*/ +@@ -181,7 +184,7 @@ static inline void mm_unlock(mm_lock_t *l) + if ( l->lock.recurse_cnt == 1 ) + { + l->locker_function = "nobody"; +- __set_lock_level(l->unlock_level); ++ _set_lock_level(l->unlock_level); + } + spin_unlock_recursive(&l->lock); + } +@@ -194,10 +197,10 @@ static inline void mm_enforce_order_unlock(int unlock_level, + BUG_ON(*recurse_count == 0); + if ( (*recurse_count)-- == 1 ) + { +- __set_lock_level(unlock_level); ++ _set_lock_level(unlock_level); + } + } else { +- __set_lock_level(unlock_level); ++ _set_lock_level(unlock_level); + } + } + +@@ -287,21 +290,24 @@ declare_mm_lock(altp2mlist) + + #define MM_LOCK_ORDER_altp2m 40 + declare_mm_rwlock(altp2m); +-#define p2m_lock(p) \ +- do { \ +- if ( p2m_is_altp2m(p) ) \ +- mm_write_lock(altp2m, &(p)->lock); \ +- else \ +- mm_write_lock(p2m, &(p)->lock); \ +- (p)->defer_flush++; \ +- } while (0) +-#define p2m_unlock(p) \ +- do { \ +- if ( --(p)->defer_flush == 0 ) \ +- p2m_unlock_and_tlb_flush(p); \ +- else \ +- mm_write_unlock(&(p)->lock); \ +- } while (0) ++ ++static inline void p2m_lock(struct p2m_domain *p) ++{ ++ if ( p2m_is_altp2m(p) ) ++ mm_write_lock(altp2m, &p->lock); ++ else ++ mm_write_lock(p2m, &p->lock); ++ p->defer_flush++; ++} ++ ++static inline void p2m_unlock(struct p2m_domain *p) ++{ ++ if ( --p->defer_flush == 0 ) ++ p2m_unlock_and_tlb_flush(p); ++ else ++ mm_write_unlock(&p->lock); ++} ++ + #define gfn_lock(p,g,o) p2m_lock(p) + #define gfn_unlock(p,g,o) p2m_unlock(p) + #define p2m_read_lock(p) mm_read_lock(p2m, &(p)->lock) +-- +2.17.2 (Apple Git-113) + Added: head/emulators/xen-kernel/files/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,319 @@ +From efce89c1df5969486bef82eec05223a4a6522d2d Mon Sep 17 00:00:00 2001 +From: Roger Pau Monne <roger.pau@citrix.com> +Date: Tue, 12 Mar 2019 12:25:21 +0100 +Subject: [PATCH 3/3] x86/mm-locks: apply a bias to lock levels for control + domain +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +paging_log_dirty_op function takes mm locks from a subject domain and +then attempts to perform copy to operations against the caller domain +in order to copy the result of the hypercall into the caller provided +buffer. + +This works fine when the caller is a non-paging domain, but triggers a +lock order panic when the caller is a paging domain due to the fact +that at the point where the copy to operation is performed the subject +domain paging lock is locked, and the copy operation requires +locking the caller p2m lock which has a lower level. + +Fix this limitation by adding a bias to the level of control domain mm +locks, so that the lower control domain mm lock always has a level +greater than the higher unprivileged domain lock level. This allows +locking the subject domain mm locks and then locking the control +domain mm locks, while keeping the same lock ordering and the changes +mostly confined to mm-locks.h. + +Note that so far only this flow (locking a subject domain locks and +then the control domain ones) has been identified, but not all +possible code paths have been inspected. 
Hence this solution attempts +to be a non-intrusive fix for the problem at hand, without discarding +further changes in the future if other valid code paths are found that +require more complex lock level ordering. + +Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> +Reviewed-by: George Dunlap <george.dunlap@citrix.com> +--- + xen/arch/x86/mm/mm-locks.h | 119 +++++++++++++++++++++++-------------- + xen/arch/x86/mm/p2m-pod.c | 5 +- + 2 files changed, 78 insertions(+), 46 deletions(-) + +diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h +index d3497713e9..d6c073dc5c 100644 +--- a/xen/arch/x86/mm/mm-locks.h ++++ b/xen/arch/x86/mm/mm-locks.h +@@ -50,15 +50,35 @@ static inline int _get_lock_level(void) + return this_cpu(mm_lock_level); + } + ++#define MM_LOCK_ORDER_MAX 64 ++/* ++ * Return the lock level taking the domain bias into account. If the domain is ++ * privileged a bias of MM_LOCK_ORDER_MAX is applied to the lock level, so that ++ * mm locks that belong to a control domain can be acquired after having ++ * acquired mm locks of an unprivileged domain. ++ * ++ * This is required in order to use some hypercalls from a paging domain that ++ * take locks of a subject domain and then attempt to copy data to/from the ++ * caller domain. ++ */ ++static inline int _lock_level(const struct domain *d, int l) ++{ ++ ASSERT(l <= MM_LOCK_ORDER_MAX); ++ ++ return l + (d && is_control_domain(d) ? MM_LOCK_ORDER_MAX : 0); ++} ++ + /* + * If you see this crash, the numbers printed are order levels defined + * in this file. + */ +-static inline void _check_lock_level(int l) ++static inline void _check_lock_level(const struct domain *d, int l) + { +- if ( unlikely(_get_lock_level() > l) ) ++ int lvl = _lock_level(d, l); ++ ++ if ( unlikely(_get_lock_level() > lvl) ) + { +- printk("mm locking order violation: %i > %i\n", _get_lock_level(), l); ++ printk("mm locking order violation: %i > %i\n", _get_lock_level(), lvl); + BUG(); + } + } +@@ -68,10 +88,11 @@ static inline void _set_lock_level(int l) + this_cpu(mm_lock_level) = l; + } + +-static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec) ++static inline void _mm_lock(const struct domain *d, mm_lock_t *l, ++ const char *func, int level, int rec) + { + if ( !((mm_locked_by_me(l)) && rec) ) +- _check_lock_level(level); ++ _check_lock_level(d, level); + spin_lock_recursive(&l->lock); + if ( l->lock.recurse_cnt == 1 ) + { +@@ -80,16 +101,17 @@ static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec) + } + else if ( (unlikely(!rec)) ) + panic("mm lock already held by %s\n", l->locker_function); +- _set_lock_level(level); ++ _set_lock_level(_lock_level(d, level)); + } + +-static inline void _mm_enforce_order_lock_pre(int level) ++static inline void _mm_enforce_order_lock_pre(const struct domain *d, int level) + { +- _check_lock_level(level); ++ _check_lock_level(d, level); + } + +-static inline void _mm_enforce_order_lock_post(int level, int *unlock_level, +- unsigned short *recurse_count) ++static inline void _mm_enforce_order_lock_post(const struct domain *d, int level, ++ int *unlock_level, ++ unsigned short *recurse_count) + { + if ( recurse_count ) + { +@@ -100,7 +122,7 @@ static inline void _mm_enforce_order_lock_post(int level, int *unlock_level, + } else { + *unlock_level = _get_lock_level(); + } +- _set_lock_level(level); ++ _set_lock_level(_lock_level(d, level)); + } + + +@@ -117,16 +139,17 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l) + return (l->locker == 
get_processor_id()); + } + +-static inline void _mm_write_lock(mm_rwlock_t *l, const char *func, int level) ++static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l, ++ const char *func, int level) + { + if ( !mm_write_locked_by_me(l) ) + { +- _check_lock_level(level); ++ _check_lock_level(d, level); + percpu_write_lock(p2m_percpu_rwlock, &l->lock); + l->locker = get_processor_id(); + l->locker_function = func; + l->unlock_level = _get_lock_level(); +- _set_lock_level(level); ++ _set_lock_level(_lock_level(d, level)); + } + l->recurse_count++; + } +@@ -141,9 +164,10 @@ static inline void mm_write_unlock(mm_rwlock_t *l) + percpu_write_unlock(p2m_percpu_rwlock, &l->lock); + } + +-static inline void _mm_read_lock(mm_rwlock_t *l, int level) ++static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l, ++ int level) + { +- _check_lock_level(level); ++ _check_lock_level(d, level); + percpu_read_lock(p2m_percpu_rwlock, &l->lock); + /* There's nowhere to store the per-CPU unlock level so we can't + * set the lock level. */ +@@ -156,28 +180,32 @@ static inline void mm_read_unlock(mm_rwlock_t *l) + + /* This wrapper uses the line number to express the locking order below */ + #define declare_mm_lock(name) \ +- static inline void mm_lock_##name(mm_lock_t *l, const char *func, int rec)\ +- { _mm_lock(l, func, MM_LOCK_ORDER_##name, rec); } ++ static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l, \ ++ const char *func, int rec) \ ++ { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); } + #define declare_mm_rwlock(name) \ +- static inline void mm_write_lock_##name(mm_rwlock_t *l, const char *func) \ +- { _mm_write_lock(l, func, MM_LOCK_ORDER_##name); } \ +- static inline void mm_read_lock_##name(mm_rwlock_t *l) \ +- { _mm_read_lock(l, MM_LOCK_ORDER_##name); } ++ static inline void mm_write_lock_##name(const struct domain *d, \ ++ mm_rwlock_t *l, const char *func) \ ++ { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); } \ ++ static inline void mm_read_lock_##name(const struct domain *d, \ ++ mm_rwlock_t *l) \ ++ { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); } + /* These capture the name of the calling function */ +-#define mm_lock(name, l) mm_lock_##name(l, __func__, 0) +-#define mm_lock_recursive(name, l) mm_lock_##name(l, __func__, 1) +-#define mm_write_lock(name, l) mm_write_lock_##name(l, __func__) +-#define mm_read_lock(name, l) mm_read_lock_##name(l) ++#define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0) ++#define mm_lock_recursive(name, d, l) mm_lock_##name(d, l, __func__, 1) ++#define mm_write_lock(name, d, l) mm_write_lock_##name(d, l, __func__) ++#define mm_read_lock(name, d, l) mm_read_lock_##name(d, l) + + /* This wrapper is intended for "external" locks which do not use + * the mm_lock_t types. Such locks inside the mm code are also subject + * to ordering constraints. 
*/ +-#define declare_mm_order_constraint(name) \ +- static inline void mm_enforce_order_lock_pre_##name(void) \ +- { _mm_enforce_order_lock_pre(MM_LOCK_ORDER_##name); } \ +- static inline void mm_enforce_order_lock_post_##name( \ +- int *unlock_level, unsigned short *recurse_count) \ +- { _mm_enforce_order_lock_post(MM_LOCK_ORDER_##name, unlock_level, recurse_count); } \ ++#define declare_mm_order_constraint(name) \ ++ static inline void mm_enforce_order_lock_pre_##name(const struct domain *d) \ ++ { _mm_enforce_order_lock_pre(d, MM_LOCK_ORDER_##name); } \ ++ static inline void mm_enforce_order_lock_post_##name(const struct domain *d,\ ++ int *unlock_level, unsigned short *recurse_count) \ ++ { _mm_enforce_order_lock_post(d, MM_LOCK_ORDER_##name, unlock_level, \ ++ recurse_count); } + + static inline void mm_unlock(mm_lock_t *l) + { +@@ -221,7 +249,7 @@ static inline void mm_enforce_order_unlock(int unlock_level, + + #define MM_LOCK_ORDER_nestedp2m 8 + declare_mm_lock(nestedp2m) +-#define nestedp2m_lock(d) mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock) ++#define nestedp2m_lock(d) mm_lock(nestedp2m, d, &(d)->arch.nested_p2m_lock) + #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock) + + /* P2M lock (per-non-alt-p2m-table) +@@ -260,9 +288,10 @@ declare_mm_rwlock(p2m); + + #define MM_LOCK_ORDER_per_page_sharing 24 + declare_mm_order_constraint(per_page_sharing) +-#define page_sharing_mm_pre_lock() mm_enforce_order_lock_pre_per_page_sharing() ++#define page_sharing_mm_pre_lock() \ ++ mm_enforce_order_lock_pre_per_page_sharing(NULL) + #define page_sharing_mm_post_lock(l, r) \ +- mm_enforce_order_lock_post_per_page_sharing((l), (r)) ++ mm_enforce_order_lock_post_per_page_sharing(NULL, (l), (r)) + #define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r)) + + /* Alternate P2M list lock (per-domain) +@@ -275,7 +304,8 @@ declare_mm_order_constraint(per_page_sharing) + + #define MM_LOCK_ORDER_altp2mlist 32 + declare_mm_lock(altp2mlist) +-#define altp2m_list_lock(d) mm_lock(altp2mlist, &(d)->arch.altp2m_list_lock) ++#define altp2m_list_lock(d) mm_lock(altp2mlist, d, \ ++ &(d)->arch.altp2m_list_lock) + #define altp2m_list_unlock(d) mm_unlock(&(d)->arch.altp2m_list_lock) + + /* P2M lock (per-altp2m-table) +@@ -294,9 +324,9 @@ declare_mm_rwlock(altp2m); + static inline void p2m_lock(struct p2m_domain *p) + { + if ( p2m_is_altp2m(p) ) +- mm_write_lock(altp2m, &p->lock); ++ mm_write_lock(altp2m, p->domain, &p->lock); + else +- mm_write_lock(p2m, &p->lock); ++ mm_write_lock(p2m, p->domain, &p->lock); + p->defer_flush++; + } + +@@ -310,7 +340,7 @@ static inline void p2m_unlock(struct p2m_domain *p) + + #define gfn_lock(p,g,o) p2m_lock(p) + #define gfn_unlock(p,g,o) p2m_unlock(p) +-#define p2m_read_lock(p) mm_read_lock(p2m, &(p)->lock) ++#define p2m_read_lock(p) mm_read_lock(p2m, (p)->domain, &(p)->lock) + #define p2m_read_unlock(p) mm_read_unlock(&(p)->lock) + #define p2m_locked_by_me(p) mm_write_locked_by_me(&(p)->lock) + #define gfn_locked_by_me(p,g) p2m_locked_by_me(p) +@@ -322,7 +352,7 @@ static inline void p2m_unlock(struct p2m_domain *p) + + #define MM_LOCK_ORDER_pod 48 + declare_mm_lock(pod) +-#define pod_lock(p) mm_lock(pod, &(p)->pod.lock) ++#define pod_lock(p) mm_lock(pod, (p)->domain, &(p)->pod.lock) + #define pod_unlock(p) mm_unlock(&(p)->pod.lock) + #define pod_locked_by_me(p) mm_locked_by_me(&(p)->pod.lock) + +@@ -335,8 +365,9 @@ declare_mm_lock(pod) + + #define MM_LOCK_ORDER_page_alloc 56 + declare_mm_order_constraint(page_alloc) +-#define page_alloc_mm_pre_lock() 
mm_enforce_order_lock_pre_page_alloc() +-#define page_alloc_mm_post_lock(l) mm_enforce_order_lock_post_page_alloc(&(l), NULL) ++#define page_alloc_mm_pre_lock(d) mm_enforce_order_lock_pre_page_alloc(d) ++#define page_alloc_mm_post_lock(d, l) \ ++ mm_enforce_order_lock_post_page_alloc(d, &(l), NULL) + #define page_alloc_mm_unlock(l) mm_enforce_order_unlock((l), NULL) + + /* Paging lock (per-domain) +@@ -356,9 +387,9 @@ declare_mm_order_constraint(page_alloc) + + #define MM_LOCK_ORDER_paging 64 + declare_mm_lock(paging) +-#define paging_lock(d) mm_lock(paging, &(d)->arch.paging.lock) ++#define paging_lock(d) mm_lock(paging, d, &(d)->arch.paging.lock) + #define paging_lock_recursive(d) \ +- mm_lock_recursive(paging, &(d)->arch.paging.lock) ++ mm_lock_recursive(paging, d, &(d)->arch.paging.lock) + #define paging_unlock(d) mm_unlock(&(d)->arch.paging.lock) + #define paging_locked_by_me(d) mm_locked_by_me(&(d)->arch.paging.lock) + +diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c +index 631e9aec33..725a2921d9 100644 +--- a/xen/arch/x86/mm/p2m-pod.c ++++ b/xen/arch/x86/mm/p2m-pod.c +@@ -34,9 +34,10 @@ + /* Enforce lock ordering when grabbing the "external" page_alloc lock */ + static inline void lock_page_alloc(struct p2m_domain *p2m) + { +- page_alloc_mm_pre_lock(); ++ page_alloc_mm_pre_lock(p2m->domain); + spin_lock(&(p2m->domain->page_alloc_lock)); +- page_alloc_mm_post_lock(p2m->domain->arch.page_alloc_unlock_level); ++ page_alloc_mm_post_lock(p2m->domain, ++ p2m->domain->arch.page_alloc_unlock_level); + } + + static inline void unlock_page_alloc(struct p2m_domain *p2m) +-- +2.17.2 (Apple Git-113) + Added: head/emulators/xen-kernel/files/xsa284.patch ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/xsa284.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,31 @@ +From: Jan Beulich <jbeulich@suse.com> +Subject: gnttab: set page refcount for copy-on-grant-transfer + +Commit 5cc77f9098 ("32-on-64: Fix domain address-size clamping, +implement"), which introduced this functionality, took care of clearing +the old page's PGC_allocated, but failed to set the bit (and install the +associated reference) on the newly allocated one. Furthermore the "mfn" +local variable was never updated, and hence the wrong MFN was passed to +guest_physmap_add_page() (and back to the destination domain) in this +case, leading to an IOMMU mapping into an unowned page. + +Ideally the code would use assign_pages(), but the call to +gnttab_prepare_for_transfer() sits in the middle of the actions +mirroring that function. + +This is XSA-284. 
+ +Signed-off-by: Jan Beulich <jbeulich@suse.com> +Acked-by: George Dunlap <george.dunlap@citrix.com> + +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -2183,6 +2183,8 @@ gnttab_transfer( + page->count_info &= ~(PGC_count_mask|PGC_allocated); + free_domheap_page(page); + page = new_page; ++ page->count_info = PGC_allocated | 1; ++ mfn = page_to_mfn(page); + } + + spin_lock(&e->page_alloc_lock); Added: head/emulators/xen-kernel/files/xsa287-4.11.patch ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/emulators/xen-kernel/files/xsa287-4.11.patch Tue Mar 12 15:02:35 2019 (r495458) @@ -0,0 +1,328 @@ +From 67620c1ccb13f7b58645f48248ba1f408b021fdc Mon Sep 17 00:00:00 2001 +From: George Dunlap <george.dunlap@citrix.com> +Date: Fri, 18 Jan 2019 15:00:34 +0000 +Subject: [PATCH] steal_page: Get rid of bogus struct page states + +The original rules for `struct page` required the following invariants +at all times: + +- refcount > 0 implies owner != NULL +- PGC_allocated implies refcount > 0 + +steal_page, in a misguided attempt to protect against unknown races, +violates both of these rules, thus introducing other races: + +- Temporarily, the count_info has the refcount go to 0 while + PGC_allocated is set + +- It explicitly returns the page PGC_allocated set, but owner == NULL + and page not on the page_list. + +The second one meant that page_get_owner_and_reference() could return +NULL even after having successfully grabbed a reference on the page, +leading the caller to leak the reference (since "couldn't get ref" and +"got ref but no owner" look the same). + +Furthermore, rather than grabbing a page reference to ensure that the +owner doesn't change under its feet, it appears to rely on holding +d->page_alloc lock to prevent this. + +Unfortunately, this is ineffective: page->owner remains non-NULL for +some time after the count has been set to 0; meaning that it would be +entirely possible for the page to be freed and re-allocated to a +different domain between the page_get_owner() check and the count_info +check. + +Modify steal_page to instead follow the appropriate access discipline, +taking the page through series of states similar to being freed and +then re-allocated with MEMF_no_owner: + +- Grab an extra reference to make sure we don't race with anyone else + freeing the page + +- Drop both references and PGC_allocated atomically, so that (if +successful), anyone else trying to grab a reference will fail + +- Attempt to reset Xen's mappings + +- Reset the rest of the state. + +Then, modify the two callers appropriately: + +- Leave count_info alone (it's already been cleared) +- Call free_domheap_page() directly if appropriate +- Call assign_pages() rather than open-coding a partial assign + +With all callers to assign_pages() now passing in pages with the +type_info field clear, tighten the respective assertion there. + +This is XSA-287. 
+ +Signed-off-by: George Dunlap <george.dunlap@citrix.com> +Signed-off-by: Jan Beulich <jbeulich@suse.com> +--- + xen/arch/x86/mm.c | 84 ++++++++++++++++++++++++++++------------ + xen/common/grant_table.c | 20 +++++----- + xen/common/memory.c | 19 +++++---- + xen/common/page_alloc.c | 2 +- + 4 files changed, 83 insertions(+), 42 deletions(-) + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index 6509035a5c..d8ff58c901 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -3966,70 +3966,106 @@ int donate_page( + return -EINVAL; + } + ++/* ++ * Steal page will attempt to remove `page` from domain `d`. Upon ++ * return, `page` will be in a state similar to the state of a page ++ * returned from alloc_domheap_page() with MEMF_no_owner set: ++ * - refcount 0 ++ * - type count cleared ++ * - owner NULL ++ * - page caching attributes cleaned up ++ * - removed from the domain's page_list ++ * ++ * If MEMF_no_refcount is not set, the domain's tot_pages will be ++ * adjusted. If this results in the page count falling to 0, ++ * put_domain() will be called. ++ * ++ * The caller should either call free_domheap_page() to free the ++ * page, or assign_pages() to put it back on some domain's page list. ++ */ + int steal_page( + struct domain *d, struct page_info *page, unsigned int memflags) + { + unsigned long x, y; + bool drop_dom_ref = false; +- const struct domain *owner = dom_xen; ++ const struct domain *owner; ++ int rc; + + if ( paging_mode_external(d) ) + return -EOPNOTSUPP; + +- spin_lock(&d->page_alloc_lock); +- +- if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) ) ++ /* Grab a reference to make sure the page doesn't change under our feet */ ++ rc = -EINVAL; ++ if ( !(owner = page_get_owner_and_reference(page)) ) + goto fail; + ++ if ( owner != d || is_xen_heap_page(page) ) ++ goto fail_put; ++ + /* +- * We require there is just one reference (PGC_allocated). We temporarily +- * drop this reference now so that we can safely swizzle the owner. ++ * We require there are exactly two references -- the one we just ++ * took, and PGC_allocated. We temporarily drop both these ++ * references so that the page becomes effectively non-"live" for *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
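
The lock-level machinery reworked by the three mm-locks patches above can be summarised with a small standalone sketch (C11). This is not Xen code: check_and_set_level and lock_level are hypothetical stand-ins for Xen's _check_lock_level/_set_lock_level and the _lock_level bias introduced by the 0003 patch, a thread-local variable stands in for the per-CPU mm_lock_level, and the levels used in main are illustrative only.

/*
 * Standalone illustration of the mm lock ordering discipline discussed in
 * the mm-locks patches above.  NOT Xen source: names and the thread-local
 * stand-in for Xen's per-CPU variable are hypothetical simplifications.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MM_LOCK_ORDER_MAX 64

/* Stand-in for Xen's per-CPU mm_lock_level. */
static _Thread_local int mm_lock_level;

/*
 * Bias the level of control-domain locks so they can always be taken
 * after a subject domain's locks, mirroring the idea of _lock_level().
 */
static int lock_level(bool is_control_domain, int level)
{
    assert(level <= MM_LOCK_ORDER_MAX);
    return level + (is_control_domain ? MM_LOCK_ORDER_MAX : 0);
}

/* Enforce that lock levels only ever increase on a given CPU/thread. */
static void check_and_set_level(bool is_control_domain, int level)
{
    int lvl = lock_level(is_control_domain, level);

    if (mm_lock_level > lvl) {
        fprintf(stderr, "mm locking order violation: %d > %d\n",
                mm_lock_level, lvl);
        assert(0);
    }
    mm_lock_level = lvl;
}

int main(void)
{
    /* Take the subject domain's paging lock (level 64)... */
    check_and_set_level(false, 64);

    /*
     * ...then a lower-level lock of the control domain (illustrative
     * level 16, biased to 80).  Without the bias this would be 16 < 64
     * and trip the order check, analogous to the paging_log_dirty_op
     * scenario described in the 0003 patch above.
     */
    check_and_set_level(true, 16);

    puts("lock ordering OK with control-domain bias");
    return 0;
}

The point of the bias is that it only shifts the control domain's levels above MM_LOCK_ORDER_MAX: the relative ordering among one domain's own locks is unchanged, while "subject domain locks first, control domain locks second" becomes a legal sequence.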