To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Mark Johnston
Subject: git: 1630af436a61 - stable/14 - vmm: Fix a deadlock between vm_smp_rendezvous() and vcpu_lock_all()
List-Id: Commits to the stable branches of the FreeBSD src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: markj
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/14
X-Git-Reftype: branch
X-Git-Commit: 1630af436a6145e90f7b1aac1a91fbbfe9b22f57
Auto-Submitted: auto-generated
Date: Tue, 02 Dec 2025 14:45:59 +0000
Message-Id: <692efba7.2f02f.33327be2@gitrepo.freebsd.org>

The branch stable/14 has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=1630af436a6145e90f7b1aac1a91fbbfe9b22f57

commit 1630af436a6145e90f7b1aac1a91fbbfe9b22f57
Author:     Mark Johnston
AuthorDate: 2025-10-17 13:09:38 +0000
Commit:     Mark Johnston
CommitDate: 2025-12-02 14:44:36 +0000

    vmm: Fix a deadlock between vm_smp_rendezvous() and vcpu_lock_all()

    vm_smp_rendezvous() invokes a callback on all vCPUs, blocking the
    initiator until all vCPUs have responded.  vcpu_lock_all() blocks each
    vCPU by waiting for it to go idle and setting the vCPU state to frozen.
    These two operations can deadlock on each other, particularly when
    booting a Windows guest, when vcpu_lock_all() blocks waiting for a
    rendezvous initiator, and the initiator is blocked waiting for the vCPU
    thread which called vcpu_lock_all() to invoke the rendezvous callback.

    Implement vcpu_lock_all() in a way that avoids deadlocks with
    vm_smp_rendezvous().  In particular, when traversing vCPUs, invoke the
    rendezvous callback on the vCPU's behalf to help the initiator finish.
    We can only safely do so when the vCPU is IDLE or we have already
    locked it, otherwise we may be racing with the target vCPU thread.
    Thus:
    - Use an exclusive lock to serialize vcpu_lock_all() callers, which
      lets us lock vCPUs out of order without fear of deadlock with
      parallel vcpu_lock_all() callers.
    - If a rendezvous is pending, lock all idle vCPUs and invoke the
      callback on their behalf.  If the vcpu_lock_all() caller is itself a
      vCPU thread, this will handle that thread.
    - Block waiting for all non-idle vCPUs to idle, or until one of them
      initiates a rendezvous, in which case we go back and invoke callbacks
      on behalf of already-locked vCPUs.

    Note that on !amd64 no changes are needed since there is no rendezvous
    mechanism, so there is a separate vcpu_set_state_all() for them based
    on the previous vcpu_lock_all().  These will be merged together once
    vcpu state handling is consolidated into sys/dev/vmm.

    Reviewed by:    corvink (previous version)
    MFC after:      3 weeks
    Differential Revision:  https://reviews.freebsd.org/D52968

    (cherry picked from commit f39768e52e513264da60add0ca2412bddda271ff)
---
 sys/amd64/include/vmm.h |   3 +-
 sys/amd64/vmm/vmm.c     | 177 ++++++++++++++++++++++++++++++++++++++++--------
 sys/amd64/vmm/vmm_dev.c |  30 ++------
 3 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/sys/amd64/include/vmm.h b/sys/amd64/include/vmm.h
index 368c3a6cb95b..09f2ea7cb0e5 100644
--- a/sys/amd64/include/vmm.h
+++ b/sys/amd64/include/vmm.h
@@ -225,7 +225,7 @@ extern u_int vm_maxcpu;    /* maximum virtual cpus */
 int vm_create(const char *name, struct vm **retvm);
 struct vcpu *vm_alloc_vcpu(struct vm *vm, int vcpuid);
 void vm_disable_vcpu_creation(struct vm *vm);
-void vm_slock_vcpus(struct vm *vm);
+void vm_lock_vcpus(struct vm *vm);
 void vm_unlock_vcpus(struct vm *vm);
 void vm_destroy(struct vm *vm);
 int vm_reinit(struct vm *vm);
@@ -378,6 +378,7 @@ enum vcpu_state {
 };
 
 int vcpu_set_state(struct vcpu *vcpu, enum vcpu_state state, bool from_idle);
+int vcpu_set_state_all(struct vm *vm, enum vcpu_state state);
 enum vcpu_state vcpu_get_state(struct vcpu *vcpu, int *hostcpu);
 
 static int __inline
diff --git a/sys/amd64/vmm/vmm.c b/sys/amd64/vmm/vmm.c
index 677dcd05b9b2..787599bcfc97 100644
--- a/sys/amd64/vmm/vmm.c
+++ b/sys/amd64/vmm/vmm.c
@@ -557,9 +557,9 @@ vm_alloc_vcpu(struct vm *vm, int vcpuid)
 }
 
 void
-vm_slock_vcpus(struct vm *vm)
+vm_lock_vcpus(struct vm *vm)
 {
-        sx_slock(&vm->vcpus_init_lock);
+        sx_xlock(&vm->vcpus_init_lock);
 }
 
 void
@@ -1342,6 +1342,54 @@ save_guest_fpustate(struct vcpu *vcpu)
 
 static VMM_STAT(VCPU_IDLE_TICKS, "number of ticks vcpu was idle");
 
+/*
+ * Invoke the rendezvous function on the specified vcpu if applicable.  Return
+ * true if the rendezvous is finished, false otherwise.
+ */
+static bool
+vm_rendezvous(struct vcpu *vcpu)
+{
+        struct vm *vm = vcpu->vm;
+        int vcpuid;
+
+        mtx_assert(&vcpu->vm->rendezvous_mtx, MA_OWNED);
+        KASSERT(vcpu->vm->rendezvous_func != NULL,
+            ("vm_rendezvous: no rendezvous pending"));
+
+        /* 'rendezvous_req_cpus' must be a subset of 'active_cpus' */
+        CPU_AND(&vm->rendezvous_req_cpus, &vm->rendezvous_req_cpus,
+            &vm->active_cpus);
+
+        vcpuid = vcpu->vcpuid;
+        if (CPU_ISSET(vcpuid, &vm->rendezvous_req_cpus) &&
+            !CPU_ISSET(vcpuid, &vm->rendezvous_done_cpus)) {
+                VMM_CTR0(vcpu, "Calling rendezvous func");
+                (*vm->rendezvous_func)(vcpu, vm->rendezvous_arg);
+                CPU_SET(vcpuid, &vm->rendezvous_done_cpus);
+        }
+        if (CPU_CMP(&vm->rendezvous_req_cpus,
+            &vm->rendezvous_done_cpus) == 0) {
+                VMM_CTR0(vcpu, "Rendezvous completed");
+                CPU_ZERO(&vm->rendezvous_req_cpus);
+                vm->rendezvous_func = NULL;
+                wakeup(&vm->rendezvous_func);
+                return (true);
+        }
+        return (false);
+}
+
+static void
+vcpu_wait_idle(struct vcpu *vcpu)
+{
+        KASSERT(vcpu->state != VCPU_IDLE, ("vcpu already idle"));
+
+        vcpu->reqidle = 1;
+        vcpu_notify_event_locked(vcpu, false);
+        VMM_CTR1(vcpu, "vcpu state change from %s to "
+            "idle requested", vcpu_state2str(vcpu->state));
+        msleep_spin(&vcpu->state, &vcpu->mtx, "vmstat", hz);
+}
+
 static int
 vcpu_set_state_locked(struct vcpu *vcpu, enum vcpu_state newstate,
     bool from_idle)
@@ -1356,13 +1404,8 @@ vcpu_set_state_locked(struct vcpu *vcpu, enum vcpu_state newstate,
          * ioctl() operating on a vcpu at any point.
          */
         if (from_idle) {
-                while (vcpu->state != VCPU_IDLE) {
-                        vcpu->reqidle = 1;
-                        vcpu_notify_event_locked(vcpu, false);
-                        VMM_CTR1(vcpu, "vcpu state change from %s to "
-                            "idle requested", vcpu_state2str(vcpu->state));
-                        msleep_spin(&vcpu->state, &vcpu->mtx, "vmstat", hz);
-                }
+                while (vcpu->state != VCPU_IDLE)
+                        vcpu_wait_idle(vcpu);
         } else {
                 KASSERT(vcpu->state != VCPU_IDLE, ("invalid transition from "
                     "vcpu idle state"));
         }
@@ -1414,6 +1457,95 @@ vcpu_set_state_locked(struct vcpu *vcpu, enum vcpu_state newstate,
         return (0);
 }
 
+/*
+ * Try to lock all of the vCPUs in the VM while taking care to avoid deadlocks
+ * with vm_smp_rendezvous().
+ *
+ * The complexity here suggests that the rendezvous mechanism needs a rethink.
+ */
+int
+vcpu_set_state_all(struct vm *vm, enum vcpu_state newstate)
+{
+        cpuset_t locked;
+        struct vcpu *vcpu;
+        int error, i;
+        uint16_t maxcpus;
+
+        KASSERT(newstate != VCPU_IDLE,
+            ("vcpu_set_state_all: invalid target state %d", newstate));
+
+        error = 0;
+        CPU_ZERO(&locked);
+        maxcpus = vm->maxcpus;
+
+        mtx_lock(&vm->rendezvous_mtx);
+restart:
+        if (vm->rendezvous_func != NULL) {
+                /*
+                 * If we have a pending rendezvous, then the initiator may be
+                 * blocked waiting for other vCPUs to execute the callback.  The
+                 * current thread may be a vCPU thread so we must not block
+                 * waiting for the initiator, otherwise we get a deadlock.
+                 * Thus, execute the callback on behalf of any idle vCPUs.
+                 */
+                for (i = 0; i < maxcpus; i++) {
+                        vcpu = vm_vcpu(vm, i);
+                        if (vcpu == NULL)
+                                continue;
+                        vcpu_lock(vcpu);
+                        if (vcpu->state == VCPU_IDLE) {
+                                (void)vcpu_set_state_locked(vcpu, VCPU_FROZEN,
+                                    true);
+                                CPU_SET(i, &locked);
+                        }
+                        if (CPU_ISSET(i, &locked)) {
+                                /*
+                                 * We can safely execute the callback on this
+                                 * vCPU's behalf.
+                                 */
+                                vcpu_unlock(vcpu);
+                                (void)vm_rendezvous(vcpu);
+                                vcpu_lock(vcpu);
+                        }
+                        vcpu_unlock(vcpu);
+                }
+        }
+
+        /*
+         * Now wait for remaining vCPUs to become idle.  This may include the
+         * initiator of a rendezvous that is currently blocked on the rendezvous
+         * mutex.
+         */
+        CPU_FOREACH_ISCLR(i, &locked) {
+                if (i >= maxcpus)
+                        break;
+                vcpu = vm_vcpu(vm, i);
+                if (vcpu == NULL)
+                        continue;
+                vcpu_lock(vcpu);
+                while (vcpu->state != VCPU_IDLE) {
+                        mtx_unlock(&vm->rendezvous_mtx);
+                        vcpu_wait_idle(vcpu);
+                        vcpu_unlock(vcpu);
+                        mtx_lock(&vm->rendezvous_mtx);
+                        if (vm->rendezvous_func != NULL)
+                                goto restart;
+                        vcpu_lock(vcpu);
+                }
+                error = vcpu_set_state_locked(vcpu, newstate, true);
+                vcpu_unlock(vcpu);
+                if (error != 0) {
+                        /* Roll back state changes. */
+                        CPU_FOREACH_ISSET(i, &locked)
+                                (void)vcpu_set_state(vcpu, VCPU_IDLE, false);
+                        break;
+                }
+                CPU_SET(i, &locked);
+        }
+        mtx_unlock(&vm->rendezvous_mtx);
+        return (error);
+}
+
 static void
 vcpu_require_state(struct vcpu *vcpu, enum vcpu_state newstate)
 {
@@ -1435,36 +1567,23 @@ vcpu_require_state_locked(struct vcpu *vcpu, enum vcpu_state newstate)
 static int
 vm_handle_rendezvous(struct vcpu *vcpu)
 {
-        struct vm *vm = vcpu->vm;
+        struct vm *vm;
         struct thread *td;
-        int error, vcpuid;
 
-        error = 0;
-        vcpuid = vcpu->vcpuid;
         td = curthread;
+        vm = vcpu->vm;
+
         mtx_lock(&vm->rendezvous_mtx);
         while (vm->rendezvous_func != NULL) {
-                /* 'rendezvous_req_cpus' must be a subset of 'active_cpus' */
-                CPU_AND(&vm->rendezvous_req_cpus, &vm->rendezvous_req_cpus, &vm->active_cpus);
-
-                if (CPU_ISSET(vcpuid, &vm->rendezvous_req_cpus) &&
-                    !CPU_ISSET(vcpuid, &vm->rendezvous_done_cpus)) {
-                        VMM_CTR0(vcpu, "Calling rendezvous func");
-                        (*vm->rendezvous_func)(vcpu, vm->rendezvous_arg);
-                        CPU_SET(vcpuid, &vm->rendezvous_done_cpus);
-                }
-                if (CPU_CMP(&vm->rendezvous_req_cpus,
-                    &vm->rendezvous_done_cpus) == 0) {
-                        VMM_CTR0(vcpu, "Rendezvous completed");
-                        CPU_ZERO(&vm->rendezvous_req_cpus);
-                        vm->rendezvous_func = NULL;
-                        wakeup(&vm->rendezvous_func);
+                if (vm_rendezvous(vcpu))
                         break;
-                }
+
                 VMM_CTR0(vcpu, "Wait for rendezvous completion");
                 mtx_sleep(&vm->rendezvous_func, &vm->rendezvous_mtx, 0,
                     "vmrndv", hz);
                 if (td_ast_pending(td, TDA_SUSPEND)) {
+                        int error;
+
                         mtx_unlock(&vm->rendezvous_mtx);
                         error = thread_check_susp(td, true);
                         if (error != 0)
diff --git a/sys/amd64/vmm/vmm_dev.c b/sys/amd64/vmm/vmm_dev.c
index 5214cd3f1447..032b7471664e 100644
--- a/sys/amd64/vmm/vmm_dev.c
+++ b/sys/amd64/vmm/vmm_dev.c
@@ -179,32 +179,16 @@ vcpu_unlock_one(struct vmmdev_softc *sc, int vcpuid, struct vcpu *vcpu)
 static int
 vcpu_lock_all(struct vmmdev_softc *sc)
 {
-        struct vcpu *vcpu;
         int error;
-        uint16_t i, j, maxcpus;
 
-        error = 0;
-        vm_slock_vcpus(sc->vm);
-        maxcpus = vm_get_maxcpus(sc->vm);
-        for (i = 0; i < maxcpus; i++) {
-                vcpu = vm_vcpu(sc->vm, i);
-                if (vcpu == NULL)
-                        continue;
-                error = vcpu_lock_one(vcpu);
-                if (error)
-                        break;
-        }
-
-        if (error) {
-                for (j = 0; j < i; j++) {
-                        vcpu = vm_vcpu(sc->vm, j);
-                        if (vcpu == NULL)
-                                continue;
-                        vcpu_unlock_one(sc, j, vcpu);
-                }
+        /*
+         * Serialize vcpu_lock_all() callers.  Individual vCPUs are not locked
+         * in a consistent order so we need to serialize to avoid deadlocks.
+         */
+        vm_lock_vcpus(sc->vm);
+        error = vcpu_set_state_all(sc->vm, VCPU_FROZEN);
+        if (error != 0)
                 vm_unlock_vcpus(sc->vm);
-        }
-
         return (error);
 }
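
[Illustrative note, not part of the commit.]  The change above relies on a
"helping" pattern: a thread that wants every vCPU quiesced must not sleep
behind a rendezvous initiator that is, in turn, waiting on that thread, so it
runs the pending callback on behalf of vCPUs that cannot run it themselves.
The sketch below is a minimal user-space model of that idea in plain C with
POSIX threads; all names (freeze_all(), initiator(), flush_cb(), NVCPUS) are
invented for the example and none of this is the vmm code from the diff.

/*
 * toy_freeze_help.c -- hypothetical user-space model of the "help the
 * rendezvous along" idea.  Build with: cc -o toy toy_freeze_help.c -lpthread
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NVCPUS 4

enum vstate { IDLE, FROZEN };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static enum vstate state[NVCPUS];       /* all "vCPUs" start out IDLE */
static void (*pending_cb)(int);         /* non-NULL while a rendezvous is pending */
static bool cb_done[NVCPUS];            /* which vCPUs have run the callback */

static void
flush_cb(int vcpu)
{
        printf("ran callback for vcpu %d\n", vcpu);
}

/* Rendezvous initiator: post a callback and wait until every vCPU has run it. */
static void *
initiator(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        for (int i = 0; i < NVCPUS; i++)
                cb_done[i] = false;
        pending_cb = flush_cb;
        pthread_cond_broadcast(&cv);
        while (pending_cb != NULL)      /* blocked until someone helps */
                pthread_cond_wait(&cv, &lock);
        pthread_mutex_unlock(&lock);
        printf("rendezvous complete\n");
        return (NULL);
}

/* Freeze every vCPU, running a pending callback on behalf of idle ones. */
static void
freeze_all(void)
{
        bool all_done;

        pthread_mutex_lock(&lock);
        /* For a deterministic demo, wait until the rendezvous is posted. */
        while (pending_cb == NULL)
                pthread_cond_wait(&cv, &lock);
        all_done = true;
        for (int i = 0; i < NVCPUS; i++) {
                if (state[i] == IDLE && !cb_done[i]) {
                        pending_cb(i);  /* the "help": run it on the vCPU's behalf */
                        cb_done[i] = true;
                }
                if (!cb_done[i])
                        all_done = false;
        }
        if (all_done) {
                pending_cb = NULL;      /* let the blocked initiator finish */
                pthread_cond_broadcast(&cv);
        }
        for (int i = 0; i < NVCPUS; i++)
                state[i] = FROZEN;
        pthread_mutex_unlock(&lock);
        printf("all vcpus frozen\n");
}

int
main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, initiator, NULL);
        freeze_all();
        pthread_join(t, NULL);  /* without the help in freeze_all(), this would hang */
        return (0);
}

Without the helping loop in freeze_all(), the initiator would wait forever for
vCPUs that are never going to run its callback, which is the shape of the
deadlock the commit removes.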