Date:      Fri, 06 Jan 2023 23:56:44 +0000
From:      bugzilla-noreply@freebsd.org
To:        virtualization@FreeBSD.org
Subject:   [Bug 268794] Simultaneous vcpu_lock_all() and vm_handle_rendezvous() can deadlock vmm
Message-ID:  <bug-268794-27103@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268794

            Bug ID: 268794
           Summary: Simultaneous vcpu_lock_all() and
                    vm_handle_rendezvous() can deadlock vmm
           Product: Base System
           Version: 13.1-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: crowston@protonmail.com

Guest is Windows 11 22H2. This only happens with a PCI device passed through,
only very early into the boot, and only if there's more than one vCPU. It does
not happen on every boot, but on maybe 90% of them. It happens even on the
installer image.

I am running on an AMD Ryzen 1700.

This does not happen with Windows 10 or Windows Server 2022, which suggests to
me a recent change to the NT kernel might have exposed it.

Action:
1. Windows writes to the APIC on vCPU x.
1a. That vCPU exits, and its state toggles to VCPU_FROZEN.
1b. That vCPU goes into vm_handle_inst_emul() -> emulate_mov() ->
vioapic_mmio_write() -> vioapic_write() -> vm_smp_handle_rendezvous().
1c. vm_handle_rendezvous() waits for all vCPU threads to handle the rendezvous.
2. Simultaneously, from userland's pci_passthru.c, either vm_map_pptdev_mmio()
or vm_unmap_pptdev_mmio() is called.
2a. vmmdev_ioctl() invokes vcpu_lock_all().
2b. vcpu_lock_all() iterates through the vCPUs, calling vcpu_lock_one() on each
vCPU, eventually reaching vCPU x (the APIC one).
2c. vCPU x is already in the VCPU_FROZEN state, from (1a).
vcpu_set_state_locked() hangs waiting for it to transition to the VCPU_IDLE
state.
3. All the other vCPUs eventually end up either in vm_handle_rendezvous() or in
vcpu_set_state_locked(), and hang there.
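The sequence above can be read as a cycle in a wait-for graph. The following is
only a toy sketch of that reading, not vmm code: the thread labels, the
wait_graph type and every function name are invented for illustration, and it
assumes a second vCPU y that vcpu_lock_all() had already frozen in step 2b
before it reached vCPU x.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the three threads in the toy scenario. */
#define NTHREADS 3
enum { IOCTL_THREAD = 0, VCPU_X = 1, VCPU_Y = 2 };

/* waits_for[a][b] is true when thread a cannot proceed until b does. */
struct wait_graph {
	bool waits_for[NTHREADS][NTHREADS];
};

/* Depth-first search: can 'target' be reached from 'from' along wait edges? */
static bool
reaches(const struct wait_graph *g, int from, int target, bool *seen)
{
	for (int next = 0; next < NTHREADS; next++) {
		if (!g->waits_for[from][next])
			continue;
		if (next == target)
			return (true);
		if (!seen[next]) {
			seen[next] = true;
			if (reaches(g, next, target, seen))
				return (true);
		}
	}
	return (false);
}

/* A deadlock exists if any thread transitively waits for itself. */
static bool
has_deadlock(const struct wait_graph *g)
{
	for (int t = 0; t < NTHREADS; t++) {
		bool seen[NTHREADS] = { false };

		if (reaches(g, t, t, seen))
			return (true);
	}
	return (false);
}

/* Wait edges as described in the report (vCPU y is an assumption). */
static struct wait_graph
reported_scenario(void)
{
	struct wait_graph g;

	memset(&g, 0, sizeof(g));
	/* 1c: vCPU x waits for vCPU y to handle the rendezvous. */
	g.waits_for[VCPU_X][VCPU_Y] = true;
	/* 2c: the ioctl thread waits for vCPU x to become VCPU_IDLE. */
	g.waits_for[IOCTL_THREAD][VCPU_X] = true;
	/* 3: vCPU y, frozen in 2b, waits for the ioctl thread to unlock it. */
	g.waits_for[VCPU_Y][IOCTL_THREAD] = true;
	return (g);
}
```

In this model has_deadlock() reports a cycle for reported_scenario(), and
clearing any one of the three edges removes it.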

It's not clear to me what the fix should be. Should we check and run the
rendezvous func while waiting for the VCPU_IDLE transition in
vcpu_set_state_locked()? That will presumably require a strong contract on the
kind of rendezvous functions that can be invoked.
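To make the question concrete, here is a single-threaded toy model of that
idea. It is a sketch under assumptions, not a patch: the toy_* names and the
locker_helps switch are invented, real locking and sleeping are left out, and
it assumes the waiting thread may run the rendezvous function on behalf of the
vCPUs it has already frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAXCPU 4

enum toy_state { TOY_IDLE, TOY_FROZEN };

/* Hypothetical, heavily simplified stand-in for the per-VM state. */
struct toy_vm {
	int ncpu;
	enum toy_state state[TOY_MAXCPU];
	bool held_by_locker[TOY_MAXCPU];	/* frozen by the lock-all path */
	bool rdv_pending;			/* a rendezvous is outstanding */
	int rdv_initiator;			/* vCPU that started it */
	bool rdv_done[TOY_MAXCPU];		/* rendezvous func ran for vCPU */
};

static void
toy_init(struct toy_vm *vm, int ncpu)
{
	memset(vm, 0, sizeof(*vm));	/* TOY_IDLE is 0, so all vCPUs idle */
	vm->ncpu = ncpu;
}

/* Steps 1a-1c: vCPU x freezes itself and starts a rendezvous. */
static void
toy_start_rendezvous(struct toy_vm *vm, int x)
{
	vm->state[x] = TOY_FROZEN;
	vm->rdv_pending = true;
	vm->rdv_initiator = x;
	vm->rdv_done[x] = true;
}

/*
 * One pass of rendezvous handling. A vCPU not held by the locker is assumed
 * to handle the rendezvous itself; a held vCPU is only handled when the
 * locker helps (the idea raised in the report).
 */
static void
toy_service_rendezvous(struct toy_vm *vm, bool locker_helps)
{
	bool all_done = true;

	if (!vm->rdv_pending)
		return;
	for (int i = 0; i < vm->ncpu; i++) {
		if (vm->rdv_done[i])
			continue;
		if (!vm->held_by_locker[i] || locker_helps)
			vm->rdv_done[i] = true;
		else
			all_done = false;
	}
	if (all_done) {
		vm->rdv_pending = false;
		vm->state[vm->rdv_initiator] = TOY_IDLE;
	}
}

/* Returns true if every vCPU was frozen, false where it would hang. */
static bool
toy_lock_all(struct toy_vm *vm, bool locker_helps)
{
	for (int i = 0; i < vm->ncpu; i++) {
		if (vm->state[i] != TOY_IDLE)
			toy_service_rendezvous(vm, locker_helps);
		if (vm->state[i] != TOY_IDLE)
			return (false);
		vm->state[i] = TOY_FROZEN;
		vm->held_by_locker[i] = true;
	}
	return (true);
}
```

With two vCPUs and the rendezvous started on vCPU 1, toy_lock_all() gets stuck
when locker_helps is false and completes when it is true. The model says
nothing about whether a real rendezvous function is safe to run from the
waiting thread, which is the contract concern raised above.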

-- 
You are receiving this mail because:
You are the assignee for the bug.


