Date:      Wed, 04 Jan 2023 23:55:14 +0000
From:      Robert Crowston <crowston@protonmail.com>
To:        FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject:   Re: Windows 11 22H2 with passed-through PCI devices hangs in vm_handle_rendezvous() at boot
Message-ID:  <B8eLSlRPg8ZYv6n1ftZj5sfGAZncMxnntAsFPGGlGpRvtwTkAgxLoAzd00tWO5orrV-PLOojhuO1O9aVU9gd-GU4R8zvYvgqTZqlY-N6CWc=@protonmail.com>
In-Reply-To: <bGTSISO-1rwQV3zYi9lQObHjGtGEePZ8AydjMeAHQkTlO9PGAFmDy3cinDsUQ7-njq4W76r6pmBDgy64WBhSDMlS9gbwfD_pKpr7S4EOcVI=@protonmail.com>
References:  <bGTSISO-1rwQV3zYi9lQObHjGtGEePZ8AydjMeAHQkTlO9PGAFmDy3cinDsUQ7-njq4W76r6pmBDgy64WBhSDMlS9gbwfD_pKpr7S4EOcVI=@protonmail.com>

So it looks like the problem is:

- in one thread we call vcpu_set_state_locked() [from a VM_MAP_PPTDEV_MMIO call from userspace]
-- both the new and old states are VCPU_FROZEN
-- the thread enters a loop while vcpu->state != VCPU_IDLE
-- it gets stuck here forever since nothing will ever change the state to VCPU_IDLE
-- apparently this is to stop two ioctl()s acting on the same vCPU simultaneously, but I don't see any other ioctl against the vCPU in kgdb.

- in all the other threads, we sit in vm_handle_rendezvous()
-- these threads are waiting for the rendezvous to complete
-- every vCPU has completed the rendezvous except for the one stuck in vcpu_set_state_locked()
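
To make the circular wait above easier to see, here is a toy model in Python. It is only a sketch of the situation as described in this message, not the vmm code: the names VcpuState, ioctl_may_proceed(), rendezvous_may_finish() and runnable_threads() are made up for illustration, and it assumes that the stuck vCPU cannot complete the rendezvous until the ioctl thread gets past its wait.

```python
from enum import Enum


class VcpuState(Enum):
    # Only the two states relevant to the hang are modelled here.
    IDLE = "idle"
    FROZEN = "frozen"


def ioctl_may_proceed(target_state):
    # Toy version of the loop described above: the ioctl-side caller
    # keeps waiting for as long as the vCPU state is not IDLE.
    return target_state is VcpuState.IDLE


def rendezvous_may_finish(pending):
    # Rendezvous waiters are released only once no vCPU still owes it.
    return len(pending) == 0


def runnable_threads(target_state, pending, waiting_vcpus):
    """Return the modelled threads that can make progress in this snapshot."""
    runnable = []
    if ioctl_may_proceed(target_state):
        runnable.append("ioctl")
    if rendezvous_may_finish(pending):
        runnable.extend(f"vcpu{i}" for i in sorted(waiting_vcpus))
    return runnable


# Snapshot resembling the hang: the target vCPU (hypothetically vCPU 1) is
# FROZEN and is the only one that has not completed the rendezvous.
hung = runnable_threads(VcpuState.FROZEN, pending={1}, waiting_vcpus=[0, 2, 3])

# Contrasting snapshot: the target is IDLE and nothing is pending.
healthy = runnable_threads(VcpuState.IDLE, pending=set(), waiting_vcpus=[0, 2, 3])

print("hung snapshot:", hung)
print("healthy snapshot:", healthy)
```

In the hung snapshot no modelled thread is runnable, which matches what kgdb shows: every thread is asleep and each is waiting on a condition that only another sleeping thread could satisfy.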

I see a lot of commits in -CURRENT since my cut of -STABLE, but nothing that looks too relevant. I'll try against CURRENT next.

    -- RHC.


------- Original Message -------
On Tuesday, January 3rd, 2023 at 23:54, Robert Crowston <crowston@protonmail.com> wrote:


> Still investigating this. AMD 1700, FreeBSD 13.1 stable@3dd6497894. VM is
> Windows 11 22H2.
>
> It happens on the setup disk -- at the TianoCore logo, before the "ring"
> has finished its first rotation -- so very early in the boot process. It's
> eventually happened for every Win 11 install I have made. If I remove the
> passthrough devices, install Windows, and then re-add the devices, a fresh
> install will boot with the passthrough devices a few times, but then shows
> the same hang behaviour forever after. Windows Boot Repair also hangs. On
> the host, bhyvectl --destroy hangs. gdb cannot stop bhyve and just hangs as
> well. None of these hangs show any CPU use. kldunload vmm crashes the host
> with a page fault. Only a reboot of the host will kill the guest.
>
> Setting the guest CPU count to 1, or removing all the passthrough devices,
> allows Windows 11 to boot. The same behaviour happens for two different USB
> controllers I have and two different GPUs. The same bhyve configurations
> reliably boot Windows Server 2022 and Windows 10 with passthrough working.
>
> Debugging in userspace, I can see that Windows 11 does PCI enumeration in
> parallel across multiple cores, and sometimes during boot one vCPU writes a
> PCI config register at approximately the same time as another vCPU reads
> that exact register. The hang seems to be aligned with this synchronized
> write/read. Also, I can sometimes boot successfully under gdb when single
> stepping PCI cfg register writes, but it's difficult to be sure because my
> debugging is probably disturbing the timing. I looked at the bhyve code and
> I don't see what here could be racing in user space. In any event, it's a
> kernel-side bug.
>
> Spinning up the kernel debugger, what I always see is:
> 1. 1 bhyve thread in vioapic_mmio_write() -> ... -> vm_handle_rendezvous() -> _sleep()
>
> 2. 1 bhyve thread in vcpu_lock_one() -> ... -> vcpu_set_state_locked() -> msleep_spin_sbt()
>
> 3. All remaining bhyve threads, if any, in vm_run() -> vm_handle_rendezvous() -> _sleep().
>
> Example backtrace attached.
>
> So it looks like we have some kind of a deadlock between vcpu_lock_one()
> and vioapic_mmio_write()? Anyone seen anything like it?
>
>     -- RHC.


