From: Robert Crowston
To: FreeBSD virtualization
Date: Wed, 04 Jan 2023 23:55:14 +0000
Subject: Re: Windows 11 22H2 with passed-through PCI devices hangs in vm_handle_rendezvous() at boot

So it looks like the problem is:
- in one thread we call vcpu_set_state_locked() [from a VM_MAP_PPTDEV_MMIO call from userspace]
-- both the new and old states are VCPU_FROZEN
-- the thread enters a loop while vcpu->state != VCPU_IDLE
-- it gets stuck there forever, since nothing will ever change the state to VCPU_IDLE
-- apparently this loop exists to stop two ioctl()s from acting on the same vCPU simultaneously, but I don't see any other ioctl against the vCPU in kgdb.
- in all the other threads, we sit in vm_handle_rendezvous()
-- these threads are waiting for the rendezvous to complete
-- every vCPU has completed the rendezvous except for the one stuck in vcpu_set_state_locked()

I see a lot of commits in -CURRENT since my cut of -STABLE, but nothing that looks obviously relevant. I'll try against -CURRENT next.

    -- RHC.

------- Original Message -------
On Tuesday, January 3rd, 2023 at 23:54, Robert Crowston wrote:

> Still investigating this. AMD 1700, FreeBSD 13.1 stable@3dd6497894. The VM is Windows 11 22H2.
>
> It happens on the setup disk -- at the TianoCore logo, before the "ring" has finished its first rotation -- so very early in the boot process. It has eventually happened for every Windows 11 install I have made. If I remove the passthrough devices, install Windows, and then re-add the devices, the fresh install boots with the passthrough devices a few times, but then shows the same hang on every boot after that. Windows Boot Repair also hangs. On the host, bhyvectl --destroy hangs. gdb cannot stop bhyve and just hangs as well. None of these hangs show any CPU use. kldunload vmm crashes the host with a page fault. Only a reboot of the host will kill the guest.
>
> Setting the guest CPU count to 1, or removing all the passthrough devices, allows Windows 11 to boot. The same behaviour happens with two different USB controllers and two different GPUs that I have. The same bhyve configurations reliably boot Windows Server 2022 and Windows 10 with passthrough working.
>
> Debugging in userspace, I can see that Windows 11 does PCI enumeration in parallel across multiple cores, and sometimes during boot one vCPU writes a PCI config register at approximately the same time as another vCPU reads that exact register. The hang seems to be aligned with this synchronized write/read. Also, I can sometimes boot successfully under gdb when single-stepping PCI config register writes, but it's difficult to be sure because my debugging is probably disturbing the timing. I looked at the bhyve code and I don't see anything there that could race in user space. In any event, it's a kernel-side bug.
>
> Spinning up the kernel debugger, what I always see is:
> 1. One bhyve thread in vioapic_mmio_write() -> ... -> vm_handle_rendezvous() -> _sleep()
>
> 2. One bhyve thread in vcpu_lock_one() -> ... -> vcpu_set_state_locked() -> msleep_spin_sbt()
>
> 3. All remaining bhyve threads, if any, in vm_run() -> vm_handle_rendezvous() -> _sleep().
>
> Example backtrace attached.
>
> So it looks like we have some kind of deadlock between vcpu_lock_one() and vioapic_mmio_write()? Has anyone seen anything like it?
>
>     -- RHC.
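To make the circular wait described at the top of this message concrete, here is a minimal, hypothetical model of it as a wait-for graph in C. It is not bhyve code: the vCPU numbering and the helpers reported_hang(), blocked_in_cycle() and deadlocked() are illustrative assumptions. It also assumes that the vCPU being waited on in vcpu_set_state_locked() is the one that initiated the rendezvous from vioapic_mmio_write(); the report does not say this explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT bhyve's vmm.c. waits_for[i] is the vCPU that
 * the thread for vCPU i is blocked on, or -1 if it can make progress. */
#define MAX_ACTORS 8

/* Fill waits_for[] with the pattern from the report (indices are
 * assumptions): vCPU 0's thread is in the VM_MAP_PPTDEV_MMIO ioctl,
 * looping until vCPU 1 reaches VCPU_IDLE; vCPU 1 initiated the
 * rendezvous and waits for vCPU 0 to join it; every other vCPU has
 * finished its part and also waits for vCPU 0. */
static void reported_hang(int *waits_for, int nvcpus)
{
    assert(nvcpus > 0 && nvcpus <= MAX_ACTORS);
    if (nvcpus == 1) {
        waits_for[0] = -1; /* no other vCPU to wait on */
        return;
    }
    waits_for[0] = 1;
    for (int i = 1; i < nvcpus; i++)
        waits_for[i] = 0;
}

/* Follow wait-for edges from `start`. Each node has at most one
 * outgoing edge, so revisiting a node means `start` is in, or blocked
 * behind, a cycle: nobody on that cycle can ever wake up. */
static bool blocked_in_cycle(const int *waits_for, int n, int start)
{
    bool seen[MAX_ACTORS] = { false };

    assert(n > 0 && n <= MAX_ACTORS);
    for (int cur = start; cur != -1; cur = waits_for[cur]) {
        assert(cur >= 0 && cur < n);
        if (seen[cur])
            return true;
        seen[cur] = true;
    }
    return false;
}

/* True if any thread in the model can never make progress. */
static bool deadlocked(const int *waits_for, int n)
{
    for (int i = 0; i < n; i++)
        if (blocked_in_cycle(waits_for, n, i))
            return true;
    return false;
}
```

With four vCPUs, reported_hang() produces the edges 0 -> 1 and 1 -> 0 (plus 2 -> 0 and 3 -> 0), so deadlocked() returns true. With one vCPU there is no other vCPU to wait on and no cycle forms in this model, which is at least consistent with the observation that a single-vCPU guest boots.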