From: Joseph Ward
To: freebsd-virtualization@freebsd.org
Date: Mon, 8 Dec 2025 16:11:22 -0500
Subject: Host system crash - bhyve pci passthrough
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

I'm running FreeBSD 14.3-RELEASE-p6 on a Supermicro H11SSL motherboard with an EPYC 7551 32-core CPU.

This system runs 2 guests under bhyve, both with PCI passthrough:

  1. A FreeBSD system running off a physical disk (disk0_name="/dev/ada1", disk0_dev="custom"), with 2 LSI HBAs passed through for large amounts of storage.
  2. A Linux system running off a zvol on the host with a Google Coral Edge TPU passed through.
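
For readers unfamiliar with this layout, a guest like VM #1 might be described in a vm-bhyve config roughly as follows. This is only a sketch: the two disk0 lines are the ones quoted above, while every other value (loader, CPU and memory sizes, switch name, and especially the PCI bus/slot/function addresses) is a placeholder assumption, not the actual configuration.

```sh
# Hypothetical vm-bhyve guest config for VM #1 (illustrative sketch only).
loader="bhyveload"            # assumed: FreeBSD guest booted with bhyveload
cpu=4                         # placeholder value
memory=8G                     # placeholder value
network0_type="virtio-net"    # assumed NIC type
network0_switch="public"      # placeholder switch name
disk0_type="ahci-hd"          # assumed disk emulation
disk0_name="/dev/ada1"        # from the description above
disk0_dev="custom"            # from the description above
passthru0="1/0/0"             # placeholder bus/slot/function for the first HBA
passthru1="2/0/0"             # placeholder bus/slot/function for the second HBA
```

Devices named in passthruN lines also have to be reserved for passthrough on the host, which is normally done with a pptdevs entry in /boot/loader.conf using the same bus/slot/function notation.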

VM #2 streams approximately 10MiB/s across the virtual network to VM #1 for storage over NFS.


With the default FreeBSD settings, the host locks up within a couple of minutes after VM #2 boots (and sometimes during the Linux boot phase). An error that appears on the console is:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 23943, size: 28672  (the blkno and size values change between occurrences)
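
Since the blkno and size differ each time, it may help to collect them from the logs and look for a pattern. A minimal sketch, assuming the message format shown above; the function name parse_swap_waits is made up for illustration:

```sh
#!/bin/sh
# Read log lines on stdin and print "bufobj blkno size" for every
# swap_pager "indefinite wait buffer" message; other lines are ignored.
parse_swap_waits() {
    sed -n 's/.*swap_pager: indefinite wait buffer: bufobj: \([0-9]*\), blkno: \([0-9]*\), size: \([0-9]*\).*/\1 \2 \3/p'
}
```

One possible use, assuming the kernel messages end up in /var/log/messages: `grep swap_pager /var/log/messages | parse_swap_waits`.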

When this happens, the system is completely unresponsive, and the Linux VM is locked up as well. Sometimes I can bring it back by shutting down VM #1, which usually remains responsive for a while, but eventually it freezes too.

Without PCI passthrough, there is no crash. 


I've tried many things, but the one config change that does seem to delay the freeze (by up to several days) is setting vfs.zfs.vdev.max_active=600 in /boot/loader.conf.
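
For anyone trying to reproduce this, the tunable as it would typically be written in the loader configuration is sketched below. The quoting follows the usual loader.conf convention, and the sysctl check in the comment assumes the tunable is exported under the same name on the running system.

```sh
# /boot/loader.conf (takes effect on the next boot)
vfs.zfs.vdev.max_active="600"

# After rebooting, the active value can be checked with:
#   sysctl vfs.zfs.vdev.max_active
```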

Memory usage remains low before a lockup, a tiny fraction of swap is used, iostat doesn't show unusual volume, and there's plenty of idle CPU.

I'd love to be able to identify what's actually happening so that I could either address it via config changes or file a defect, but I'm unable to find any metrics that are increasing, or any other way to trace the issue.

Does anyone either have an idea about what's going on, or know some relevant metrics/traces that would help in IDing the issue?

Thanks in advance,


Joseph
