From: Matthew Grooms
Date: Tue, 16 Jun 2026 15:26:42 -0500
To: virtualization@freebsd.org
Subject: Re: Bhyve live migration, virtio-ballooning, kvm-clock


On 6/16/26 14:28, William Mckenzie wrote:
Hi all, 

For some time we have been working on getting bhyve live vm-migration working. We have developed, deployed, and validated three feature series against the FreeBSD base system (15.0) and we would like to contribute them upstream. I’m writing to ask whether a member of the virtualization team would be willing to act as champion/mentor for these series through the review process.

What we’ve done: 

1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits, the
   kernel engine decomposed into four buildable commits).
   Live migration of a running guest between two hosts: a versioned
   VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
   EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing the
   existing vm_snapshot machinery, "bhyve -M send/recv" as the userland
   mover, and a set of restore-correctness fixes (vCPU allocation order,
   authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET
   co-anchoring). The PCI BAR re-registration fix is a standalone commit
   because it also repairs a pre-existing bug in stock bhyvectl(8)
   --checkpoint/restore, independent of migration. Validated end-to-end on a two-host
   physical Intel lab as a transparent live handoff: a running Rocky Linux 9
   guest migrates in both directions keeping its boot_id, uptime, processes,
   AND live network sessions across the cutover, at ~0.4 s idle downtime; 20/20
   bidirectional runs with zero failures, and a stress run (4 GB / 24 GB guest
   under ~2 GB/s memory churn during the migration) stayed correct with
   downtime scaling as expected with the at-pause dirty set. One read-only
   ioctl is added to the capsicum allow-list; all state-changing ioctls stay
   outside the sandbox.

2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
   A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues
   with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard
   num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest telemetry,
   and a per-VM control socket created before cap_enter(). Guest-validated
   against FreeBSD virtio_balloon(4) on two nodes (inflate/deflate tracked
   exactly; mid-flight readings prove the values are guest-driven) and a
   Linux guest for the stats queue.

3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default
   off). A KVM-compatible paravirtual clock: KVM CPUID signature at
   0x40000100 (bhyve's own signature leaf untouched),
   MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
   paths, publishing standard pvclock structures through
   vm_gpa_hold_global(). This is the durable fix for Linux guests marking
   the TSC unstable and degrading to hpet after any snapshot/restore or
   migration. Validated on hardware: guests register kvm-clock and survive
   repeated bidirectional migrations with zero TSC-unstable events (the
   pre-kvmclock baseline reliably degraded on the same hardware).
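
For readers unfamiliar with the "standard pvclock structures" mentioned in item 3, here is a minimal sketch of how a guest turns one of them into nanoseconds. It assumes the usual 32-byte pvclock_vcpu_time_info layout documented for KVM guests; the function name pvclock_ns and the sample values are illustrative only and are not taken from the patch series.

```python
import struct

# Assumed layout of struct pvclock_vcpu_time_info (32 bytes, little-endian):
#   u32 version, u32 pad0, u64 tsc_timestamp, u64 system_time,
#   u32 tsc_to_system_mul, s8 tsc_shift, u8 flags, u8 pad[2]
PVCLOCK_FMT = "<IIQQIbB2x"


def pvclock_ns(raw, tsc):
    """Convert a TSC reading to guest nanoseconds using one pvclock snapshot.

    Hypothetical helper: 'raw' is a 32-byte copy of the shared structure,
    'tsc' is the guest's current TSC value.
    """
    version, _pad0, tsc_timestamp, system_time, mul, shift, _flags = \
        struct.unpack(PVCLOCK_FMT, raw)
    # An odd version means the host is in the middle of an update.
    if version & 1:
        raise RuntimeError("pvclock update in progress, retry")
    delta = tsc - tsc_timestamp
    # tsc_shift is signed: positive shifts left, negative shifts right.
    delta = delta << shift if shift >= 0 else delta >> -shift
    # tsc_to_system_mul is a 32.32 fixed-point ticks-to-nanoseconds factor.
    return system_time + ((delta * mul) >> 32)
```

With a 2 GHz TSC (mul = 0x80000000, shift = 0), a snapshot taken at TSC 1000 with system_time 5000000 ns, read 2000000 ticks later, yields 6000000 ns. A real guest also re-reads the version field after copying the structure and retries if it changed; that loop is omitted here.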


I’ve got a full submission document (design, per-failure bring-up history, complete test matrix, untested-areas inventory, and security analysis) and the git-format-patch series (against releng/15.0, where they are validated).

I’ve tested many rounds of live vm-migrations across hosts (AMD using KVM nested virtualization and Intel physical systems) and have finally gotten it to a stable state with 30+ live migrations without packets dropping.  I intend to do further testing (specifically with AMD physical boxes).  
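
The relationship described in item 1 between iterative precopy and downtime ("downtime scaling as expected with the at-pause dirty set") can be illustrated with a toy model. This is only a sketch under simplifying assumptions (constant dirty rate, constant link bandwidth, no device state); the function and parameter names are hypothetical and it is not the algorithm used in the patch series.

```python
def precopy_rounds(ram_bytes, dirty_rate, bandwidth, stop_bytes, max_rounds=30):
    """Toy model of iterative RAM precopy.

    ram_bytes  -- guest RAM size in bytes
    dirty_rate -- bytes the guest dirties per second (assumed constant)
    bandwidth  -- migration link throughput in bytes per second
    stop_bytes -- pause the guest once the dirty set is at or below this size
    Returns (rounds, at_pause_dirty_bytes, downtime_seconds).
    """
    to_send = ram_bytes                  # the first round sends all of RAM
    rounds = 0
    while to_send > stop_bytes and rounds < max_rounds:
        elapsed = to_send / bandwidth    # time spent sending this round
        # Whatever the guest dirtied meanwhile is the next round's work,
        # and it can never exceed the size of guest RAM.
        to_send = min(ram_bytes, dirty_rate * elapsed)
        rounds += 1
    # The remaining dirty set is copied with the vCPUs paused.
    downtime = to_send / bandwidth
    return rounds, to_send, downtime
```

In this model a 4 GiB guest dirtying 100 MiB/s over a 1 GiB/s link converges in two rounds with roughly 38 ms of RAM-copy downtime. If the dirty rate is at or above the link bandwidth, the dirty set never shrinks and the loop stops only at max_rounds, which is why a heavily churning guest pays for its whole at-pause dirty set in downtime.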

Bhyve is phenomenal. If there is no interest in a champion, I still intend to at least attempt to see the process through (acceptance or not). Happy to provide the documentation/requested info.


Thanks for working on this. Live migration patch sets have been proposed a few times before. You can find the most recent attempt sitting in reviews from 2022 ...

https://reviews.freebsd.org/D34722
https://reviews.freebsd.org/D34811
https://reviews.freebsd.org/D34719
https://reviews.freebsd.org/D34720
https://reviews.freebsd.org/D34721

You should also be able to locate several email threads related to the topic in the public FreeBSD mailing list archives. I won't rehash that here, but there was resistance. The original work for that and other bhyve-related projects ( libvdsk w/ qcow2+vmdk support, user mode usb pass-through, etc ... ) was hosted here ...

https://github.com/orgs/FreeBSD-UPB/repositories

You should probably also have a look at this ...

https://www.freebsd.org/status/report-2025-10-2025-12/bhyve-cpuid/

From what I gather from his Zagreb presentation, the feature is being developed as a foundational layer to import illumos bhyve migration code with an eye towards feature parity and potential interoperability.

Good luck!

-Matthew
