Date: Tue, 16 Jun 2026 19:28:33 +0000
From: William Mckenzie <wmckenzie@rhelitpro.com>
To: "freebsd-virtualization@FreeBSD.org" <freebsd-virtualization@FreeBSD.org>
Subject: Bhyve live migration, virtio-ballooning, kvm-clock
Message-ID: <CH3PR12MB8187F88C7668D06E506DEB5CFEE52@CH3PR12MB8187.namprd12.prod.outlook.com>
Hi all,

For some time we have been working on getting bhyve live vm-migration working. We have developed, deployed, and validated three feature series against the FreeBSD base system (15.0) and we would like to contribute them upstream. I’m writing to ask whether a member of the virtualization team would be willing to act as champion/mentor for these series through the review process.

What we’ve done:

1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits, the kernel engine decomposed into four buildable commits).

Live migration of a running guest between two hosts: a versioned VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing the existing vm_snapshot machinery, "bhyve -M send/recv" as the userland mover, and a set of restore-correctness fixes (vCPU allocation order, authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET co-anchoring). The PCI BAR re-registration fix is a standalone commit because it also repairs a pre-existing bug in stock bhyvectl(8) --checkpoint/restore, independent of migration.

Validated end-to-end on a two-host physical Intel lab as a transparent live handoff: a running Rocky Linux 9 guest migrates in both directions keeping its boot_id, uptime, processes, AND live network sessions across the cutover, at ~0.4 s idle downtime; 20/20 bidirectional runs with zero failures, and a stress run (4 GB / 24 GB guest under ~2 GB/s memory churn during the migration) stayed correct with downtime scaling as expected with the at-pause dirty set. One read-only ioctl is added to the capsicum allow-list; all state-changing ioctls stay outside the sandbox.

2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).

A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest telemetry, and a per-VM control socket created before cap_enter(). Guest-validated against FreeBSD virtio_balloon(4) on two nodes (inflate/deflate tracked exactly; mid-flight readings prove the values are guest-driven) and a Linux guest for the stats queue.

3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default off).

A KVM-compatible paravirtual clock: KVM CPUID signature at 0x40000100 (bhyve's own signature leaf untouched), MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM paths, publishing standard pvclock structures through vm_gpa_hold_global(). This is the durable fix for Linux guests marking the TSC unstable and degrading to hpet after any snapshot/restore or migration. Validated on hardware: guests register kvm-clock and survive repeated bidirectional migrations with zero TSC-unstable events (the pre-kvmclock baseline reliably degraded on the same hardware).

I’ve got a full submission document (design, per-failure bring-up history, complete test matrix, untested-areas inventory, and security analysis) and the git-format-patch series (against releng/15.0, where they are validated).

I’ve tested many rounds of live vm-migrations across hosts (AMD using KVM nested virtualization and Intel physical systems) and have finally gotten it to a stable state with 30+ live migrations without packets dropping. I intend to do further testing (specifically with AMD physical boxes).

Bhyve is phenomenal. If there is no interest in a champion, I still intend to at least attempt to see the process through (acceptance or not). Happy to provide the documentation/requested info.

Thanks for the consideration.

William Mckenzie
wmckenzie@rhelitpro.com
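
Illustrative note on item 1: the message describes iterative RAM precopy and says downtime scales with the dirty set left at pause. The sketch below is a small Python model of that general technique, not code from the patch series. The function name, its parameters, and the stop threshold are all hypothetical, and it does not use or represent the VM_MIGRATE_* ioctl interface.

```python
def precopy(total_pages, dirty_rate, link_rate, max_rounds=30, stop_threshold=256):
    """Model iterative precopy. Rates are in pages per second (hypothetical units).

    Returns (pages sent per live round, pages left for the paused copy,
    estimated downtime in seconds for that final copy).
    """
    to_send = total_pages              # the first round sends all guest RAM
    rounds = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:  # small enough: pause the guest now
            break
        rounds.append(to_send)
        # Pages dirtied while this round was on the wire must be resent.
        # dirty_rate * (to_send / link_rate), kept in integer arithmetic.
        to_send = min(total_pages, dirty_rate * to_send // link_rate)
    downtime = to_send / link_rate     # guest is paused for the last copy
    return rounds, to_send, downtime


if __name__ == "__main__":
    # Mostly idle guest: converges quickly, tiny at-pause dirty set.
    print(precopy(1_000_000, dirty_rate=1_000, link_rate=250_000))
    # Guest dirtying faster than the link: never converges, long pause.
    print(precopy(1_000_000, dirty_rate=500_000, link_rate=250_000, max_rounds=5))
```

In this model, when the guest dirties memory more slowly than the link can move it, each round shrinks and the paused copy is short. When it dirties memory faster than the link, the rounds never shrink and the pause grows toward a full RAM copy, which is the scaling behaviour a stress run would be expected to show.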
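
Illustrative note on item 3: a guest that registers kvm-clock derives time from the published pvclock structure rather than trusting the raw TSC, which is why it can survive a host change. The Python sketch below shows the standard guest-side arithmetic for the KVM pvclock ABI fields tsc_timestamp, system_time, tsc_to_system_mul, and tsc_shift. It is a model for readers, not code from the patches; how the series fills these fields is not shown in this message, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def pvclock_ns(tsc_now, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift):
    """Convert a TSC reading to nanoseconds using pvclock fields.

    tsc_to_system_mul is a 32-bit fixed-point fraction (value / 2**32),
    tsc_shift is a signed pre-scaling shift applied to the TSC delta.
    """
    delta = (tsc_now - tsc_timestamp) & MASK64   # ticks since the host's sample
    if tsc_shift >= 0:
        delta = (delta << tsc_shift) & MASK64
    else:
        delta >>= -tsc_shift
    # Python integers do not overflow, so the wide product is exact here.
    scaled = (delta * tsc_to_system_mul) >> 32
    return (system_time + scaled) & MASK64


if __name__ == "__main__":
    # Assumed 2 GHz TSC: 0.5 ns per tick, so mul = 2**31 and shift = 0.
    print(pvclock_ns(2_000_000_000, 0, 5_000_000_000, 1 << 31, 0))
```

A real guest also treats the structure's version field as a sequence lock, retrying the read if the version is odd or changes mid-read. After a migration the host republishes tsc_timestamp and system_time for the new machine, so the computed time stays continuous even though the underlying TSC does not.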
