Date: Tue, 16 Jun 2026 15:26:42 -0500
From: Matthew Grooms <mgrooms@shrew.net>
To: virtualization@freebsd.org
Subject: Re: Bhyve live migration, virtio-ballooning, kvm-clock
Message-ID: <d10083bd-3eb0-4d99-bf91-c8e1486570a6@shrew.net>
In-Reply-To: <CH3PR12MB8187F88C7668D06E506DEB5CFEE52@CH3PR12MB8187.namprd12.prod.outlook.com>
On 6/16/26 14:28, William Mckenzie wrote:
> Hi all,
>
> For some time we have been working on getting bhyve live vm-migration
> working. We have developed, deployed, and validated three feature
> series against the FreeBSD base system (15.0) and we would like to
> contribute them upstream. I’m writing to ask whether a member of the
> virtualization team would be willing to act as champion/mentor for
> these series through the review process.
>
> What we’ve done:
>
> 1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits,
> the kernel engine decomposed into four buildable commits).
> Live migration of a running guest between two hosts: a versioned
> VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
> EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing the
> existing vm_snapshot machinery, "bhyve -M send/recv" as the userland
> mover, and a set of restore-correctness fixes (vCPU allocation order,
> authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET
> co-anchoring). The PCI BAR re-registration fix is a standalone commit
> because it also repairs a pre-existing bug in stock bhyvectl(8)
> --checkpoint/restore, independent of migration. Validated end-to-end on a
> two-host physical Intel lab as a transparent live handoff: a running
> Rocky Linux 9 guest migrates in both directions keeping its boot_id,
> uptime, processes, AND live network sessions across the cutover, at
> ~0.4 s idle downtime; 20/20 bidirectional runs with zero failures, and a
> stress run (4 GB / 24 GB guest under ~2 GB/s memory churn during the
> migration) stayed correct with downtime scaling as expected with the
> at-pause dirty set. One read-only ioctl is added to the capsicum
> allow-list; all state-changing ioctls stay outside the sandbox.
>
> 2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
> A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues
> with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard
> num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest telemetry,
> and a per-VM control socket created before cap_enter(). Guest-validated
> against FreeBSD virtio_balloon(4) on two nodes (inflate/deflate tracked
> exactly; mid-flight readings prove the values are guest-driven) and a
> Linux guest for the stats queue.
>
> 3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default
> off). A KVM-compatible paravirtual clock: KVM CPUID signature at
> 0x40000100 (bhyve's own signature leaf untouched),
> MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
> paths, publishing standard pvclock structures through
> vm_gpa_hold_global(). This is the durable fix for Linux guests marking
> the TSC unstable and degrading to hpet after any snapshot/restore or
> migration. Validated on hardware: guests register kvm-clock and survive
> repeated bidirectional migrations with zero TSC-unstable events (the
> pre-kvmclock baseline reliably degraded on the same hardware).
>
> I’ve got a full submission document (design, per-failure bring-up
> history, complete test matrix, untested-areas inventory, and security
> analysis) and the git-format-patch series (against releng/15.0, where
> they are validated).
>
> I’ve tested many rounds of live vm-migrations across hosts (AMD using
> KVM nested virtualization and Intel physical systems) and have finally
> gotten it to a stable state with 30+ live migrations without packets
> dropping. I intend to do further testing (specifically with AMD
> physical boxes).
>
> Bhyve is phenomenal. If there is no interest in a champion, I still
> intend to at least attempt to see the process through (acceptance or
> not). Happy to provide the documentation/requested info.
>

Thanks for working on this. Live migration patch sets have been proposed a few times before.
You can find the most recent attempt sitting in reviews from 2022:

https://reviews.freebsd.org/D34722
https://reviews.freebsd.org/D34811
https://reviews.freebsd.org/D34719
https://reviews.freebsd.org/D34720
https://reviews.freebsd.org/D34721

You should also be able to locate several email threads related to the topic in the public FreeBSD mailing list archives. I won't rehash that here, but there was resistance. The original work for that and other bhyve-related projects (libvdsk with qcow2+vmdk support, user-mode USB pass-through, etc.) was hosted here:

https://github.com/orgs/FreeBSD-UPB/repositories

You should probably also have a look at this:

https://www.freebsd.org/status/report-2025-10-2025-12/bhyve-cpuid/

From what I gather from his Zagreb presentation, the feature is being developed as a foundational layer for importing the illumos bhyve migration code, with an eye towards feature parity and potential interoperability.

Good luck!

-Matthew
