Date: Tue, 16 Jun 2026 16:40:28 -0400
From: William Mckenzie <wmckenzie@rhelitpro.com>
To: Matthew Grooms <mgrooms@shrew.net>
Cc: virtualization@freebsd.org
Subject: Re: Bhyve live migration, virtio-ballooning, kvm-clock
Message-ID: <CAKhdOUre%2Ba74a7EYMgpQya8r1t1X=Fdet1-NJU=7oxRnmicDGA@mail.gmail.com>
In-Reply-To: <d10083bd-3eb0-4d99-bf91-c8e1486570a6@shrew.net>
References: <CH3PR12MB8187F88C7668D06E506DEB5CFEE52@CH3PR12MB8187.namprd12.prod.outlook.com>
 <d10083bd-3eb0-4d99-bf91-c8e1486570a6@shrew.net>
[-- Attachment #1 --]
Thank you, Matthew, for the quick response. I certainly don't want to be a
point of contention for the direction the team is already headed in. I'll
read over what you sent out, and if there's any way I can contribute (lab
hardware, resources, testing), I'd be more than happy to.

Much appreciated!

On Tue, Jun 16, 2026 at 4:27 PM Matthew Grooms <mgrooms@shrew.net> wrote:

>
> On 6/16/26 14:28, William Mckenzie wrote:
>
> Hi all,
>
> For some time we have been working on getting bhyve live vm-migration
> working. We have developed, deployed, and validated three feature series
> against the FreeBSD base system (15.0) and we would like to contribute them
> upstream. I’m writing to ask whether a member of the virtualization team
> would be willing to act as champion/mentor for these series through the
> review process.
>
> What we’ve done:
>
> 1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits,
> the kernel engine decomposed into four buildable commits).
> Live migration of a running guest between two hosts: a versioned
> VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
> EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing
> the existing vm_snapshot machinery, "bhyve -M send/recv" as the userland
> mover, and a set of restore-correctness fixes (vCPU allocation order,
> authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET
> co-anchoring). The PCI BAR re-registration fix is a standalone commit
> because it also repairs a pre-existing bug in stock bhyvectl(8)
> --checkpoint/restore, independent of migration. Validated end-to-end on
> a two-host physical Intel lab as a transparent live handoff: a running
> Rocky Linux 9 guest migrates in both directions keeping its boot_id,
> uptime, processes, AND live network sessions across the cutover, at
> ~0.4 s idle downtime; 20/20 bidirectional runs with zero failures, and a
> stress run (4 GB / 24 GB guest under ~2 GB/s memory churn during the
> migration) stayed correct with downtime scaling as expected with the
> at-pause dirty set. One read-only ioctl is added to the capsicum
> allow-list; all state-changing ioctls stay outside the sandbox.
>
> 2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
> A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues
> with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard
> num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest
> telemetry, and a per-VM control socket created before cap_enter().
> Guest-validated against FreeBSD virtio_balloon(4) on two nodes
> (inflate/deflate tracked exactly; mid-flight readings prove the values
> are guest-driven) and a Linux guest for the stats queue.
>
> 3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default
> off). A KVM-compatible paravirtual clock: KVM CPUID signature at
> 0x40000100 (bhyve's own signature leaf untouched),
> MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
> paths, publishing standard pvclock structures through
> vm_gpa_hold_global(). This is the durable fix for Linux guests marking
> the TSC unstable and degrading to hpet after any snapshot/restore or
> migration. Validated on hardware: guests register kvm-clock and survive
> repeated bidirectional migrations with zero TSC-unstable events (the
> pre-kvmclock baseline reliably degraded on the same hardware).
>
>
> I’ve got a full submission document (design, per-failure bring-up history,
> complete test matrix, untested-areas inventory, and security analysis) and
> the git-format-patch series (against releng/15.0, where they are validated).
>
> I’ve tested many rounds of live vm-migrations across hosts (AMD using KVM
> nested virtualization and Intel physical systems) and have finally gotten
> it to a stable state with 30+ live migrations without packets dropping. I
> intend to do further testing (specifically with AMD physical boxes).
>
> Bhyve is phenomenal. If there is no interest in a champion, I still intend
> to at least attempt to see the process through (acceptance or not). Happy
> to provide the documentation/requested info.
>
>
> Thanks for working on this. Live migration patch sets have been proposed a
> few times before. You can find the most recent attempt sitting in reviews
> from 2022 ...
>
> https://reviews.freebsd.org/D34722
> https://reviews.freebsd.org/D34811
> https://reviews.freebsd.org/D34719
> https://reviews.freebsd.org/D34720
> https://reviews.freebsd.org/D34721
>
> You should also be able to locate several email threads related to the
> topic on the public FreeBSD mailing list archives. I won't rehash that
> here, but there was resistance. The original work for that and other bhyve
> related projects ( libvdsk w/ qcow2+vmdk support, user mode usb
> pass-through, etc ... ) was hosted here ...
>
> https://github.com/orgs/FreeBSD-UPB/repositories
>
> You should probably also have a look at this ...
>
> https://www.freebsd.org/status/report-2025-10-2025-12/bhyve-cpuid/
>
> From what I gather from his Zagreb presentation, the feature is being
> developed as a foundational layer to import illumos bhyve migration code
> with an eye towards feature parity and potential interoperability.
>
> Good luck!
>
> -Matthew
>
