Date:      Tue, 16 Jun 2026 19:28:33 +0000
From:      William Mckenzie <wmckenzie@rhelitpro.com>
To:        "freebsd-virtualization@FreeBSD.org" <freebsd-virtualization@FreeBSD.org>
Subject:   Bhyve live migration, virtio-ballooning, kvm-clock
Message-ID:  <CH3PR12MB8187F88C7668D06E506DEB5CFEE52@CH3PR12MB8187.namprd12.prod.outlook.com>

Hi all,

For some time we have been working on live VM migration for bhyve. We have developed, deployed, and validated three feature series against the FreeBSD 15.0 base system and would like to contribute them upstream. I’m writing to ask whether a member of the virtualization team would be willing to act as champion/mentor for these series through the review process.

What we’ve done:

1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits, the
   kernel engine decomposed into four buildable commits).
   Live migration of a running guest between two hosts: a versioned
   VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
   EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing the
   existing vm_snapshot machinery, "bhyve -M send/recv" as the userland
   mover, and a set of restore-correctness fixes (vCPU allocation order,
   authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET
   co-anchoring). The PCI BAR re-registration fix is a standalone commit
   because it also repairs a pre-existing bug in stock bhyvectl(8)
   --checkpoint/restore, independent of migration. Validated end-to-end on a
   two-host physical Intel lab as a transparent live handoff: a running Rocky
   Linux 9 guest migrates in both directions, keeping its boot_id, uptime,
   processes, and live network sessions across the cutover, at ~0.4 s idle
   downtime. 20/20 bidirectional runs completed with zero failures, and a
   stress run (4 GB / 24 GB guest under ~2 GB/s memory churn during the
   migration) stayed correct, with downtime scaling as expected with the
   at-pause dirty set. One read-only ioctl is added to the Capsicum
   allow-list; all state-changing ioctls stay outside the sandbox.

2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
   A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues
   with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard
   num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest telemetry,
   and a per-VM control socket created before cap_enter(). Guest-validated
   against FreeBSD virtio_balloon(4) on two nodes (inflate/deflate tracked
   exactly; mid-flight readings prove the values are guest-driven) and a
   Linux guest for the stats queue.

3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default
   off). A KVM-compatible paravirtual clock: KVM CPUID signature at
   0x40000100 (bhyve's own signature leaf untouched),
   MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
   paths, publishing standard pvclock structures through
   vm_gpa_hold_global(). This is the durable fix for Linux guests marking
   the TSC unstable and degrading to hpet after any snapshot/restore or
   migration. Validated on hardware: guests register kvm-clock and survive
   repeated bidirectional migrations with zero TSC-unstable events (the
   pre-kvmclock baseline reliably degraded on the same hardware).
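
To illustrate the precopy behaviour described in item 1, here is a rough model of why the rounds converge when the link outpaces the guest's dirty rate, and why downtime tracks the dirty set left at pause. This is not code from the series: it is Python for brevity, the function name, parameters, and stop threshold are all assumptions made for this sketch, and the real engine harvests EPT/NPT dirty bits rather than estimating from a rate.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth,
                   max_rounds=30, stop_threshold=1000):
    """Model iterative RAM precopy (hypothetical helper, illustration only).

    total_pages    -- guest RAM size, in pages
    dirty_rate     -- pages the guest dirties per second
    bandwidth      -- pages the link can send per second
    stop_threshold -- pause the guest once this few pages remain dirty

    Returns (pages_sent_per_live_round, pages_left_for_stop_and_copy).
    """
    to_send = total_pages          # the first round sends all of RAM
    sent_per_round = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break                  # small enough: pause and send the rest
        sent_per_round.append(to_send)
        # Pages dirtied while this round was on the wire become the next
        # round's work. Integer math: to_send / bandwidth seconds elapsed.
        to_send = min(total_pages, to_send * dirty_rate // bandwidth)
    return sent_per_round, to_send
```

With 1,000,000 pages, 10,000 pages/s dirtied, and 100,000 pages/s of bandwidth, the model sends three live rounds and pauses with 1,000 pages outstanding. If the dirty rate meets or exceeds the bandwidth, the rounds never shrink and the loop only ends at max_rounds, which is the case where downtime grows with the at-pause dirty set.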

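For item 2, the step between the inflate virtqueue and the host reclaim call can be sketched as below. The virtio-balloon device receives arrays of little-endian 32-bit page frame numbers in 4 KiB units; merging adjacent PFNs into ranges before reclaiming them is an assumption of this sketch (one reclaim call per range rather than per page), not necessarily what the commit does. The function name is hypothetical, and the actual paddr_guest2host() and madvise(MADV_FREE) calls are deliberately left out.

```python
import struct

BALLOON_PAGE_SIZE = 4096  # virtio-balloon PFNs are always in 4 KiB units


def pfns_to_ranges(buf):
    """Decode little-endian u32 PFNs into merged (gpa, length) ranges.

    Hypothetical helper for illustration: each returned range is what a
    device model could translate to a host address and hand to madvise().
    """
    count = len(buf) // 4
    # Drop duplicates and sort so that adjacent pages end up next to each other.
    pfns = sorted(set(struct.unpack("<%dI" % count, buf[:count * 4])))
    ranges = []
    for pfn in pfns:
        gpa = pfn * BALLOON_PAGE_SIZE
        if ranges and ranges[-1][0] + ranges[-1][1] == gpa:
            # Page directly follows the previous range: extend it.
            start, length = ranges[-1]
            ranges[-1] = (start, length + BALLOON_PAGE_SIZE)
        else:
            ranges.append((gpa, BALLOON_PAGE_SIZE))
    return ranges
```

For example, PFNs 16, 17, 18, and 100 collapse into two ranges: three pages starting at guest-physical 0x10000 and one page at 0x64000.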

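For item 3, the guest-side arithmetic that a pvclock page implies can be sketched as follows. The 32-byte layout and the scale-and-shift formula follow the commonly documented KVM pvclock ABI; the Python names are hypothetical and this is not code from the series. An odd version field means the host is mid-update, so the reader must retry.

```python
import struct

# version, pad0, tsc_timestamp, system_time, tsc_to_system_mul,
# tsc_shift (signed), flags, then 2 pad bytes: 32 bytes in total.
PVCLOCK_FMT = "<IIQQIbB2x"


def unpack_time_info(page):
    """Return (tsc_timestamp, system_time, mul, shift), or None to retry."""
    version, _pad, tsc_ts, sys_time, mul, shift, _flags = \
        struct.unpack_from(PVCLOCK_FMT, page)
    if version & 1:
        return None  # odd version: the host is updating the structure
    return tsc_ts, sys_time, mul, shift


def pvclock_ns(tsc_now, tsc_ts, sys_time, mul, shift):
    """Convert a TSC reading to nanoseconds using the published scale."""
    delta = tsc_now - tsc_ts
    # tsc_shift is signed: positive shifts left, negative shifts right.
    delta = delta << shift if shift >= 0 else delta >> -shift
    # mul is a 32.32 fixed-point factor, so drop the low 32 bits.
    return sys_time + ((delta * mul) >> 32)
```

As a check on the units: for a 2 GHz TSC the multiplier is 2^31 with shift 0 (0.5 ns per tick), so 2,000,000,000 ticks after the timestamp the guest reads system_time plus exactly one second.
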
I have a full submission document (design, per-failure bring-up history, complete test matrix, untested-areas inventory, and security analysis) and the git format-patch series against releng/15.0, where they were validated.

I’ve tested many rounds of live VM migration across hosts (AMD under KVM nested virtualization, and physical Intel systems) and have reached a stable state: 30+ live migrations with no dropped packets. I intend to do further testing, specifically on physical AMD hardware.

Bhyve is phenomenal. Even if no one is available to act as champion, I still intend to see the process through, whether or not the series are accepted. I’m happy to provide the documentation or any other requested information.

Thanks for the consideration.

William Mckenzie
wmckenzie@rhelitpro.com

