Date:      Tue, 16 Jun 2026 16:40:28 -0400
From:      William Mckenzie <wmckenzie@rhelitpro.com>
To:        Matthew Grooms <mgrooms@shrew.net>
Cc:        virtualization@freebsd.org
Subject:   Re: Bhyve live migration, virtio-ballooning, kvm-clock
Message-ID:  <CAKhdOUre%2Ba74a7EYMgpQya8r1t1X=Fdet1-NJU=7oxRnmicDGA@mail.gmail.com>
In-Reply-To: <d10083bd-3eb0-4d99-bf91-c8e1486570a6@shrew.net>
References:  <CH3PR12MB8187F88C7668D06E506DEB5CFEE52@CH3PR12MB8187.namprd12.prod.outlook.com> <d10083bd-3eb0-4d99-bf91-c8e1486570a6@shrew.net>


[-- Attachment #1 --]
Thank you, Matthew, for the quick response. I certainly don't want to be a
point of contention for the direction the team is already headed in. I'll
read over what you sent and, if there's any way I can contribute (lab
hardware, resources, testing), I'd be more than happy to.

Much appreciated!

On Tue, Jun 16, 2026 at 4:27 PM Matthew Grooms <mgrooms@shrew.net> wrote:

>
> On 6/16/26 14:28, William Mckenzie wrote:
>
> Hi all,
>
> For some time we have been working on getting bhyve live vm-migration
> working. We have developed, deployed, and validated three feature series
> against the FreeBSD base system (15.0) and we would like to contribute them
> upstream. I’m writing to ask whether a member of the virtualization team
> would be willing to act as champion/mentor for these series through the
> review process.
>
> What we’ve done:
>
> 1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits,
>    the kernel engine decomposed into four buildable commits).
>    Live migration of a running guest between two hosts: a versioned
>    VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
>    EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing
>    the existing vm_snapshot machinery, "bhyve -M send/recv" as the userland
>    mover, and a set of restore-correctness fixes (vCPU allocation order,
>    authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET
>    co-anchoring). The PCI BAR re-registration fix is a standalone commit
>    because it also repairs a pre-existing bug in stock bhyvectl(8)
>    --checkpoint/restore, independent of migration. Validated end-to-end on
>    a two-host physical Intel lab as a transparent live handoff: a running
>    Rocky Linux 9 guest migrates in both directions keeping its boot_id,
>    uptime, processes, AND live network sessions across the cutover, at
>    ~0.4 s idle downtime; 20/20 bidirectional runs with zero failures, and
>    a stress run (4 GB / 24 GB guest under ~2 GB/s memory churn during the
>    migration) stayed correct with downtime scaling as expected with the
>    at-pause dirty set. One read-only ioctl is added to the capsicum
>    allow-list; all state-changing ioctls stay outside the sandbox.
>
> 2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
>    A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues
>    with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard
>    num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest
>    telemetry, and a per-VM control socket created before cap_enter().
>    Guest-validated against FreeBSD virtio_balloon(4) on two nodes
>    (inflate/deflate tracked exactly; mid-flight readings prove the values
>    are guest-driven) and a Linux guest for the stats queue.
>
> 3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default
>    off). A KVM-compatible paravirtual clock: KVM CPUID signature at
>    0x40000100 (bhyve's own signature leaf untouched),
>    MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
>    paths, publishing standard pvclock structures through
>    vm_gpa_hold_global(). This is the durable fix for Linux guests marking
>    the TSC unstable and degrading to hpet after any snapshot/restore or
>    migration. Validated on hardware: guests register kvm-clock and survive
>    repeated bidirectional migrations with zero TSC-unstable events (the
>    pre-kvmclock baseline reliably degraded on the same hardware).
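
For readers new to the idea, the iterative RAM precopy in item 1 can be
illustrated with a small userland simulation. This is only a sketch and is
not code from the series: the dirty[] array stands in for the EPT/NPT
dirty-bit harvest, and guest_touch(), send_dirty(), precopy(), demo_churn()
and the threshold/max_rounds parameters are hypothetical names made up for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NPAGES	64
#define PAGE_SZ	4096

/* Toy guest: RAM plus one dirty flag per page (stand-in for EPT/NPT bits). */
struct guest {
	uint8_t ram[NPAGES][PAGE_SZ];
	bool dirty[NPAGES];
};

/* Simulate a guest write: fill a page and mark it dirty. */
static void
guest_touch(struct guest *g, size_t pfn, uint8_t val)
{
	assert(pfn < NPAGES);
	memset(g->ram[pfn], val, PAGE_SZ);
	g->dirty[pfn] = true;
}

static size_t
count_dirty(const struct guest *g)
{
	size_t n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		n += g->dirty[i] ? 1 : 0;
	return (n);
}

/* Harvest: copy every dirty page to dst, clear its flag, return the count. */
static size_t
send_dirty(struct guest *src, struct guest *dst)
{
	size_t sent = 0;

	for (size_t i = 0; i < NPAGES; i++) {
		if (!src->dirty[i])
			continue;
		src->dirty[i] = false;
		memcpy(dst->ram[i], src->ram[i], PAGE_SZ);
		sent++;
	}
	return (sent);
}

/* Hypothetical workload: dirties 8, 4, 2, 1, 0... pages in successive rounds. */
static void
demo_churn(struct guest *g, int round)
{
	size_t n = (size_t)8 >> round;

	for (size_t i = 0; i < n; i++)
		guest_touch(g, i, (uint8_t)(round + 1));
}

/*
 * Precopy: the first round sends all of RAM, later rounds resend only what
 * the still-running guest dirtied. Once the dirty set is small enough (or
 * max_rounds is hit) the guest is "paused" and a final pass sends the rest.
 * Returns the number of pages sent while paused, a proxy for downtime.
 */
static size_t
precopy(struct guest *src, struct guest *dst, size_t threshold,
    int max_rounds, void (*churn)(struct guest *, int))
{
	for (size_t i = 0; i < NPAGES; i++)
		src->dirty[i] = true;
	for (int round = 0; round < max_rounds; round++) {
		send_dirty(src, dst);
		if (churn != NULL)
			churn(src, round);	/* guest keeps running */
		if (count_dirty(src) <= threshold)
			break;
	}
	/* Paused: no more churn, so one last pass converges. */
	return (send_dirty(src, dst));
}
```

Calling precopy(&src, &dst, 2, 10, demo_churn) on two zeroed struct guest
instances leaves dst's RAM identical to src's, and the return value is the
size of the at-pause dirty set, which is the quantity the downtime scales
with.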
>
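
On the balloon in item 2: the inflate queue hands the host a batch of
32-bit page frame numbers in 4 KiB units, as the virtio specification
defines them. Below is a minimal sketch of turning such a batch into
contiguous guest-physical ranges before reclaim. It is an illustration
only, not the code in the commit; struct gpa_range and balloon_coalesce()
are hypothetical names, and the paddr_guest2host() + madvise(MADV_FREE)
step is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define BALLOON_PFN_SHIFT	12	/* balloon PFNs count 4 KiB pages */

struct gpa_range {
	uint64_t gpa;	/* guest-physical start address */
	uint64_t len;	/* length in bytes */
};

/*
 * Coalesce a batch of balloon PFNs into contiguous ranges. Only entries
 * that directly follow the previous range are merged; the input is not
 * sorted. 'out' must have room for npfns entries. Returns the number of
 * ranges written.
 */
static size_t
balloon_coalesce(const uint32_t *pfns, size_t npfns, struct gpa_range *out)
{
	const uint64_t pgsz = 1ULL << BALLOON_PFN_SHIFT;
	size_t nout = 0;

	for (size_t i = 0; i < npfns; i++) {
		uint64_t gpa = (uint64_t)pfns[i] << BALLOON_PFN_SHIFT;

		if (nout > 0 && out[nout - 1].gpa + out[nout - 1].len == gpa) {
			out[nout - 1].len += pgsz;	/* extends last range */
		} else {
			out[nout].gpa = gpa;
			out[nout].len = pgsz;
			nout++;
		}
	}
	return (nout);
}
```

Each resulting range would then be translated to a host address and
released, so a guest that inflates in long ascending runs costs a handful
of madvise calls rather than one per page.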
>
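
On item 3, for anyone unfamiliar with pvclock: the host publishes a small
per-vCPU structure in guest memory and the guest derives nanoseconds from
its own TSC reading. The sketch below shows the guest-side arithmetic as I
understand it from the KVM pvclock ABI. It is not taken from the vmm
commits; pvclock_read_ns() and pvclock_update_in_progress() are
hypothetical helper names, and a real reader also needs memory barriers
and a retry loop around the version field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-vCPU time info as laid out by the KVM pvclock ABI (32 bytes). */
struct pvclock_vcpu_time_info {
	uint32_t version;		/* odd while the host is updating */
	uint32_t pad0;
	uint64_t tsc_timestamp;		/* TSC value at last update */
	uint64_t system_time;		/* nanoseconds at tsc_timestamp */
	uint32_t tsc_to_system_mul;	/* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;		/* pre-scale shift, may be negative */
	uint8_t  flags;
	uint8_t  pad[2];
};

_Static_assert(sizeof(struct pvclock_vcpu_time_info) == 32,
    "pvclock_vcpu_time_info must be 32 bytes");

/* An odd version means the host is mid-update and the reader must retry. */
static bool
pvclock_update_in_progress(const struct pvclock_vcpu_time_info *ti)
{
	return ((ti->version & 1) != 0);
}

/*
 * Convert a guest TSC reading to nanoseconds:
 *   ns = system_time + ((tsc - tsc_timestamp) shifted by tsc_shift)
 *        * tsc_to_system_mul / 2^32
 * The 64x32-bit multiply is split so no 128-bit type is needed.
 */
static uint64_t
pvclock_read_ns(const struct pvclock_vcpu_time_info *ti, uint64_t tsc)
{
	uint64_t delta = tsc - ti->tsc_timestamp;
	uint64_t scaled;

	if (ti->tsc_shift >= 0)
		delta <<= ti->tsc_shift;
	else
		delta >>= -ti->tsc_shift;

	scaled = (delta >> 32) * ti->tsc_to_system_mul +
	    (((delta & 0xffffffffULL) * ti->tsc_to_system_mul) >> 32);
	return (ti->system_time + scaled);
}
```

Because system_time and tsc_timestamp are republished by the destination
host after a restore or migration, the guest's notion of time stays
continuous even though the underlying TSC has jumped, which is why the
guest no longer has a reason to mark the TSC unstable.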
> I’ve got a full submission document (design, per-failure bring-up history,
> complete test matrix, untested-areas inventory, and security analysis) and
> the git-format-patch series (against releng/15.0, where they are validated).
>
> I’ve tested many rounds of live vm-migrations across hosts (AMD using KVM
> nested virtualization and Intel physical systems) and have finally gotten
> it to a stable state with 30+ live migrations without packets dropping.  I
> intend to do further testing (specifically with AMD physical boxes).
>
> Bhyve is phenomenal. If there is no interest in a champion, I still intend
> to at least attempt to see the process through (acceptance or not). Happy
> to provide the documentation/requested info.
>
>
> Thanks for working on this. Live migration patch sets have been proposed a
> few times before. You can find the most recent attempt sitting in reviews
> from 2022 ...
>
> https://reviews.freebsd.org/D34722
> https://reviews.freebsd.org/D34811
> https://reviews.freebsd.org/D34719
> https://reviews.freebsd.org/D34720
> https://reviews.freebsd.org/D34721
>
> You should also be able to locate several email threads related to the
> topic on the public FreeBSD mailing list archives. I won't rehash that
> here, but there was resistance. The original work for that and other
> bhyve-related projects (libvdsk w/ qcow2+vmdk support, user-mode USB
> pass-through, etc.) was hosted here ...
>
> https://github.com/orgs/FreeBSD-UPB/repositories
>
> You should probably also have a look at this ...
>
> https://www.freebsd.org/status/report-2025-10-2025-12/bhyve-cpuid/
>
> From what I gather from his Zagreb presentation, the feature is being
> developed as a foundational layer to import illumos bhyve migration code
> with an eye towards feature parity and potential interoperability.
>
> Good luck!
>
> -Matthew
>
