From: William Mckenzie
Date: Tue, 16 Jun 2026 16:40:28 -0400
Subject: Re: Bhyve live migration, virtio-ballooning, kvm-clock
To: Matthew Grooms
Cc: virtualization@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

Thank you, Matthew, for the quick response. I certainly don't want to be a point of contention for the direction the team is already headed in. I'll read over what you sent out, and if there's any way I can contribute (lab hardware, resources, testing), I'd be more than happy to.

Much appreciated!

On Tue, Jun 16, 2026 at 4:27 PM Matthew Grooms <mgrooms@shrew.net> wrote:


On 6/16/26 14:28, William Mckenzie wrote:
Hi all,

For some time we have been working on getting bhyve live vm-migration working. We have developed, deployed, and validated three feature series against the FreeBSD base system (15.0) and we would like to contribute them upstream. I’m writing to ask whether a member of the virtualization team would be willing to act as champion/mentor for these series through the review process.

What we’ve done:

1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits,
   the kernel engine decomposed into four buildable commits).
   Live migration of a running guest between two hosts: a versioned
   VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
   EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing
   the existing vm_snapshot machinery, "bhyve -M send/recv" as the
   userland mover, and a set of restore-correctness fixes (vCPU
   allocation order, authoritative RIP, PIT re-arm, vm_restore_time on
   finalize, TSC/vHPET co-anchoring). The PCI BAR re-registration fix is
   a standalone commit because it also repairs a pre-existing bug in
   stock bhyvectl(8) --checkpoint/restore, independent of migration.
   Validated end-to-end on a two-host physical Intel lab as a
   transparent live handoff: a running Rocky Linux 9 guest migrates in
   both directions keeping its boot_id, uptime, processes, AND live
   network sessions across the cutover, at ~0.4 s idle downtime; 20/20
   bidirectional runs with zero failures, and a stress run (4 GB / 24 GB
   guest under ~2 GB/s memory churn during the migration) stayed correct
   with downtime scaling as expected with the at-pause dirty set. One
   read-only ioctl is added to the capsicum allow-list; all
   state-changing ioctls stay outside the sandbox.

2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
   A virtio-balloon (type 5) device emulation: inflate/deflate
   virtqueues with host reclaim via paddr_guest2host() +
   madvise(MADV_FREE), standard num_pages/actual config space,
   VIRTIO_BALLOON_F_STATS_VQ guest telemetry, and a per-VM control
   socket created before cap_enter(). Guest-validated against FreeBSD
   virtio_balloon(4) on two nodes (inflate/deflate tracked exactly;
   mid-flight readings prove the values are guest-driven) and a Linux
   guest for the stats queue.

3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock,
   default off). A KVM-compatible paravirtual clock: KVM CPUID signature
   at 0x40000100 (bhyve's own signature leaf untouched),
   MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
   paths, publishing standard pvclock structures through
   vm_gpa_hold_global(). This is the durable fix for Linux guests
   marking the TSC unstable and degrading to hpet after any
   snapshot/restore or migration. Validated on hardware: guests register
   kvm-clock and survive repeated bidirectional migrations with zero
   TSC-unstable events (the pre-kvmclock baseline reliably degraded on
   the same hardware).
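
As a rough illustration of why downtime in item 1 tracks the at-pause dirty set, here is a toy model of iterative precopy. It is only a sketch: it counts pages instead of harvesting real EPT/NPT dirty bits, and every name in it (precopy_rounds, working_set_pages, stop_threshold, and so on) is hypothetical rather than taken from the VM_MIGRATE_* series.

```python
def precopy_rounds(total_pages, working_set_pages, dirty_pages_per_sec,
                   link_pages_per_sec, stop_threshold, max_rounds=30):
    """Toy precopy model: return (pages sent per live round, pages left at pause)."""
    # Before the first round nothing has been copied, so every page is pending.
    to_send = total_pages
    history = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break  # small enough to finish while the guest is paused
        history.append(to_send)
        # While this round is on the wire the guest keeps writing. Integer
        # math: pages dirtied = round duration * dirty rate, capped by the
        # size of the guest's working set.
        dirtied = to_send * dirty_pages_per_sec // link_pages_per_sec
        to_send = min(working_set_pages, dirtied)
    # Whatever is still pending is copied during the pause (the downtime).
    return history, to_send


# Guest dirties pages more slowly than the link moves them: rounds shrink.
print(precopy_rounds(1_000_000, 100_000, 50_000, 250_000, 1_000))
# ([1000000, 100000, 20000, 4000], 800)

# Guest dirties pages faster than the link moves them: rounds never shrink.
print(precopy_rounds(1_000_000, 100_000, 500_000, 250_000, 1_000, max_rounds=5))
# ([1000000, 100000, 100000, 100000, 100000], 100000)
```

In the first case only a small remainder is copied while the guest is paused. In the second the rounds stop converging, so the pause has to absorb the whole working set, which is the scaling behaviour the stress run describes.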

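For the balloon series in item 2, the inflate path amounts to turning the guest's page list into host memory ranges to release. The sketch below assumes the virtio specification's inflateq layout (an array of 32-bit little-endian page frame numbers in 4 KiB units) and merges adjacent pages so that each run could be handed to one paddr_guest2host() + madvise(MADV_FREE) pair. The merging step and the name pfns_to_ranges are illustrative assumptions, not something the patch is known to do.

```python
import struct

# Assumption from the virtio spec: balloon PFNs are always in 4 KiB units,
# regardless of the guest's own page size.
BALLOON_PAGE_SIZE = 4096


def pfns_to_ranges(buf: bytes):
    """Decode an inflateq buffer into merged (guest_phys_addr, length) runs."""
    count = len(buf) // 4
    pfns = struct.unpack(f"<{count}I", buf[:count * 4])
    ranges = []
    for pfn in pfns:
        gpa = pfn * BALLOON_PAGE_SIZE
        if ranges and ranges[-1][0] + ranges[-1][1] == gpa:
            # Page directly follows the previous run: extend that run.
            start, length = ranges[-1]
            ranges[-1] = (start, length + BALLOON_PAGE_SIZE)
        else:
            ranges.append((gpa, BALLOON_PAGE_SIZE))
    return ranges


# Three adjacent pages followed by one isolated page.
buf = struct.pack("<4I", 0x100, 0x101, 0x102, 0x200)
for gpa, length in pfns_to_ranges(buf):
    print(hex(gpa), hex(length))
```

Each resulting range is what a host would translate to a host virtual address and release; the translation and the madvise call themselves are left out because they depend on bhyve internals.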

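The kvm-clock series in item 3 publishes standard pvclock structures, so a sketch of how a guest turns one into nanoseconds may help show why republishing it after a restore keeps guest time continuous. The field layout below is assumed to follow the KVM pvclock_vcpu_time_info ABI; the function name and the sample numbers are hypothetical, and a real guest would retry rather than fail when it sees an update in progress.

```python
import struct

# Assumed KVM ABI layout, 32 bytes little-endian:
# u32 version, u32 pad0, u64 tsc_timestamp, u64 system_time,
# u32 tsc_to_system_mul, s8 tsc_shift, u8 flags, u8 pad[2]
PVCLOCK_FMT = "<IIQQIbB2x"


def pvclock_read_ns(page: bytes, tsc_now: int) -> int:
    """Convert a raw TSC reading to guest nanoseconds using a pvclock record."""
    (version, _pad0, tsc_timestamp, system_time,
     mul, shift, _flags) = struct.unpack(PVCLOCK_FMT, page[:32])
    if version & 1:
        # Odd version means the host is mid-update.
        raise RuntimeError("pvclock update in progress")
    delta = (tsc_now - tsc_timestamp) & (2**64 - 1)
    delta = delta << shift if shift >= 0 else delta >> -shift
    # tsc_to_system_mul is a 32.32 fixed-point "nanoseconds per tick" factor.
    return system_time + ((delta * mul) >> 32)


mul = 1 << 31  # 0.5 ns per tick, i.e. a 2 GHz TSC
# Record published by the source host.
src = struct.pack(PVCLOCK_FMT, 2, 0, 1_000, 5_000_000, mul, 0, 0)
# Record republished by the destination host after migration.
dst = struct.pack(PVCLOCK_FMT, 4, 0, 50, 6_000_000, mul, 0, 0)
print(pvclock_read_ns(src, 1_000 + 2_000_000))  # 6000000
print(pvclock_read_ns(dst, 50))                 # 6000000
```

The second record mimics a destination host publishing a fresh tsc_timestamp/system_time pair: the computed guest time carries on from where the source left off even though the raw TSC values are unrelated, so the guest has no reason to distrust its clocksource.
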
I’ve got a full submission document (design, per-failure bring-up history, complete test matrix, untested-areas inventory, and security analysis) and the git-format-patch series (against releng/15.0, where they are validated).

I’ve tested many rounds of live vm-migrations across hosts (AMD using KVM nested virtualization and Intel physical systems) and have finally gotten it to a stable state with 30+ live migrations without packets dropping. I intend to do further testing (specifically with AMD physical boxes).

Bhyve is phenomenal. If there is no interest in a champion, I still intend to at least attempt to see the process through (acceptance or not). Happy to provide the documentation/requested info.


Thanks for working on this. Live migration patch sets have been proposed a few times before. You can find the most recent attempt sitting in reviews from 2022 ...

https://reviews.freebsd.org/D34722
https://reviews.freebsd.org/D34811
https://reviews.freebsd.org/D34719
https://reviews.freebsd.org/D34720
https://reviews.freebsd.org/D34721

You should also be able to locate several email threads related to the topic in the public FreeBSD mailing list archives. I won't rehash that here, but there was resistance. The original work for that and other bhyve-related projects (libvdsk w/ qcow2+vmdk support, user-mode USB pass-through, etc.) was hosted here ...

https://github.com/orgs/FreeBSD-UPB/repositories

You should probably also have a look at this ...

https://www.freebsd.org/status/report-2025-10-2025-12/bhyve-cpuid/

From what I gather from his Zagreb presentation, the feature is being developed as a foundational layer to import illumos bhyve migration code with an eye towards feature parity and potential interoperability.

Good luck!

-Matthew
