From: William Mckenzie <wmckenzie@rhelitpro.com>
To: freebsd-virtualization@FreeBSD.org
Subject: Bhyve live migration, virtio-ballooning, kvm-clock
Date: Tue, 16 Jun 2026 19:28:33 +0000

Hi all, 

For some time we have been working on getting bhyve live VM migration working. We have developed, deployed, and validated three feature series against the FreeBSD base system (15.0), and we would like to contribute them upstream. I'm writing to ask whether a member of the virtualization team would be willing to act as champion/mentor for these series through the review process.

What we've done:

1) bhyve live migration (vmm + libvmmapi + bhyve + bhyvectl; 10 commits, the
   kernel engine decomposed into four buildable commits).
   Live migration of a running guest between two hosts: a versioned
   VM_MIGRATE_* ioctl surface in vmm(4), iterative RAM precopy driven by
   EPT/NPT dirty-bit harvesting, vCPU/device/timer state transfer reusing the
   existing vm_snapshot machinery, "bhyve -M send/recv" as the userland
   mover, and a set of restore-correctness fixes (vCPU allocation order,
   authoritative RIP, PIT re-arm, vm_restore_time on finalize, TSC/vHPET
   co-anchoring). The PCI BAR re-registration fix is a standalone commit
   because it also repairs a pre-existing bug in stock bhyvectl(8)
   --checkpoint/restore, independent of migration. Validated end-to-end on a
   two-host physical Intel lab as a transparent live handoff: a running Rocky
   Linux 9 guest migrates in both directions keeping its boot_id, uptime,
   processes, AND live network sessions across the cutover, at ~0.4 s idle
   downtime; 20/20 bidirectional runs with zero failures, and a stress run
   (4 GB / 24 GB guest under ~2 GB/s memory churn during the migration) stayed
   correct with downtime scaling as expected with the at-pause dirty set. One
   read-only ioctl is added to the Capsicum allow-list; all state-changing
   ioctls stay outside the sandbox.
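
   To illustrate why downtime tracks the at-pause dirty set, here is a toy
   model of iterative precopy. It is a sketch under simple assumptions
   (constant dirty and send rates, in pages per second), not the algorithm
   in the patch series; precopy_simulate() and its parameters are
   hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of a simulated precopy: rounds run while the guest was live,
 * and pages still dirty when the guest is paused for the final copy. */
struct precopy_result {
	unsigned rounds;
	uint64_t pages_at_pause;
};

/*
 * Toy model (assumption, not the series' code): each round sends the
 * current dirty set at `send_rate` pages/s while the guest re-dirties
 * memory at `dirty_rate` pages/s. Precopy stops once the dirty set is
 * at or below `threshold` pages, or after `max_rounds` rounds.
 * Intermediate products are not overflow-checked.
 */
static struct precopy_result
precopy_simulate(uint64_t total_pages, uint64_t dirty_rate,
    uint64_t send_rate, uint64_t threshold, unsigned max_rounds)
{
	/* Round one has to send everything, so start fully dirty. */
	struct precopy_result r = { 0, total_pages };

	assert(send_rate > 0);
	while (r.rounds < max_rounds && r.pages_at_pause > threshold) {
		/* Pages dirtied while this round's set was in flight. */
		uint64_t redirtied = r.pages_at_pause * dirty_rate / send_rate;

		if (redirtied > total_pages)
			redirtied = total_pages;
		r.pages_at_pause = redirtied;
		r.rounds++;
	}
	return (r);
}
```

   With 1000 pages, 100 pages/s dirtied and 1000 pages/s sent, the dirty set
   shrinks 1000 -> 100 -> 10 -> 1 over three rounds. When the dirty rate
   meets or exceeds the send rate the set never shrinks, the round cap ends
   precopy, and the whole remainder is copied while the guest is paused.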

2) bhyve virtio-balloon (usr.sbin/bhyve; 1 commit).
   A virtio-balloon (type 5) device emulation: inflate/deflate virtqueues
   with host reclaim via paddr_guest2host() + madvise(MADV_FREE), standard
   num_pages/actual config space, VIRTIO_BALLOON_F_STATS_VQ guest telemetry,
   and a per-VM control socket created before cap_enter(). Guest-validated
   against FreeBSD virtio_balloon(4) on two nodes (inflate/deflate tracked
   exactly; mid-flight readings prove the values are guest-driven) and a
   Linux guest for the stats queue.
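
   The inflate-side reclaim can be pictured roughly as below. This is a
   minimal sketch, not the code in the commit: struct guest_ram and
   guest2host() are hypothetical stand-ins for bhyve's guest memory mapping
   and paddr_guest2host(), and the sketch assumes a flat, page-aligned
   mapping on a host with 4 KiB pages.

```c
#define _DEFAULT_SOURCE		/* exposes MADV_FREE on glibc; not needed on FreeBSD */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define BALLOON_PAGE_SHIFT	12	/* virtio-balloon PFNs use 4 KiB units */
#define BALLOON_PAGE_SIZE	((uint64_t)1 << BALLOON_PAGE_SHIFT)

/* Hypothetical flat view of guest RAM. */
struct guest_ram {
	uint8_t  *base;		/* host address of guest physical 0 */
	uint64_t  size;		/* bytes of guest RAM */
};

/* Translate a guest physical range to a host pointer, or NULL when the
 * range is not fully inside guest RAM. */
static void *
guest2host(const struct guest_ram *ram, uint64_t gpa, uint64_t len)
{
	if (gpa >= ram->size || len > ram->size - gpa)
		return (NULL);
	return (ram->base + gpa);
}

/* Reclaim one page the guest placed in the balloon. Returns 0 on
 * success, -1 on an out-of-range PFN or a madvise() failure. */
static int
balloon_reclaim_page(const struct guest_ram *ram, uint32_t pfn)
{
	void *hva = guest2host(ram, (uint64_t)pfn << BALLOON_PAGE_SHIFT,
	    BALLOON_PAGE_SIZE);

	if (hva == NULL)
		return (-1);	/* never trust a guest-supplied PFN */
	/* madvise() needs a page-aligned address. */
	assert(((uintptr_t)hva & (BALLOON_PAGE_SIZE - 1)) == 0);
	return (madvise(hva, BALLOON_PAGE_SIZE, MADV_FREE));
}
```

   The bounds check comes first because the PFNs arrive from the guest over
   the inflate virtqueue; a page outside guest RAM is rejected before any
   host address is touched.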

3) bhyve kvm-clock (vmm; 4 commits, gated behind hw.vmm.kvmclock, default
   off). A KVM-compatible paravirtual clock: KVM CPUID signature at
   0x40000100 (bhyve's own signature leaf untouched),
   MSR_KVM_SYSTEM_TIME_NEW / MSR_KVM_WALL_CLOCK_NEW on both VMX and SVM
   paths, publishing standard pvclock structures through
   vm_gpa_hold_global(). This is the durable fix for Linux guests marking
   the TSC unstable and degrading to hpet after any snapshot/restore or
   migration. Validated on hardware: guests register kvm-clock and survive
   repeated bidirectional migrations with zero TSC-unstable events (the
   pre-kvmclock baseline reliably degraded on the same hardware).
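
   For readers unfamiliar with pvclock, the sketch below shows the per-vCPU
   record of the KVM pvclock ABI and how a guest turns a TSC reading into
   nanoseconds. It illustrates the guest-side arithmetic only, not the vmm
   code in this series; pvclock_ns() is a hypothetical helper name, and it
   assumes the caller already holds a consistent copy of the record (the
   version field is even and unchanged across the read).

```c
#include <assert.h>
#include <stdint.h>

/* Per-vCPU time record of the KVM pvclock ABI (32 bytes). */
struct pvclock_vcpu_time_info {
	uint32_t version;		/* odd while the host is updating */
	uint32_t pad0;
	uint64_t tsc_timestamp;		/* guest TSC at the last update */
	uint64_t system_time;		/* nanoseconds at tsc_timestamp */
	uint32_t tsc_to_system_mul;	/* ns = scaled delta * mul / 2^32 */
	int8_t   tsc_shift;		/* pre-scale applied to the TSC delta */
	uint8_t  flags;
	uint8_t  pad[2];
};
_Static_assert(sizeof(struct pvclock_vcpu_time_info) == 32,
    "pvclock ABI size");

/* Nanoseconds corresponding to TSC value `tsc`. */
static uint64_t
pvclock_ns(const struct pvclock_vcpu_time_info *ti, uint64_t tsc)
{
	uint64_t delta = tsc - ti->tsc_timestamp;

	assert(ti->tsc_shift > -64 && ti->tsc_shift < 64);
	if (ti->tsc_shift >= 0)
		delta <<= ti->tsc_shift;
	else
		delta >>= -ti->tsc_shift;

	/* (delta * mul) >> 32, split so no 128-bit type is needed. */
	uint64_t hi = delta >> 32;
	uint64_t lo = delta & 0xffffffffu;
	uint64_t ns = hi * ti->tsc_to_system_mul +
	    ((lo * ti->tsc_to_system_mul) >> 32);

	return (ti->system_time + ns);
}
```

   Because the host republishes tsc_timestamp and system_time together, a
   guest using this clock can be re-anchored after a restore or migration
   without having to trust that the raw TSC is continuous.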


I've got a full submission document (design, per-failure bring-up history, complete test matrix, untested-areas inventory, and security analysis) and the git format-patch series (against releng/15.0, where they were validated).

I've tested many rounds of live VM migrations across hosts (AMD using KVM nested virtualization and Intel physical systems) and have finally gotten it to a stable state with 30+ live migrations without packets dropping. I intend to do further testing (specifically with AMD physical boxes).

Bhyve is phenomenal. If there is no interest in a champion, I still intend to at least attempt to see the process through (acceptance or not). Happy to provide the documentation/requested info.

Thanks for the consideration.

William Mckenzie
wmckenzie@rhelitpro.com
