From nobody Sat Jul 15 17:31:45 2023 X-Original-To: virtualization@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R3Fl84sTVz4nL7G for ; Sat, 15 Jul 2023 17:31:48 +0000 (UTC) (envelope-from rob.fx907@gmail.com) Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4R3Fl76N1Kz45qN for ; Sat, 15 Jul 2023 17:31:47 +0000 (UTC) (envelope-from rob.fx907@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-wm1-x334.google.com with SMTP id 5b1f17b1804b1-3fbb634882dso6148675e9.0 for ; Sat, 15 Jul 2023 10:31:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689442306; x=1692034306; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Fdaa5RQs1mSaLi0Jj7qr+PSMgD7AXH5Wc8zgBPVkxp4=; b=MFIiSVNGwSbU0NlJVTfTwtYULQUd+HrOTLg9p65vXReqFnLmN7Fw8woswRx/DwPfUJ JqpOPanTWAyJvmBP5JawINqZKJGrKlSu7fptmZojr5rijVznN+zvIfwA5N2dtgfPCNuB mM2klOXiGrvTCVsSE1nYTF+nmx0ADdtdiNOlfXOgJ/SpfNuP0f2+lDkDthONwHJ5QBSQ ns1hWdGhjZ2A17RTLeXlEuKetbGWluJ/df7JMwAL0u4ba3z3PwD1lixwBag3+n97mEAx GYC1eJBLw19/CgGSNW/2k1aGA41rVeBV4ROnx9gepv2JjbYde8DJ30OUHDxtf05ArLDB gM4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689442306; x=1692034306; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Fdaa5RQs1mSaLi0Jj7qr+PSMgD7AXH5Wc8zgBPVkxp4=; b=FBdZRsUxqOFQR60E4U4zelx0XB7sRwX8g/JMPmTvGHT+QHTGfnVxCwR53n2WlZ+QU/ 
OSTnb1Gb7xX2SFqihFH1qNxnr2oQakr214KxqHl6NrxLAH77BmeCWz5rX8uhvjwkJ0El 3jJSsV0T6imau8WrO78v84VzJANL77q9GW8oxXdgHGnAzccKanN+5Fjh4OYvoY2Ik4sH uzVZxtcgBipYh0mWnnrrMePjlHJogC6cDPTYlVjztvllBHLz4Y4kp3mnhvu/HJvObuP1 K1V7p5ySFIU5wM7ECKhes+VJBGzlZU16Ze3kUYBojrUQkcphbprSjSSvxDeSIIszX8D/ rlmg== X-Gm-Message-State: ABy/qLY4LDQht/MUSPLq9wwwvOJlvFmlsRsZaEVZ79K/R2MEMRS+u8NL j4zaQNZ+P4m6pufH5hhZZriXg5bulINVpm4FpZKapZX4 X-Google-Smtp-Source: APBJJlFmxVYvCJdwWiG6TtlRSEMag7KRSpfrbfq2xQ3MsPoIH2oULSaloLD+iAs75Hr3egnxmoAYgeQkuHONV++ckq4= X-Received: by 2002:a05:600c:348d:b0:3fb:a651:c153 with SMTP id a13-20020a05600c348d00b003fba651c153mr2158406wmq.2.1689442306160; Sat, 15 Jul 2023 10:31:46 -0700 (PDT) List-Id: Discussion List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-virtualization@freebsd.org X-BeenThere: freebsd-virtualization@freebsd.org MIME-Version: 1.0 Received: by 2002:a17:907:970d:b0:982:69cd:7365 with HTTP; Sat, 15 Jul 2023 10:31:45 -0700 (PDT) In-Reply-To: <3a037482-2e6c-667f-1979-d5b612e506ec@shrew.net> References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <6b98da58a5bd8e83bc466efa20b5a900298210aa.camel@FreeBSD.org> <8387AC83-6667-48E5-A3FA-11475EA96A5F@gmail.com> <986A83D8-E0E0-4030-9369-A5B3500E27C6@gmail.com> <79fabe94-b800-c713-48ea-518da1f74e4d@shrew.net> <3973013d-c183-360f-d7ca-ca822859c23d@shrew.net> <3a037482-2e6c-667f-1979-d5b612e506ec@shrew.net> From: Rob Wing Date: Sat, 15 Jul 2023 09:31:45 -0800 Message-ID: Subject: Re: BHYVE SNAPSHOT image format proposal To: Matthew Grooms Cc: "virtualization@freebsd.org" Content-Type: multipart/alternative; boundary="00000000000086fca1060089f0d4" X-Rspamd-Queue-Id: 4R3Fl76N1Kz45qN X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; TAGGED_FROM(0.00)[] X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we 
originated X-ThisMailContainsUnwantedMimeParts: N

--00000000000086fca1060089f0d4
Content-Type: text/plain; charset="UTF-8"

On Saturday, July 15, 2023, Matthew Grooms <mgrooms@shrew.net> wrote:

> We'll overlook the fact that it does attempt to consolidate files (2 vs 3)
> and that there's no feedback requesting further consolidation after being
> open for two years, but noted.

You may as well abandon https://reviews.freebsd.org/D29262, for these
reasons alone:

- JSON doesn't support storing binary data
- a snapshot should be contained in a single file

> The UPB patch addresses the above. Vitaliy's patch does nothing to address
> any of it. If one is going to be proposed as an alternative to the other,
> it better solve the same problems and then some.

The UPB patch is not an alternative - show me another hypervisor that uses
JSON as their snapshot format.

> The silence is real.

No, it's not. I've given you feedback multiple times.

--00000000000086fca1060089f0d4
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--00000000000086fca1060089f0d4-- From nobody Sun Jul 16 01:13:00 2023 X-Original-To: virtualization@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R3RzY6gQzz4nVbr for ; Sun, 16 Jul 2023 01:13:13 +0000 (UTC) (envelope-from mgrooms@shrew.net) Received: from mx2.shrew.net (mx2.shrew.net [38.97.5.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4R3RzX6RY8z3Q4b for ; Sun, 16 Jul 2023 01:13:12 +0000 (UTC) (envelope-from mgrooms@shrew.net) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of mgrooms@shrew.net designates 38.97.5.132 as permitted sender) smtp.mailfrom=mgrooms@shrew.net; dmarc=none Received: from mail.shrew.net (mail.shrew.prv [10.24.10.20]) by mx2.shrew.net (8.15.2/8.15.2) with ESMTP id 36G1D6Ma010924 for ; Sat, 15 Jul 2023 20:13:06 -0500 (CDT) (envelope-from mgrooms@shrew.net) Received: from [10.22.200.32] (unknown [136.49.230.220]) by mail.shrew.net (Postfix) with ESMTPSA id 32EAE194C30 for ; Sat, 15 Jul 2023 20:13:01 -0500 (CDT) Message-ID: Date: Sat, 15 Jul 2023 20:13:00 -0500 List-Id: Discussion List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-virtualization@freebsd.org X-BeenThere: freebsd-virtualization@freebsd.org MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: BHYVE SNAPSHOT image format proposal Content-Language: en-US To: virtualization@freebsd.org References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <6b98da58a5bd8e83bc466efa20b5a900298210aa.camel@FreeBSD.org> <8387AC83-6667-48E5-A3FA-11475EA96A5F@gmail.com> <986A83D8-E0E0-4030-9369-A5B3500E27C6@gmail.com> <79fabe94-b800-c713-48ea-518da1f74e4d@shrew.net> 
<3973013d-c183-360f-d7ca-ca822859c23d@shrew.net> <3a037482-2e6c-667f-1979-d5b612e506ec@shrew.net> From: Matthew Grooms In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (mx2.shrew.net [10.24.10.11]); Sat, 15 Jul 2023 20:13:06 -0500 (CDT) X-Spamd-Result: default: False [-3.29 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.99)[-0.994]; R_SPF_ALLOW(-0.20)[+mx]; MIME_GOOD(-0.10)[text/plain]; R_DKIM_NA(0.00)[]; ASN(0.00)[asn:174, ipnet:38.0.0.0/8, country:US]; MLMMJ_DEST(0.00)[virtualization@freebsd.org]; FROM_EQ_ENVFROM(0.00)[]; BLOCKLISTDE_FAIL(0.00)[136.49.230.220:server fail,38.97.5.132:server fail]; MIME_TRACE(0.00)[0:+]; DMARC_NA(0.00)[shrew.net]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCVD_TLS_LAST(0.00)[]; FROM_HAS_DN(0.00)[]; ARC_NA(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_ALL(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[virtualization@freebsd.org]; TO_DN_NONE(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; MID_RHS_MATCH_FROM(0.00)[] X-Rspamd-Queue-Id: 4R3RzX6RY8z3Q4b X-Spamd-Bar: --- X-ThisMailContainsUnwantedMimeParts: N

On 7/15/23 12:31, Rob Wing wrote:
>
> On Saturday, July 15, 2023, Matthew Grooms wrote:
>
> > We'll overlook the fact that it does attempt to consolidate files
> > (2 vs 3) and that there's no feedback requesting further
> > consolidation after being open for two years, but noted.
>
> You may as well abandon https://reviews.freebsd.org/D29262, for these
> reasons alone:
>

Thanks for the feedback. In the future, please feel free to add your
concerns to the code review. It would be extremely helpful.

> - JSON doesn't support storing binary data

I don't know how to state it more clearly: Making binary copies of the
data structures is the problem. Something tells me that you'll continue
to ignore this, so I'll stop saying it.

> - a snapshot should be contained in a single file

Heard. A single file.

> > The UPB patch addresses the above. Vitaliy's patch does nothing to
> > address any of it. If one is going to be proposed as an alternative
> > to the other, it better solve the same problems and then some.
>
> The UPB patch is not an alternative - show me another hypervisor that
> uses JSON as their snapshot format.
>

Heard. JSON is bad.

> > The silence is real.
>
> No, it's not. I've given you feedback multiple times.

I'm very aware. The silence I was referring to wasn't yours. Feel free
to re-read for context.

Rob, I'm not here to argue with you. I've shared all the opinions I feel
are relevant to the file format proposal and would prefer not to waste
the list's time.

Thanks,

-Matthew

From nobody Sun Jul 16 21:00:44 2023 X-Original-To: virtualization@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R3yKn3DTkz4n4T6 for ; Sun, 16 Jul 2023 21:00:45 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4R3yKm6L6dz3tTd for ; Sun, 16 Jul 2023 21:00:44 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1689541244; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=uqZnf9U5MioYsmseLlwefhOyAazcqtKJ41O4gcRoIJ0=; b=VVe2U4zrkpEZzjUfb0QLA/Oturyx5Dr+iFtEYAoUA3xoS+GiRkk5pqSadQYv2fwAy/cmu5
8tqyi30+KhsJs32adQrTv4a1HRyy87gul8Cdy4YzKUpf14CqQ1Ud3f9dTKqjDGRwczdcfp XOGM00x9d/ViZ3R6WwbDXpzdO7SV8ZTdOdOqK8na9awRDloSvYHinMXNAF86gCgBO3L0gw PPsKsy5dYYrdTWC6VSsR4hptiJbEN5O0Hg8wNuLgU28jzgvmw3H2TB4WfdOQru66c3ftD7 wo+zrjBxtq2yvvMbqn8M+XQtpXNlbdB+HFlyTUrWgraM2hm00qwF7zQGQDEuVQ== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1689541244; a=rsa-sha256; cv=none; b=iys3ldUSxHVsHC/zVllyp50Gwcj0lZY4t8yOhPEQncY4rzp0M6X2kH4xV7TI5oyPyp5JMd oJ6nXG3NzPZVJUcHBeNIHa6SP6F1iNPrb63PXajEWydQeI78uaW7VIg5O5M1Ng0nJP9svX ESut7US6DjJ5qbMFzf/RULS172ThkARhFb+1c1MzIA8HUqdqe/fgcxDK4zduCB8Owr6lHz 5JahwQuohfUot0PecIcXIiNzFWVbkyKlcN94UXJiTWKVWRs+GLT3TfDC0aLLpktiNzqJTi aSdQ2hsVPSLYX8xHmSP6T2ZKlFO7wnhu1hTva1hAGpo7FM75UnYdOIFtJ9Bsrw== Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4R3yKm5Pm1zxTw for ; Sun, 16 Jul 2023 21:00:44 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org ([127.0.1.5]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 36GL0i3i071104 for ; Sun, 16 Jul 2023 21:00:44 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Received: (from bugzilla@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 36GL0iU7071103 for virtualization@FreeBSD.org; Sun, 16 Jul 2023 21:00:44 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <202307162100.36GL0iU7071103@kenobi.freebsd.org> X-Authentication-Warning: kenobi.freebsd.org: bugzilla set sender to bugzilla-noreply@FreeBSD.org using -f From: bugzilla-noreply@FreeBSD.org To: virtualization@FreeBSD.org Subject: Problem reports for virtualization@FreeBSD.org that need special attention Date: Sun, 16 Jul 2023 21:00:44 +0000 List-Id: Discussion List-Archive: 
https://lists.freebsd.org/archives/freebsd-virtualization List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-virtualization@freebsd.org X-BeenThere: freebsd-virtualization@freebsd.org MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="16895412446.14EEd45.67715" Content-Transfer-Encoding: 7bit X-ThisMailContainsUnwantedMimeParts: N

--16895412446.14EEd45.67715
Date: Sun, 16 Jul 2023 21:00:44 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
In Progress |    247208 | mpt(4): VMWare virtualized LSI controller panics
New         |    240945 | [hyper-v] [netvsc] hn network driver incorrectly
Open        |    244838 | "bectl activate -t" does not honor the -t flag in

3 problems total for which you should take action.

--16895412446.14EEd45.67715
Date: Sun, 16 Jul 2023 21:00:44 +0000
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"
--16895412446.14EEd45.67715-- From nobody Mon Jul 17 00:29:47 2023 X-Original-To: virtualization@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R42z329WKz4mqSZ for ; Mon, 17 Jul 2023 00:29:51 +0000 (UTC) (envelope-from rob.fx907@gmail.com) Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4R42z267S2z3rL5 for ; Mon, 17 Jul 2023 00:29:50 +0000 (UTC) (envelope-from rob.fx907@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ed1-x536.google.com with SMTP id 4fb4d7f45d1cf-5216cf475e9so739565a12.0 for ; Sun, 16 Jul 2023 17:29:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689553788; x=1692145788; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Yvx9FJRCWKHZfg3qqTZh37+DR/3jPZRMWViveBDkeYg=; b=ULZ9hq5AjR5MqfepAVZ+tjolcWyyFiR4np8B67vQjoCb3yTY73fclq15dobbLNp7nj ueeTbEIUpxS51Tev5V+6XGpyVuRQHgGV8ckvxxziKlAdhRTqwdpSL66T4W/q6LpCDzeP UNfzIWukVZFzvMVPLD/9m5mR4FpgbNSU/O+OM1pgpqgwEqXoX+saY3uIGXYjD6/nOBC2 A8Xmw/3Aqg6lztVlX7pw8suk/O0obaZVvi6wXfqrlpcsx7boBk4jKEUz260mbZ+CZsGE dD/wdEKLWdWOxEo37Wv+aBOrEArB6mHif54K3p7YK1ooofOzAwpQoAXkQ1jJy2IVuD3w grLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689553788; x=1692145788; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Yvx9FJRCWKHZfg3qqTZh37+DR/3jPZRMWViveBDkeYg=; 
b=eMi9Exdc8BidwISWMeFnGf1591W3kAeComQBYHwr9lU5gb61v3NFU0DXa51++3PRuo esQ4QoTAPhwTWCAZf19WnjW9UxzUN+2bz10lZV2QMfOHKg6VbRBPVIdlrBLwDQsiurCT uxWTEHJZgAhPc4QHaW+vhhL7qjz+u1IOa0qcKzsGsBIgWdQuUdlVorGhBlmLAFRjbjlp ERwL4aF/hWYkprhnAl8tLsPOL9PxLuzurpLaT/kaqEOTY56CP6VaReNk/uFh1thZ7IjT rTaLlNVnwz+JmGBe3z7LNWwNtWkdzRtqPG/xGUkxi9eT6BDGAseKrA8sP+VmEspqcbtq JOog== X-Gm-Message-State: ABy/qLZ294VSnhbSkoCeqZUmuR09eQHchq7/D62ZHnofnqenRGta/G6A QWlydXSgEKkUNxdR0ZR+qNIuUl1n4oc3NhiggUT3sFIh X-Google-Smtp-Source: APBJJlHF28qsBbox4o86+lD1dpXuUBdyj9zsl4lDQAXb36gn9L8LNQ5O+BStJvXoRJAKdZkCdd852wOcMThfTJ6SRsc= X-Received: by 2002:a17:906:749e:b0:994:539d:f98b with SMTP id e30-20020a170906749e00b00994539df98bmr4371305ejl.6.1689553787793; Sun, 16 Jul 2023 17:29:47 -0700 (PDT) List-Id: Discussion List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-virtualization@freebsd.org X-BeenThere: freebsd-virtualization@freebsd.org MIME-Version: 1.0 Received: by 2002:a17:907:970d:b0:982:69cd:7365 with HTTP; Sun, 16 Jul 2023 17:29:47 -0700 (PDT) In-Reply-To: References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <6b98da58a5bd8e83bc466efa20b5a900298210aa.camel@FreeBSD.org> <8387AC83-6667-48E5-A3FA-11475EA96A5F@gmail.com> <986A83D8-E0E0-4030-9369-A5B3500E27C6@gmail.com> <79fabe94-b800-c713-48ea-518da1f74e4d@shrew.net> <3973013d-c183-360f-d7ca-ca822859c23d@shrew.net> <3a037482-2e6c-667f-1979-d5b612e506ec@shrew.net> From: Rob Wing Date: Sun, 16 Jul 2023 16:29:47 -0800 Message-ID: Subject: Re: BHYVE SNAPSHOT image format proposal To: Matthew Grooms Cc: "virtualization@freebsd.org" Content-Type: multipart/alternative; boundary="00000000000059b8050600a3e5fc" X-Rspamd-Queue-Id: 4R42z267S2z3rL5 X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; TAGGED_FROM(0.00)[] X-Rspamd-Pre-Result: action=no action; module=replies; Message is 
reply to one we originated X-ThisMailContainsUnwantedMimeParts: N

--00000000000059b8050600a3e5fc
Content-Type: text/plain; charset="UTF-8"

On Saturday, July 15, 2023, Matthew Grooms <mgrooms@shrew.net> wrote:

> I don't know how to state it more clearly: Making binary copies of the
> data structures is the problem. Something tells me that you'll continue to
> ignore this, so I'll stop saying it.

Maybe this is where our disconnect is?

Can you give me a pointer to the code for the data structures you're
thinking of?

When I say binary data, I'm thinking of the guest memory being saved.

> Rob, I'm not here to argue with you.

Likewise, I don't feel like we are arguing. I look at this as trying to
hash out a solution to the problem.

I understand your stance is that the UPB patch solves the problem we're
discussing. And I've given my reasons why the patch falls short.

> I've shared all the opinions I feel are relevant to the file format
> proposal and would prefer not to waste the list's time.

I don't see how we are wasting the list's time. So far, we've stayed on
topic and have kept it civil.

--00000000000059b8050600a3e5fc
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--00000000000059b8050600a3e5fc-- From nobody Mon Jul 17 16:43:54 2023 X-Original-To: freebsd-virtualization@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R4Sbb5G93z4n1yq for ; Mon, 17 Jul 2023 16:44:27 +0000 (UTC) (envelope-from elenamihailescu22@gmail.com) Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4R4SbZ4BfRz4GxM; Mon, 17 Jul 2023 16:44:26 +0000 (UTC) (envelope-from elenamihailescu22@gmail.com) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20221208 header.b=QJ2avby8; spf=pass (mx1.freebsd.org: domain of elenamihailescu22@gmail.com designates 2a00:1450:4864:20::536 as permitted sender) smtp.mailfrom=elenamihailescu22@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-ed1-x536.google.com with SMTP id 4fb4d7f45d1cf-51e566b1774so6173401a12.1; Mon, 17 Jul 2023 09:44:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689612262; x=1692204262; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=N/+ZWd8LICYQcFM7ZgW5GbE4sNOns933Vt+sV3oKe1s=; b=QJ2avby8lH4bOIdzInau8ZP2FR1rXyVVfw+SDVtmxgaKKLvrheFaEwFeXWD6oYctfr QTV0C3hSVzMZQATjiN9pUo2xwaCv/o7oidt7NLnvSH6IG9E2IBFZ8Vgo11h4EXlws3DP zOF4ZRRtQs6ZgDgaeeDdTKNGNaNSy5I4/XPNeGGgR4VAoii1NWmozyvgcOs3lLAr+MGj qFLORhR2jITadbZxyQ4UjL5tvLSjdxwHTKygM1jwXzUZcIrE2dqZDijxVAfDqe3ilsbV i6t4rodyfHOX1NFfRh5MwDrtE0+IOv61AsC1MTV9Y4YHLRsiL7tqhS3gIYX+LP7BTphA 0LSg== X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689612262; x=1692204262; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=N/+ZWd8LICYQcFM7ZgW5GbE4sNOns933Vt+sV3oKe1s=; b=j9an4EWOAwTK1vcJ346/a1zaix1bPbJ9gjmZdBBVQv0XVAt2kLTLzMaHMilS1gPFQi Z+3q3LQb0ZdNsDVgl+dbYZzYE7flMka1sW4DUf7+HwGT1oiqjixYGiSnXz951ch+umy8 0xfhVlwp662dMkna2mFtFQkHdHyWS0WW07FT5GgQr0+GALCo9Ry7rVrqos5WXcSoHkEi wDAANB8CMlf0j1fZxyCyX+tQ8MBxjJIrDDqEg7byt1+mo1m9D+zEMLMwTodfOuQ0o7SN 1IwnDBdmVeyNe/fjjRqVwzF1tynyGighV2/bcFZVuAR5eI5ltlrCWL94e6gqJ1wZRhMY DGTQ== X-Gm-Message-State: ABy/qLY3f9addWCebEXZO6KlEfhW55iZhITOLPN4uy+kGtDgBOoOa/KX vp+eF1DZmzBugB24MQ1brg+HAY8TTtZmF6863TkHhbdLyVQ= X-Google-Smtp-Source: APBJJlH1Vvqk6RR/SwbD5jRr94hHKsn5DGfp91QI4JS95M0sG6hMIHtIObq2AOgvGU2nGiEgLvNuejUiG11nxW+AlkQ= X-Received: by 2002:aa7:dcc6:0:b0:51d:87c6:bf28 with SMTP id w6-20020aa7dcc6000000b0051d87c6bf28mr11933241edu.3.1689612261987; Mon, 17 Jul 2023 09:44:21 -0700 (PDT) List-Id: Discussion List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-virtualization@freebsd.org X-BeenThere: freebsd-virtualization@freebsd.org MIME-Version: 1.0 References: <3d7ee1f6ff98fe9aede5a85702b906fc3014b6b6.camel@FreeBSD.org> In-Reply-To: From: Elena Mihailescu Date: Mon, 17 Jul 2023 18:43:54 +0200 Message-ID: Subject: Re: Warm and Live Migration Implementation for bhyve To: =?UTF-8?Q?Corvin_K=C3=B6hne?= Cc: freebsd-virtualization@freebsd.org, Mihai Carabas , Matthew Grooms Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spamd-Result: default: False [-2.64 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-0.96)[-0.964]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; NEURAL_SPAM_SHORT(0.32)[0.322]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20221208]; 
R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36:c]; MIME_GOOD(-0.10)[text/plain]; TAGGED_RCPT(0.00)[]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::536:from]; MLMMJ_DEST(0.00)[freebsd-virtualization@freebsd.org]; TO_DN_SOME(0.00)[]; ARC_NA(0.00)[]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim]; DKIM_TRACE(0.00)[gmail.com:+]; MID_RHS_MATCH_FROMTLD(0.00)[]; FREEMAIL_FROM(0.00)[gmail.com]; RCVD_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; FREEMAIL_CC(0.00)[freebsd.org,gmail.com,shrew.net] X-Rspamd-Queue-Id: 4R4SbZ4BfRz4GxM X-Spamd-Bar: -- X-ThisMailContainsUnwantedMimeParts: N

Hi Corvin,

On Mon, 3 Jul 2023 at 09:35, Corvin Köhne wrote:
>
> On Tue, 2023-06-27 at 16:35 +0300, Elena Mihailescu wrote:
> > Hi Corvin,
> >
> > Thank you for the questions! I'll respond to them inline.
> >
> > On Mon, 26 Jun 2023 at 10:16, Corvin Köhne wrote:
> > >
> > > Hi Elena,
> > >
> > > thanks for posting this proposal here.
> > >
> > > Some open questions from my side:
> > >
> > > 1. How is the data sent to the target? Does the host send a complete
> > > dump and the target parses it? Or does the target request data one by
> > > one and the host sends it as a response?
> > >
> > It's not a dump of the guest's state, it's transmitted in steps.
> > However, some parts may be migrated as a chunk (e.g., the emulated
> > devices' state is transmitted as the buffer generated from the
> > snapshot functions).
> >
>
> How does the receiver know which chunk relates to which device? It
> would be nice if you can start bhyve on the receiver side without
> parameters e.g. `bhyve --receive=127.0.0.1:1234`. Therefore, the
> protocol has to carry some information about the device configuration.
>

Regarding your first question, we send a chunk of data (a buffer) with
the state: we restore the data in the same order we saved it. It relies
on save/restore. We currently do not support migrating between different
versions of suspend&resume/migration.

It would be nice to have something like
`bhyve --receive=127.0.0.1:1234`, but I don't think it is possible at
this point, mainly for the following two reasons:
- the guest image must be shared (e.g., via NFS) between the source and
destination hosts. If the mount points differ between the two, opening
the disk at the destination will fail (also, we must assume that the
user used an absolute path, since a relative one won't work)
- if the VM uses a network adapter, we must specify the tap interface on
the destination host (e.g., if on the source host the VM uses `tap0`,
on the destination host, `tap0` may not exist or may be used by other
VMs).

> >
> > I'll try to describe a bit the protocol we have implemented for
> > migration, maybe it can partially respond to the second and third
> > questions.
> >
> > The destination host waits for the source host to connect (through a
> > socket).
> > After that, the source sends its system specifications (hw_machine,
> > hw_model, hw_pagesize). If the source and destination hosts have
> > identical hardware configurations, the migration can take place.
> >
> > Then, if we have live migration, we migrate the memory in rounds
> > (i.e., we get a list of the pages that have the dirty bit set, send it
> > to the destination to know what pages will be received, then send the
> > pages through the socket; this process is repeated until the last
> > round).
> >
> > Next, we stop the guest's vcpus, send the remaining memory (for live
> > migration) or the guest's memory from vmctx->baseaddr for warm
> > migration. Then, based on the suspend/resume feature, we get the state
> > of the virtualized devices (the ones from the kernel space) and send
> > this buffer to the destination. We repeat this for the emulated
> > devices as well (the ones from the userspace).
> >
> > On the receiver host, we get the memory pages and set them to their
> > corresponding position in the guest's memory, use the restore
> > functions for the state of the devices and start the guest's execution.
> >
> > Excluding the guest's memory transfer, the rest is based on the
> > suspend/resume feature. We snapshot the guest's state, but instead of
> > saving the data locally, we send it via network to the destination. On
> > the destination host, we start a new virtual machine, but instead of
> > reading/getting the state from the disk (i.e., the snapshot files) we
> > get this state via the network from the source host.
> >
> > If the destination can properly resume the guest activity, it will
> > send an "OK" to the source host so it can destroy/remove the guest
> > from its end.
> >
> > Both warm and live migration are based on "cold migration". Cold
> > migration means we suspend the guest on the source host, and restore
> > the guest on the destination host from the snapshot files. Warm
> > migration only does this using a socket, while live migration changes
> > the way the memory is migrated.
> >
> > > 2. What happens if we add a new data section?
> > >
> > What are you referring to with a new data section? Is this question
> > related to the third one? If so, see my answer below.
> >
> > > 3. What happens if the bhyve version differs on host and target
> > > machine?
> >
> > The two hosts must be identical for migration, that's why we have the
> > part where we check the specifications between the two migration
> > hosts. They are expected to have the same version of bhyve and
> > FreeBSD. We will add an additional check in the check specs part to
> > see if we have the same FreeBSD build.
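
As an illustration of the specification check described above, here is a minimal sketch in Python (chosen for brevity; bhyve itself is written in C). Only the three field names hw_machine, hw_model and hw_pagesize come from the thread. The `HostSpecs` class, the `spec_mismatches` helper and the sample values are hypothetical and are not taken from the UPB patches.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class HostSpecs:
    """Values the source sends before migration (field names from the thread)."""
    hw_machine: str
    hw_model: str
    hw_pagesize: int


def spec_mismatches(src: HostSpecs, dst: HostSpecs) -> List[str]:
    """Return the names of the fields that differ; an empty list means compatible."""
    return [f.name for f in fields(HostSpecs)
            if getattr(src, f.name) != getattr(dst, f.name)]


# Illustrative values only, not real host data.
src = HostSpecs("amd64", "Example CPU A", 4096)
same = HostSpecs("amd64", "Example CPU A", 4096)
other = HostSpecs("amd64", "Example CPU B", 4096)

print(spec_mismatches(src, same))   # []
print(spec_mismatches(src, other))  # ['hw_model']
```

Returning the list of differing fields rather than a bare boolean would let the destination report why a migration was refused; a FreeBSD build identifier could be added as a fourth field once that check exists.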
> > > > As long as the changes in the virtual memory subsystem won't affect > > bhyve (and how the virtual machine sees/uses the memory), the > > migration constraints should only be related to suspend/resume. The > > state of the virtual devices is handled by the snapshot system, so if > > it is able to accommodate changes in the data structures, the > > migration process will not be affected. > > > > Thank you, > > Elena > > > > > > > > > > > -- > > > Kind regards, > > > Corvin > > > > > > On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote: > > > > Hello, > > > > > > > > This mail presents the migration feature we have implemented for > > > > bhyve. Any feedback from the community is much appreciated. > > > > > > > > We have opened a stack of reviews on Phabricator > > > > (https://reviews.freebsd.org/D34717) that is meant to split the > > > > code > > > > in smaller parts so it can be more easily reviewed. A brief > > > > history > > > > of > > > > the implementation can be found at the bottom of this email. > > > > > > > > The migration mechanism we propose needs two main components in > > > > order > > > > to move a virtual machine from one host to another: > > > > 1. the guest's state (vCPUs, emulated and virtualized devices) > > > > 2. the guest's memory > > > > > > > > For the first part, we rely on the suspend/resume feature. We > > > > call > > > > the > > > > same functions as the ones used by suspend/resume, but instead of > > > > saving the data in files, we send it via the network. > > > > > > > > The most time consuming aspect of migration is transmitting guest > > > > memory. The UPB team has implemented two options to accomplish > > > > this: > > > > 1. Warm Migration: The guest execution is suspended on the source > > > > host > > > > while the memory is sent to the destination host. This method is > > > > less > > > > complex but may cause extended downtime. > > > > 2. 
Live Migration: The guest continues to execute on the source > > > > host > > > > while the memory is transmitted to the destination host. This > > > > method > > > > is more complex but offers reduced downtime. > > > > > > > > The proposed live migration procedure (pre-copy live migration) > > > > migrates the memory in rounds: > > > > 1. In the initial round, we migrate all the guest memory (all > > > > pages > > > > that are allocated) > > > > 2. In the subsequent rounds, we migrate only the pages that were > > > > modified since the previous round started > > > > 3. In the final round, we suspend the guest, migrate the > > > > remaining > > > > pages that were modified from the previous round and the guest's > > > > internal state (vCPU, emulated and virtualized devices). > > > > > > > > To detect the pages that were modified between rounds, we propose > > > > an > > > > additional dirty bit (virtualization dirty bit) for each memory > > > > page. > > > > This bit would be set every time the page's dirty bit is set. > > > > However, > > > > this virtualization dirty bit is reset only when the page is > > > > migrated. > > > > > > > > The proposed implementation is split in two parts: > > > > 1. The first one, the warm migration, is just a wrapper on the > > > > suspend/resume feature which, instead of saving the suspended > > > > state > > > > on > > > > disk, sends it via the network to the destination > > > > 2. The second part, the live migration, uses the layer previously > > > > presented, but sends the guest's memory in rounds, as described > > > > above. > > > > > > > > The migration process works as follows: > > > > 1. we identify: > > > > - VM_NAME - the name of the virtual machine which will be > > > > migrated > > > > - SRC_IP - the IP address of the source host > > > > - DST_IP - the IP address of the destination host (default is > > > > 24983) > > > > - DST_PORT - the port we want to use for migration > > > > 2. 
we start a virtual machine on the destination host that will > > > > wait > > > > for a migration. Here, we must specify SRC_IP (and the port we > > > > want > > > > to > > > > open for migration, default is 24983). > > > > e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst > > > > 3. using bhyvectl on the source host, we start the migration > > > > process. > > > > e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm > > > > > > > > A full tutorial on this can be found here: > > > > https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve > > > > > > > > For sending the migration request to a virtual machine, we use > > > > the > > > > same thread/socket that is used for suspend. > > > > For receiving a migration request, we used a similar approach to > > > > the > > > > resume process. > > > > > > > > As some of you may remember seeing similar emails from our part > > > > on > > > > the > > > > freebsd-virtualization list, I'll present a brief history of this > > > > project: > > > > The first part of the project was the suspend/resume > > > > implementation > > > > which landed in bhyve in 2020, under the BHYVE_SNAPSHOT guard > > > > (https://reviews.freebsd.org/D19495). > > > > After that, we focused on two tracks: > > > > 1. adding various suspend/resume features (multiple device > > > > support - > > > > https://reviews.freebsd.org/D26387, CAPSICUM support - > > > > https://reviews.freebsd.org/D30471, having a uniform file format > > > > - > > > > at > > > > that time, during the bhyve bi-weekly calls, we concluded that > > > > the > > > > JSON format was the most suitable at that time - > > > > https://reviews.freebsd.org/D29262) so we can remove the #ifdef > > > > BHYVE_SNAPSHOT guard. > > > > 2. implementing the migration feature for bhyve. Since this one > > > > relies > > > > on the save/restore, but does not modify its behaviour, we > > > > considered > > > > we can go in parallel with both tracks.
> > > > We had various presentations in the FreeBSD Community on these > > > > topics: > > > > AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020, > > > > AsiaBSDCon2023. > > > > > > > > The first patches for warm and live migration were opened in > > > > 2021: > > > > https://reviews.freebsd.org/D28270, > > > > https://reviews.freebsd.org/D30954. However, the general feedback > > > > on > > > > these was that the patches are too big to be reviewed, so we > > > > should > > > > split them in smaller chunks (this was also true for some of the > > > > suspend/resume improvements). Thus, we split them into smaller > > > > parts. > > > > Also, as things changed in bhyve (i.e., capsicum support for > > > > suspend/resume was added this year), we rebased and updated our > > > > reviews. > > > > > > > > Thank you, > > > > Elena > > > > > > > > > -- > Kind regards, > Corvin Thanks, Elena From nobody Mon Jul 17 17:08:27 2023 X-Original-To: freebsd-virtualization@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R4T7K72fbz4nD7V for ; Mon, 17 Jul 2023 17:08:29 +0000 (UTC) (envelope-from rob.fx907@gmail.com) Received: from mail-oo1-xc2c.google.com (mail-oo1-xc2c.google.com [IPv6:2607:f8b0:4864:20::c2c]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4R4T7K5C9Pz4QLF; Mon, 17 Jul 2023 17:08:29 +0000 (UTC) (envelope-from rob.fx907@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-oo1-xc2c.google.com with SMTP id 006d021491bc7-5671db01ee0so377154eaf.1; Mon, 17 Jul 2023 10:08:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689613709; x=1690218509; 
h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=txDw3xnLH98/eBGl4m9nbROBGF0zxcs4iz9aVGqtzX4=; b=RXIGEyqT1smCGyIWIcUMH32nGpi4QpcBUzZY5szYnSAuP59hoYXrVdbQZe6+WnWORy /lTgLC0fCPE/Rp8zc+WreymmeLk0Dx05pb5U7N3kIQ8bj9NkBfGoBAfNnH8C4gy+4+bm V4FE4d02Xh3JPbjpDdXB1cwx4/E8LfijpC7QenIxb613gzkzzVCLVOS/RAVoGCEaR5zA a5dR5hOX3gTgnELUPJUO1xJ550GTXHgNpTx6MKdkr5BUcDidSCjk3SKkVrHOHK/8YcK5 Av6YCuOEEe+ivFS0b8TvtkEuDE+SU+9Rc+tpCAM8ImPZzcpr6O4baOKmCX7TnJFUaTuO 41ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689613709; x=1690218509; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=txDw3xnLH98/eBGl4m9nbROBGF0zxcs4iz9aVGqtzX4=; b=aby7OnhcNR2UkZvVvIscIAXpn2nCSEFaNvXGpjqSmTTd0TupeTIaSIdrPGphLuC7Y1 UNd547n44sf1H6/NJPcIBhhF6mxZadGQ5ZuB5JyjPazNrx4J+rbS2XJGkzapz2JZC2fn n5vKFncYc/TZKovM9878IuYcXBg/Hz73dujc8ffv6+4QJxRUHlKRz7L1vAY7C+AUwQnu M1JVnhAWdLc3WrygfGI4cJrBY+CPbUv+ZX6OrgK89W3UxKZFdBnt0GKSjRCLGVzHaVVe Sb+Czsq8oj/d5ycZpHPvoefn+lxrckRN3H80ZFFQh3DvmGVrqxR7ZTL2FCXximVHzqXj RJCA== X-Gm-Message-State: ABy/qLaIWz2inuhiX6dukOmdbv5Nu97UH2eHDwJa4yXOV1POynUgixkN GkSCfJ7O13zOm2XQDZgJqVSRnswLSeM2IFgHRwo= X-Google-Smtp-Source: APBJJlGM/m8UMjiNSWjMrjgc0rDIPMgh7/nxwVb39Lb3e8SsA/5KAf1aERnuaa/Pp0+jBH2abw6S52HX8DUy+IejhSE= X-Received: by 2002:a05:6359:b96:b0:133:b86:cbe8 with SMTP id gf22-20020a0563590b9600b001330b86cbe8mr1547095rwb.1.1689613708546; Mon, 17 Jul 2023 10:08:28 -0700 (PDT) List-Id: Discussion List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-virtualization@freebsd.org X-BeenThere: freebsd-virtualization@freebsd.org MIME-Version: 1.0 Received: by 2002:a05:7000:b1c1:b0:4f3:bac5:bc52 with HTTP; Mon, 17 Jul 2023 10:08:27 -0700 (PDT) In-Reply-To: References: 
<3d7ee1f6ff98fe9aede5a85702b906fc3014b6b6.camel@FreeBSD.org> From: Rob Wing Date: Mon, 17 Jul 2023 09:08:27 -0800 Message-ID: Subject: Re: Warm and Live Migration Implementation for bhyve To: Elena Mihailescu Cc: Corvin Köhne , "freebsd-virtualization@freebsd.org" , Mihai Carabas , Matthew Grooms Content-Type: multipart/alternative; boundary="000000000000e7d7220600b1d87d" X-Rspamd-Queue-Id: 4R4T7K5C9Pz4QLF X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; TAGGED_RCPT(0.00)[]; TAGGED_FROM(0.00)[] X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-ThisMailContainsUnwantedMimeParts: N --000000000000e7d7220600b1d87d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I'm curious why the stream send bits are rolled into bhyve as opposed to using netcat/ssh to do the network transfer?

sort of how one would do a zfs send/recv between hosts

On Monday, July 17, 2023, Elena Mihailescu <elenamihailescu22@gmail.com> wrote:
Hi Corvin,

On Mon, 3 Jul 2023 at 09:35, Corvin Köhne <corvink@freebsd.org> wrote:
>
> On Tue, 2023-06-27 at 16:35 +0300, Elena Mihailescu wrote:
> > Hi Corvin,
> >
> > Thank you for the questions! I'll respond to them inline.
> >
> > On Mon, 26 Jun 2023 at 10:16, Corvin Köhne <corvink@freebsd.org> wrote:
> > >
> > > Hi Elena,
> > >
> > > thanks for posting this proposal here.
> > >
> > > Some open questions from my side:
> > >
> > > 1. How is the data sent to the target? Does the host send a complete
> > > dump and the target parses it? Or does the target request data one by
> > > one and the host sends it as a response?
> > >
> > It's not a dump of the guest's state, it's transmitted in steps.
> > However, some parts may be migrated as a chunk (e.g., the emulated
> > devices' state is transmitted as the buffer generated from the
> > snapshot functions).
> >
>
> How does the receiver know which chunk relates to which device? It
> would be nice if you can start bhyve on the receiver side without
> parameters e.g. `bhyve --receive=127.0.0.1:1234`. Therefore, the
> protocol has to carry some information about the device configuration.
>

Regarding your first question, we send a chunk of data (a buffer) with
the state: we restore the data in the same order we saved it. It relies
on save/restore. We currently do not support migrating between
different versions of suspend/resume or migration.
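
To illustrate the ordering constraint, here is a minimal sketch in Python (not the bhyve code). The device names and the length-prefixed framing are assumptions made up for the example; the point is only that the stream carries no device names, so both sides must walk the same list in the same order.

```python
import io
import struct

# Hypothetical device list. Both hosts must agree on it, because the
# stream below does not say which buffer belongs to which device.
DEVICE_ORDER = ["hostbridge", "virtio-blk", "virtio-net"]


def send_states(stream, states):
    """Write each device buffer, in DEVICE_ORDER, as <u32 length><payload>."""
    for name in DEVICE_ORDER:
        payload = states[name]
        stream.write(struct.pack("!I", len(payload)))
        stream.write(payload)


def read_exact(stream, size):
    """Read exactly `size` bytes or fail; a short read means a broken stream."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended in the middle of a state buffer")
    return data


def recv_states(stream):
    """Read the buffers back in the same order they were written."""
    restored = {}
    for name in DEVICE_ORDER:
        (length,) = struct.unpack("!I", read_exact(stream, 4))
        restored[name] = read_exact(stream, length)
    return restored
```

If the two sides disagree on the device list, the buffers are still read without error but handed to the wrong devices, which is why a mismatch between versions cannot be detected from the stream alone in a scheme like this.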

It would be nice to have something like `bhyve
--receive=127.0.0.1:1234`, but I don't think it is possible at this
point, mainly for the following two reasons:
- the guest image must be shared (e.g., via NFS) between the source
and destination hosts. If the mount points differ between the two,
opening the disk at the destination will fail (also, we must assume
that the user used an absolute path since a relative one won't work)
- if the VM uses a network adapter, we must specify the tap interface
on the destination host (e.g., if on the source host the VM uses
`tap0`, on the destination host, `tap0` may not exist or may be used
by other VMs).
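
The two conditions above could be checked on the destination before starting the receiving VM. The following is only a sketch under my own assumptions: the function name and its arguments are hypothetical and nothing like it exists in bhyve today. It also cannot tell whether a tap interface is already used by another VM, only whether it exists.

```python
import os
import socket


def destination_problems(disk_path, tap_name, interfaces=None):
    """Return a list of reasons why this host cannot receive the guest.

    `interfaces` can be passed in for testing; by default the names are
    taken from the running system.
    """
    if interfaces is None:
        interfaces = [name for _, name in socket.if_nameindex()]

    problems = []
    # A relative path would be resolved against a different working
    # directory on the destination, so only absolute paths are accepted.
    if not os.path.isabs(disk_path):
        problems.append("disk path is not absolute: " + disk_path)
    elif not os.path.exists(disk_path):
        problems.append("disk image not found: " + disk_path)

    # Existence only; whether another VM holds the interface is not checked.
    if tap_name not in interfaces:
        problems.append("tap interface missing: " + tap_name)
    return problems
```

An empty list means both checks passed; anything else would have to be fixed by the user (or passed explicitly on the command line) before the migration starts.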


>
> > I'll try to describe a bit the protocol we have implemented for
> > migration, maybe it can partially respond to the second and third
> > questions.
> >
> > The destination host waits for the source host to connect (through a
> > socket).
> > After that, the source sends its system specifications (hw_machine,
> > hw_model, hw_pagesize). If the source and destination hosts have
> > identical hardware configurations, the migration can take place.
> >
> > Then, if we have live migration, we migrate the memory in rounds
> > (i.e., we get a list of the pages that have the dirty bit set, send it
> > to the destination to know what pages will be received, then send the
> > pages through the socket; this process is repeated until the last
> > round).
> >
> > Next, we stop the guest's vcpus, send the remaining memory (for live
> > migration) or the guest's memory from vmctx->baseaddr for warm
> > migration. Then, based on the suspend/resume feature, we get the state
> > of the virtualized devices (the ones from the kernel space) and send
> > this buffer to the destination. We repeat this for the emulated
> > devices as well (the ones from the userspace).
> >
> > On the receiver host, we get the memory pages and set them to their
> > corresponding position in the guest's memory, use the restore functions
> > for the state of the devices and start the guest's execution.
> >
> > Excluding the guest's memory transfer, the rest is based on the
> > suspend/resume feature. We snapshot the guest's state, but instead of
> > saving the data locally, we send it via network to the destination. On
> > the destination host, we start a new virtual machine, but instead of
> > reading/getting the state from the disk (i.e., the snapshot files) we
> > get this state via the network from the source host.
> >
> > If the destination can properly resume the guest activity, it will
> > send an "OK" to the source host so it can destroy/remove the guest
> > from its end.
> >
> > Both warm and live migration are based on "cold migration". Cold
> > migration means we suspend the guest on the source host, and restore
> > the guest on the destination host from the snapshot files. Warm
> > migration only does this using a socket, while live migration changes
> > the way the memory is migrated.
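
The round-based memory transfer described above can be sketched roughly as follows. This is a simplified model in Python, not the bhyve implementation: all callback names are hypothetical, and the stop condition (`stop_below`, `max_rounds`) is my assumption, since the description only says the process repeats until the last round.

```python
def precopy(read_page, all_pages, take_dirty, send, suspend,
            max_rounds=8, stop_below=4):
    """Pre-copy migration loop over page numbers.

    read_page(n)  -> current contents of page n
    take_dirty()  -> set of pages written since the previous call
                     (the call also clears that set)
    send(n, data) -> transmit one page to the destination
    suspend()     -> stop the guest's vCPUs
    Returns the number of rounds, including the final one.
    """
    pending = set(all_pages)  # round 1: every allocated page
    rounds = 0
    while True:
        rounds += 1
        for page in sorted(pending):
            send(page, read_page(page))
        # Pages the guest wrote while this round was in flight.
        pending = take_dirty()
        if len(pending) <= stop_below or rounds >= max_rounds:
            break

    # Final round: the guest is stopped, so nothing new can get dirty.
    suspend()
    pending |= take_dirty()
    for page in sorted(pending):
        send(page, read_page(page))
    return rounds + 1
```

Any page written after it was last sent shows up in a later `take_dirty()` call, so it is sent again either in a following round or in the final round after the guest is suspended.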
> >
> > > 2. What happens if we add a new data section?
> > >
> > What are you referring to with a new data section? Is this question
> > related to the third one? If so, see my answer below.
> >
> > > 3. What happens if the bhyve version differs on host and target
> > > machine?
> >
> > The two hosts must be identical for migration, that's why we have the
> > part where we check the specifications between the two migration
> > hosts. They are expected to have the same version of bhyve and
> > FreeBSD. We will add an additional check in the check specs part to
> > see if we have the same FreeBSD build.
> >
> > As long as the changes in the virtual memory subsystem won't affect
> > bhyve (and how the virtual machine sees/uses the memory), the
> > migration constraints should only be related to suspend/resume. The
> > state of the virtual devices is handled by the snapshot system, so if
> > it is able to accommodate changes in the data structures, the
> > migration process will not be affected.
> >
> > Thank you,
> > Elena
> >
> > >
> > >
> > > --
> > > Kind regards,
> > > Corvin
> > >
> > > On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote:
> > > > Hello,
> > > >
> > > > This mail presents the migration feature we have implemented for
> > > > bhyve. Any feedback from the community is much appreciated.
> > > >
> > > > We have opened a stack of reviews on Phabricator
> > > > (https://reviews.freebsd.org/D34717) that is meant to split the code
> > > > in smaller parts so it can be more easily reviewed. A brief history of
> > > > the implementation can be found at the bottom of this email.
> > > >
> > > > The migration mechanism we propose needs two main components in order
> > > > to move a virtual machine from one host to another:
> > > > 1. the guest's state (vCPUs, emulated and virtualized devices)
> > > > 2. the guest's memory
> > > >
> > > > For the first part, we rely on the suspend/resume feature. We call the
> > > > same functions as the ones used by suspend/resume, but instead of
> > > > saving the data in files, we send it via the network.
> > > >
> > > > The most time consuming aspect of migration is transmitting guest
> > > > memory. The UPB team has implemented two options to accomplish this:
> > > > 1. Warm Migration: The guest execution is suspended on the source host
> > > > while the memory is sent to the destination host. This method is less
> > > > complex but may cause extended downtime.
> > > > 2. Live Migration: The guest continues to execute on the source host
> > > > while the memory is transmitted to the destination host. This method
> > > > is more complex but offers reduced downtime.
> > > >
> > > > The proposed live migration procedure (pre-copy live migration)
> > > > migrates the memory in rounds:
> > > > 1. In the initial round, we migrate all the guest memory (all pages
> > > > that are allocated)
> > > > 2. In the subsequent rounds, we migrate only the pages that were
> > > > modified since the previous round started
> > > > 3. In the final round, we suspend the guest, migrate the remaining
> > > > pages that were modified from the previous round and the guest's
> > > > internal state (vCPU, emulated and virtualized devices).
> > > >
> > > > To detect the pages that were modified between rounds, we propose an
> > > > additional dirty bit (virtualization dirty bit) for each memory page.
> > > > This bit would be set every time the page's dirty bit is set. However,
> > > > this virtualization dirty bit is reset only when the page is migrated.
> > > >
> > > > The proposed implementation is split in two parts:
> > > > 1. The first one, the warm migration, is just a wrapper on the
> > > > suspend/resume feature which, instead of saving the suspended state on
> > > > disk, sends it via the network to the destination
> > > > 2. The second part, the live migration, uses the layer previously
> > > > presented, but sends the guest's memory in rounds, as described above.
> > > >
> > > > The migration process works as follows:
> > > > 1. we identify:
> > > >  - VM_NAME - the name of the virtual machine which will be migrated
> > > >  - SRC_IP - the IP address of the source host
> > > >  - DST_IP - the IP address of the destination host
> > > >  - DST_PORT - the port we want to use for migration (default is
> > > > 24983)
> > > > 2. we start a virtual machine on the destination host that will wait
> > > > for a migration. Here, we must specify SRC_IP (and the port we want to
> > > > open for migration, default is 24983).
> > > > e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
> > > > 3. using bhyvectl on the source host, we start the migration process.
> > > > e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
> > > >
> > > > A full tutorial on this can be found here:
> > > > https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve
> > > >
> > > > For sending the migration request to a virtual machine,= we use
> > > > the
> > > > same thread/socket that is used for suspend.
> > > > For receiving a migration request, we used a similar ap= proach to
> > > > the
> > > > resume process.
> > > >
> > > > As some of you may remember seeing similar emails from = our part
> > > > on
> > > > the
> > > > freebsd-virtualization list, I'll present a brief h= istory of this
> > > > project:
> > > > The first part of the project was the suspend/resume > > > > implementation
> > > > which landed in bhyve in 2020, under the BHYVE_SNAPSHOT= guard
> > > > (https://reviews.freebsd.org/D19495).
> > > > After that, we focused on two tracks:
> > > > 1. adding various suspend/resume features (multiple device
> > > > support - https://reviews.freebsd.org/D26387, CAPSICUM support -
> > > > https://reviews.freebsd.org/D30471, having a uniform file format
> > > > - during the bhyve bi-weekly calls, we concluded that the JSON
> > > > format was the most suitable at that time -
> > > > https://reviews.freebsd.org/D29262) so we can remove the #ifdef
> > > > BHYVE_SNAPSHOT guard.
> > > > 2. implementing the migration feature for bhyve. Since this one
> > > > relies on the save/restore, but does not modify its behaviour, we
> > > > considered we can go in parallel with both tracks.
> > > > We had various presentations in the FreeBSD Community on these
> > > > topics: AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020,
> > > > AsiaBSDCon2023.
> > > >
> > > > The first patches for warm and live migration were opened in
> > > > 2021: https://reviews.freebsd.org/D28270,
> > > > https://reviews.freebsd.org/D30954. However, the general feedback
> > > > on these was that the patches are too big to be reviewed, so we
> > > > should split them in smaller chunks (this was also true for some
> > > > of the suspend/resume improvements). Thus, we split them into
> > > > smaller parts.
> > > > Also, as things changed in bhyve (i.e., capsicum support for
> > > > suspend/resume was added this year), we rebased and updated our
> > > > reviews.
> > > >
> > > > Thank you,
> > > > Elena
> > > >
> > >
>
> --
> Kind regards,
> Corvin

Thanks,
Elena

--000000000000e7d7220600b1d87d--

From nobody Mon Jul 17 17:26:46 2023
From: Elena Mihailescu
Date: Mon, 17 Jul 2023 19:26:46 +0200
Subject: Re: Warm and Live Migration Implementation for bhyve
To: Rob Wing
Cc: Corvin Köhne, "freebsd-virtualization@freebsd.org", Mihai Carabas, Matthew Grooms
Content-Type: text/plain; charset="UTF-8"

Hi Rob,

On Mon, 17 Jul 2023 at 19:08, Rob Wing wrote:
>
> I'm curious why the stream send bits are rolled into bhyve as opposed to
> using netcat/ssh to do the network transfer?
>
> sort of how one would do a zfs send/recv between hosts

Mainly because we need a bidirectional communication channel between the
source and destination. If the destination cannot restore something, or
the specification does not match, a response is sent to the source host
to abandon the migration and continue running on the same host.

Also, we thought that with this approach we can easily add other
enhancements to the migration (e.g., limit the migration rate to not
overflow the network, or set a deadline for migration) or other
improvements that may require a more complex communication between the
source and destination.

>
> On Monday, July 17, 2023, Elena Mihailescu wrote:
>>
>> Hi Corvin,
>>
>> On Mon, 3 Jul 2023 at 09:35, Corvin Köhne wrote:
>> >
>> > On Tue, 2023-06-27 at 16:35 +0300, Elena Mihailescu wrote:
>> > > Hi Corvin,
>> > >
>> > > Thank you for the questions! I'll respond to them inline.
>> > >
>> > > On Mon, 26 Jun 2023 at 10:16, Corvin Köhne
>> > > wrote:
>> > > >
>> > > > Hi Elena,
>> > > >
>> > > > thanks for posting this proposal here.
>> > > >
>> > > > Some open questions from my side:
>> > > >
>> > > > 1. How is the data sent to the target? Does the host send a
>> > > > complete dump and the target parses it? Or does the target
>> > > > request data one by one and the host sends it as response?
>> > > >
>> > > It's not a dump of the guest's state, it's transmitted in steps.
>> > > However, some parts may be migrated as a chunk (e.g., the emulated
>> > > devices' state is transmitted as the buffer generated from the
>> > > snapshot functions).
>> > >
>> >
>> > How does the receiver know which chunk relates to which device? It
>> > would be nice if you can start bhyve on the receiver side without
>> > parameters e.g. `bhyve --receive=127.0.0.1:1234`. Therefore, the
>> > protocol has to carry some information about the device configuration.
>> >
>>
>> Regarding your first question, we send a chunk of data (a buffer) with
>> the state: we resume the data in the same order we saved it. It relies
>> on save/restore. We currently do not support migrating between
>> different versions of suspend&resume/migration.
>>
>> It would be nice to have something like `bhyve
>> --receive=127.0.0.1:1234`, but I don't think it is possible at this
>> point mainly because of the following two reasons:
>> - the guest image must be shared (e.g., via NFS) between the source
>> and destination hosts. If the mounting points differ between the two,
>> opening the disk at the destination will fail (also, we must suppose
>> that the user used an absolute path since a relative one won't work)
>> - if the VM uses a network adapter, we must specify the tap interface
>> on the destination host (e.g., if on the source host the VM uses
>> `tap0`, on the destination host, `tap0` may not exist or may be used
>> by other VMs).
>>
>>
>> >
>> > > I'll try to describe a bit the protocol we have implemented for
>> > > migration, maybe it can partially respond to the second and third
>> > > questions.
>> > >
>> > > The destination host waits for the source host to connect (through a
>> > > socket).
>> > > After that, the source sends its system specifications (hw_machine,
>> > > hw_model, hw_pagesize). If the source and destination hosts have
>> > > identical hardware configurations, the migration can take place.
>> > >
>> > > Then, if we have live migration, we migrate the memory in rounds
>> > > (i.e., we get a list of the pages that have the dirty bit set, send
>> > > it to the destination to know what pages will be received, then
>> > > send the pages through the socket; this process is repeated until
>> > > the last round).
>> > >
>> > > Next, we stop the guest's vcpus, send the remaining memory (for live
>> > > migration) or the guest's memory from vmctx->baseaddr for warm
>> > > migration. Then, based on the suspend/resume feature, we get the
>> > > state of the virtualized devices (the ones from the kernel space)
>> > > and send this buffer to the destination. We repeat this for the
>> > > emulated devices as well (the ones from the userspace).
>> > >
>> > > On the receiver host, we get the memory pages and set them to their
>> > > according position in the guest's memory, use the restore functions
>> > > for the state of the devices and start the guest's execution.
>> > >
>> > > Excluding the guest's memory transfer, the rest is based on the
>> > > suspend/resume feature. We snapshot the guest's state, but instead of
>> > > saving the data locally, we send it via network to the destination.
>> > > On the destination host, we start a new virtual machine, but instead
>> > > of reading/getting the state from the disk (i.e., the snapshot
>> > > files) we get this state via the network from the source host.
>> > >
>> > > If the destination can properly resume the guest activity, it will
>> > > send an "OK" to the source host so it can destroy/remove the guest
>> > > from its end.
>> > >
>> > > Both warm and live migration are based on "cold migration". Cold
>> > > migration means we suspend the guest on the source host, and restore
>> > > the guest on the destination host from the snapshot files. Warm
>> > > migration only does this using a socket, while live migration changes
>> > > the way the memory is migrated.
>> > >
>> > > > 2. What happens if we add a new data section?
>> > > >
>> > > What are you referring to with a new data section? Is this question
>> > > related to the third one? If so, see my answer below.
>> > >
>> > > > 3. What happens if the bhyve version differs on host and target
>> > > > machine?
>> > >
>> > > The two hosts must be identical for migration, that's why we have the
>> > > part where we check the specifications between the two migration
>> > > hosts. They are expected to have the same version of bhyve and
>> > > FreeBSD. We will add an additional check in the check specs part to
>> > > see if we have the same FreeBSD build.
>> > >
>> > > As long as the changes in the virtual memory subsystem won't affect
>> > > bhyve (and how the virtual machine sees/uses the memory), the
>> > > migration constraints should only be related to suspend/resume. The
>> > > state of the virtual devices is handled by the snapshot system, so if
>> > > it is able to accommodate changes in the data structures, the
>> > > migration process will not be affected.
>> > >
>> > > Thank you,
>> > > Elena
>> > >
>> > > >
>> > > >
>> > > > --
>> > > > Kind regards,
>> > > > Corvin
>> > > >
>> > > > On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote:
>> > > > > Hello,
>> > > > >
>> > > > > This mail presents the migration feature we have implemented for
>> > > > > bhyve. Any feedback from the community is much appreciated.
>> > > > >
>> > > > > We have opened a stack of reviews on Phabricator
>> > > > > (https://reviews.freebsd.org/D34717) that is meant to split the
>> > > > > code in smaller parts so it can be more easily reviewed. A brief
>> > > > > history of the implementation can be found at the bottom of this
>> > > > > email.
>> > > > >
>> > > > > The migration mechanism we propose needs two main components in
>> > > > > order to move a virtual machine from one host to another:
>> > > > > 1. the guest's state (vCPUs, emulated and virtualized devices)
>> > > > > 2. the guest's memory
>> > > > >
>> > > > > For the first part, we rely on the suspend/resume feature. We
>> > > > > call the same functions as the ones used by suspend/resume, but
>> > > > > instead of saving the data in files, we send it via the network.
>> > > > >
>> > > > > The most time consuming aspect of migration is transmitting guest
>> > > > > memory. The UPB team has implemented two options to accomplish
>> > > > > this:
>> > > > > 1. Warm Migration: The guest execution is suspended on the source
>> > > > > host while the memory is sent to the destination host. This
>> > > > > method is less complex but may cause extended downtime.
>> > > > > 2. Live Migration: The guest continues to execute on the source
>> > > > > host while the memory is transmitted to the destination host.
>> > > > > This method is more complex but offers reduced downtime.
>> > > > >
>> > > > > The proposed live migration procedure (pre-copy live migration)
>> > > > > migrates the memory in rounds:
>> > > > > 1. In the initial round, we migrate all the guest memory (all
>> > > > > pages that are allocated)
>> > > > > 2. In the subsequent rounds, we migrate only the pages that were
>> > > > > modified since the previous round started
>> > > > > 3. In the final round, we suspend the guest, migrate the
>> > > > > remaining pages that were modified from the previous round and
>> > > > > the guest's internal state (vCPU, emulated and virtualized
>> > > > > devices).
>> > > > >
>> > > > > To detect the pages that were modified between rounds, we propose
>> > > > > an additional dirty bit (virtualization dirty bit) for each
>> > > > > memory page. This bit would be set every time the page's dirty
>> > > > > bit is set. However, this virtualization dirty bit is reset only
>> > > > > when the page is migrated.
>> > > > >
>> > > > > The proposed implementation is split in two parts:
>> > > > > 1. The first one, the warm migration, is just a wrapper on the
>> > > > > suspend/resume feature which, instead of saving the suspended
>> > > > > state on disk, sends it via the network to the destination
>> > > > > 2. The second part, the live migration, uses the layer previously
>> > > > > presented, but sends the guest's memory in rounds, as described
>> > > > > above.
>> > > > >
>> > > > > The migration process works as follows:
>> > > > > 1. we identify:
>> > > > > - VM_NAME - the name of the virtual machine which will be
>> > > > > migrated
>> > > > > - SRC_IP - the IP address of the source host
>> > > > > - DST_IP - the IP address of the destination host
>> > > > > - DST_PORT - the port we want to use for migration (default is
>> > > > > 24983)
>> > > > > 2. we start a virtual machine on the destination host that will
>> > > > > wait for a migration. Here, we must specify SRC_IP (and the port
>> > > > > we want to open for migration, default is 24983).
>> > > > > e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
>> > > > > 3. using bhyvectl on the source host, we start the migration
>> > > > > process.
>> > > > > e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
>> > > > >
>> > > > > A full tutorial on this can be found here:
>> > > > > https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve
>> > > > >
>> > > > > For sending the migration request to a virtual machine, we use
>> > > > > the same thread/socket that is used for suspend.
>> > > > > For receiving a migration request, we used a similar approach to
>> > > > > the resume process.
>> > > > >
>> > > > > As some of you may remember seeing similar emails from our part
>> > > > > on the freebsd-virtualization list, I'll present a brief history
>> > > > > of this project:
>> > > > > The first part of the project was the suspend/resume
>> > > > > implementation which landed in bhyve in 2020, under the
>> > > > > BHYVE_SNAPSHOT guard (https://reviews.freebsd.org/D19495).
>> > > > > After that, we focused on two tracks:
>> > > > > 1. adding various suspend/resume features (multiple device
>> > > > > support - https://reviews.freebsd.org/D26387, CAPSICUM support -
>> > > > > https://reviews.freebsd.org/D30471, having a uniform file format
>> > > > > - during the bhyve bi-weekly calls, we concluded that the JSON
>> > > > > format was the most suitable at that time -
>> > > > > https://reviews.freebsd.org/D29262) so we can remove the #ifdef
>> > > > > BHYVE_SNAPSHOT guard.
>> > > > > 2. implementing the migration feature for bhyve. Since this one
>> > > > > relies on the save/restore, but does not modify its behaviour, we
>> > > > > considered we can go in parallel with both tracks.
>> > > > > We had various presentations in the FreeBSD Community on these
>> > > > > topics: AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020,
>> > > > > AsiaBSDCon2023.
>> > > > >
>> > > > > The first patches for warm and live migration were opened in
>> > > > > 2021: https://reviews.freebsd.org/D28270,
>> > > > > https://reviews.freebsd.org/D30954. However, the general feedback
>> > > > > on these was that the patches are too big to be reviewed, so we
>> > > > > should split them in smaller chunks (this was also true for some
>> > > > > of the suspend/resume improvements). Thus, we split them into
>> > > > > smaller parts.
>> > > > > Also, as things changed in bhyve (i.e., capsicum support for
>> > > > > suspend/resume was added this year), we rebased and updated our
>> > > > > reviews.
>> > > > >
>> > > > > Thank you,
>> > > > > Elena
>> > > > >
>> > > >
>> >
>> > --
>> > Kind regards,
>> > Corvin
>>
>> Thanks,
>> Elena
>>

Elena

From nobody Tue Jul 18 03:24:17 2023
From: bugzilla-noreply@freebsd.org
To: virtualization@FreeBSD.org
Subject: [Bug 243640] QEMU / KVM Q35 V4.X PCIe Virtual and Physical (Passthrough) Devices not detected
Date: Tue, 18 Jul 2023 03:24:17 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.1-STABLE
X-Bugzilla-Keywords: regression
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: drum@graphica.com.au
X-Bugzilla-Status: Open
X-Bugzilla-Assigned-To: virtualization@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Content-Type: text/plain; charset="UTF-8"

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243640

--- Comment #10 from John Hartley ---
(In reply to John Hartley from comment #9)

Hi Mina Galic,

I have now done further testing with 13.2.

You can get this up and running with Q35 V4.2 & 6.2, but you need to
ensure that it is using a specific OVMF (UEFI) firmware.

My libvirt configuration to achieve this:

...
hvm
/usr/share/OVMF/OVMF_CODE.fd
/home/XXXX/Documents/current.dev.freebsd/OVMF_VARS.fd
...

The default UEFI / OVMF libvirt configuration is:

...
hvm
...

This results in it loading a different OVMF version, which then fails
UEFI boot.

With the working boot, e1000e (Intel 1GbE on PCIe bus) is found ok.

So it looks like 13.2 has fixed the problem with PCIe devices, but I have
not tested with real devices via PCIe passthrough...

Cheers,

John Hartley.

-- 
You are receiving this mail because:
You are the assignee for the bug.