Date:      Mon, 26 Jun 2023 09:16:08 +0200
From:      Corvin Köhne <corvink@FreeBSD.org>
To:        Elena Mihailescu <elenamihailescu22@gmail.com>,  freebsd-virtualization@freebsd.org
Cc:        Mihai Carabas <mihai.carabas@gmail.com>, Matthew Grooms <mgrooms@shrew.net>
Subject:   Re: Warm and Live Migration Implementation for bhyve
Message-ID:  <3d7ee1f6ff98fe9aede5a85702b906fc3014b6b6.camel@FreeBSD.org>
In-Reply-To: <CAGOCPLhJrNrysBM1vc87vfkX5jZLCmnyfGf%2Bcv2wmHFF1UhC-w@mail.gmail.com>
References:   <CAGOCPLhJrNrysBM1vc87vfkX5jZLCmnyfGf%2Bcv2wmHFF1UhC-w@mail.gmail.com>



Hi Elena,

thanks for posting this proposal here.

Some open questions from my side:

1. How is the data sent to the target? Does the host send a complete
dump and the target parses it? Or does the target request data one by
one and the host sends it in response?

2. What happens if we add a new data section?

3. What happens if the bhyve version differs on host and target
machine?


-- 
Kind regards,
Corvin

On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote:
> Hello,
>
> This mail presents the migration feature we have implemented for
> bhyve. Any feedback from the community is much appreciated.
>
> We have opened a stack of reviews on Phabricator
> (https://reviews.freebsd.org/D34717) that is meant to split the code
> into smaller parts so it can be more easily reviewed. A brief history
> of the implementation can be found at the bottom of this email.
>
> The migration mechanism we propose needs two main components in order
> to move a virtual machine from one host to another:
> 1. the guest's state (vCPUs, emulated and virtualized devices)
> 2. the guest's memory
>
> For the first part, we rely on the suspend/resume feature. We call
> the same functions as the ones used by suspend/resume, but instead of
> saving the data in files, we send it via the network.
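
To make the reuse described above concrete, here is a rough Python
sketch (not bhyve's C code). The record layout and the names
`save_device_state` and `load_device_state` are invented for the
example; the only point is that one serializer can write to a file or
to a socket if it is handed a generic byte sink.

```python
import io
import socket
import struct

def save_device_state(sink, name, payload):
    """Write one hypothetical record: name length, payload length, name, payload."""
    encoded = name.encode("ascii")
    sink.write(struct.pack("!HI", len(encoded), len(payload)))
    sink.write(encoded)
    sink.write(payload)

def load_device_state(source):
    """Read back one record written by save_device_state."""
    name_len, payload_len = struct.unpack("!HI", source.read(6))
    name = source.read(name_len).decode("ascii")
    return name, source.read(payload_len)

# Suspend path: the sink is a file-like object (an in-memory buffer here).
snapshot = io.BytesIO()
save_device_state(snapshot, "vcpu0", b"\x01\x02\x03")
snapshot.seek(0)
from_file = load_device_state(snapshot)

# Migration path: the same function, but the sink is a connected socket.
src, dst = socket.socketpair()
with src.makefile("wb") as out:
    save_device_state(out, "vcpu0", b"\x01\x02\x03")
with dst.makefile("rb") as inp:
    from_network = load_device_state(inp)
src.close()
dst.close()
```

Both paths yield the same record, so the receiving side can share its
parsing code with resume.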
>
> The most time-consuming aspect of migration is transmitting guest
> memory. The UPB team has implemented two options to accomplish this:
> 1. Warm Migration: The guest execution is suspended on the source
> host while the memory is sent to the destination host. This method is
> less complex but may cause extended downtime.
> 2. Live Migration: The guest continues to execute on the source host
> while the memory is transmitted to the destination host. This method
> is more complex but offers reduced downtime.
>
> The proposed live migration procedure (pre-copy live migration)
> migrates the memory in rounds:
> 1. In the initial round, we migrate all the guest memory (all pages
> that are allocated)
> 2. In the subsequent rounds, we migrate only the pages that were
> modified since the previous round started
> 3. In the final round, we suspend the guest, then migrate the pages
> modified since the previous round together with the guest's internal
> state (vCPUs, emulated and virtualized devices).
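
The three kinds of rounds listed above can be walked through with a
small Python simulation. This is only a sketch: the guest's writes are
supplied by the caller as `writes_during_round`, and the stop criterion
(`threshold`) is an assumption of mine, since the proposal does not say
when the final round is triggered.

```python
def precopy_migrate(allocated_pages, writes_during_round, threshold=2):
    """Return a list of (phase, pages_sent) tuples for a simulated pre-copy run.

    writes_during_round[i] is the set of pages the guest dirties while
    round i is being transmitted (hypothetical input for the simulation).
    """
    log = []
    pending = set(allocated_pages)        # initial round: all allocated pages
    for dirtied in writes_during_round:
        log.append(("live", sorted(pending)))
        pending = set(dirtied)            # only pages modified since this round started
        if len(pending) <= threshold:     # assumed stop criterion, not from the proposal
            break
    # Final round: the guest is suspended, so `pending` can no longer grow.
    log.append(("suspended", sorted(pending)))
    return log

log = precopy_migrate(range(8), [{1, 2, 5, 6}, {2, 5, 6}, {5}, {5, 7}])
for phase, pages in log:
    print(phase, pages)
```

In this run three rounds are sent while the guest executes, and only
page 5 has to be transferred during the downtime window.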
>
> To detect the pages that were modified between rounds, we propose an
> additional dirty bit (virtualization dirty bit) for each memory page.
> This bit would be set every time the page's dirty bit is set.
> However, this virtualization dirty bit is reset only when the page is
> migrated.
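
A toy Python model of the two bits described above may help show why
the second bit is needed. The `Page` class and `host_clean` are
hypothetical; in particular, which host code clears the ordinary dirty
bit is an assumption here, not something the proposal states.

```python
class Page:
    def __init__(self):
        self.dirty = False        # ordinary dirty bit
        self.virt_dirty = False   # proposed virtualization dirty bit

    def guest_write(self):
        # Every time the dirty bit is set, the virtualization bit is set too.
        self.dirty = True
        self.virt_dirty = True

    def host_clean(self):
        # Assumed: some other consumer on the host clears the ordinary bit.
        # The virtualization bit deliberately survives this.
        self.dirty = False

def collect_for_migration(pages):
    """Return indices of pages to send and reset only their virtualization bit."""
    to_send = [i for i, page in enumerate(pages) if page.virt_dirty]
    for i in to_send:
        pages[i].virt_dirty = False
    return to_send

pages = [Page() for _ in range(4)]
pages[1].guest_write()
pages[3].guest_write()
pages[1].host_clean()                    # ordinary bit cleared before the round runs
first = collect_for_migration(pages)     # page 1 is still picked up
second = collect_for_migration(pages)    # nothing was written in between
```

Page 1 would be missed if the migration code looked only at the
ordinary dirty bit, which is the case the extra bit is meant to cover.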
>
> The proposed implementation is split into two parts:
> 1. The first one, the warm migration, is just a wrapper around the
> suspend/resume feature which, instead of saving the suspended state
> on disk, sends it via the network to the destination.
> 2. The second part, the live migration, uses the layer previously
> presented, but sends the guest's memory in rounds, as described
> above.
>
> The migration process works as follows:
> 1. we identify:
>  - VM_NAME - the name of the virtual machine which will be migrated
>  - SRC_IP - the IP address of the source host
>  - DST_IP - the IP address of the destination host
>  - DST_PORT - the port we want to use for migration (default is 24983)
> 2. we start a virtual machine on the destination host that will wait
> for a migration. Here, we must specify SRC_IP (and the port we want
> to open for migration, default is 24983).
> e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
> 3. using bhyvectl on the source host, we start the migration process.
> e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
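
For scripting the two steps above, the migration-related arguments could
be assembled roughly as follows. This Python helper is hypothetical: it
only mirrors the `-R` and `--migrate`/`--vm` forms quoted in the mail
and the stated default port. How bhyve itself parses an endpoint without
a port is an assumption, and the remaining bhyve options (the "..."
above) are left to the caller.

```python
DEFAULT_MIGRATION_PORT = 24983  # default port named in the proposal

def parse_endpoint(spec):
    """Split 'HOST' or 'HOST:PORT' into (host, port); IPv4 or hostname only."""
    host, sep, port = spec.partition(":")
    if not host:
        raise ValueError("missing host in %r" % spec)
    return host, int(port) if sep else DEFAULT_MIGRATION_PORT

def destination_args(src_endpoint, vm_name):
    """Migration-related bhyve arguments for the waiting destination VM."""
    host, port = parse_endpoint(src_endpoint)
    return ["-R", "%s:%d" % (host, port), vm_name]

def source_args(dst_endpoint, vm_name):
    """bhyvectl arguments that start the migration on the source host."""
    host, port = parse_endpoint(dst_endpoint)
    return ["--migrate=%s:%d" % (host, port), "--vm=%s" % vm_name]

print(destination_args("192.0.2.10", "guest_vm_dst"))
print(source_args("192.0.2.20:24983", "guest_vm"))
```

The addresses are documentation-range placeholders, not values from the
proposal.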
>
> A full tutorial on this can be found here:
> https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve
>
> For sending the migration request to a virtual machine, we use the
> same thread/socket that is used for suspend.
> For receiving a migration request, we used a similar approach to the
> resume process.
>
> As some of you may remember seeing similar emails from us on the
> freebsd-virtualization list, I'll present a brief history of this
> project:
> The first part of the project was the suspend/resume implementation
> which landed in bhyve in 2020, under the BHYVE_SNAPSHOT guard
> (https://reviews.freebsd.org/D19495).
> After that, we focused on two tracks:
> 1. adding various suspend/resume features (multiple device support -
> https://reviews.freebsd.org/D26387, Capsicum support -
> https://reviews.freebsd.org/D30471, having a uniform file format -
> during the bhyve bi-weekly calls, we concluded that the JSON format
> was the most suitable at that time -
> https://reviews.freebsd.org/D29262) so we can remove the #ifdef
> BHYVE_SNAPSHOT guard.
> 2. implementing the migration feature for bhyve. Since this one
> relies on the save/restore, but does not modify its behaviour, we
> considered we can go in parallel with both tracks.
> We had various presentations in the FreeBSD community on these
> topics: AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020,
> AsiaBSDCon2023.
>
> The first patches for warm and live migration were opened in 2021:
> https://reviews.freebsd.org/D28270,
> https://reviews.freebsd.org/D30954. However, the general feedback on
> these was that the patches were too big to be reviewed, so we should
> split them into smaller chunks (this was also true for some of the
> suspend/resume improvements). Thus, we split them into smaller parts.
> Also, as things changed in bhyve (e.g., Capsicum support for
> suspend/resume was added this year), we rebased and updated our
> reviews.
>
> Thank you,
> Elena
>




