Date: Mon, 26 Jun 2023 09:16:08 +0200
From: Corvin Köhne <corvink@FreeBSD.org>
To: Elena Mihailescu <elenamihailescu22@gmail.com>, freebsd-virtualization@freebsd.org
Cc: Mihai Carabas <mihai.carabas@gmail.com>, Matthew Grooms <mgrooms@shrew.net>
Subject: Re: Warm and Live Migration Implementation for bhyve
Message-ID: <3d7ee1f6ff98fe9aede5a85702b906fc3014b6b6.camel@FreeBSD.org>
In-Reply-To: <CAGOCPLhJrNrysBM1vc87vfkX5jZLCmnyfGf%2Bcv2wmHFF1UhC-w@mail.gmail.com>
References: <CAGOCPLhJrNrysBM1vc87vfkX5jZLCmnyfGf%2Bcv2wmHFF1UhC-w@mail.gmail.com>

Hi Elena,

thanks for posting this proposal here. Some open questions from my side:

1. How is the data sent to the target? Does the host send a complete
   dump that the target then parses, or does the target request the data
   piece by piece, with the host sending each piece in response?
2. What happens if we add a new data section?
3. What happens if the bhyve version differs between the host and the
   target machine?

-- 
Kind regards,
Corvin

On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote:
> Hello,
>
> This mail presents the migration feature we have implemented for
> bhyve. Any feedback from the community is much appreciated.
>
> We have opened a stack of reviews on Phabricator
> (https://reviews.freebsd.org/D34717) that is meant to split the code
> into smaller parts so it can be reviewed more easily. A brief history
> of the implementation can be found at the bottom of this email.
>
> The migration mechanism we propose needs two main components in order
> to move a virtual machine from one host to another:
> 1. the guest's state (vCPUs, emulated and virtualized devices)
> 2. the guest's memory
>
> For the first part, we rely on the suspend/resume feature. We call
> the same functions as the ones used by suspend/resume, but instead of
> saving the data in files, we send it via the network.
>
> The most time-consuming aspect of migration is transmitting guest
> memory. The UPB team has implemented two options to accomplish this:
> 1. Warm Migration: The guest execution is suspended on the source
> host while the memory is sent to the destination host. This method is
> less complex but may cause extended downtime.
> 2. Live Migration: The guest continues to execute on the source host
> while the memory is transmitted to the destination host. This method
> is more complex but offers reduced downtime.
>
> The proposed live migration procedure (pre-copy live migration)
> migrates the memory in rounds:
> 1. In the initial round, we migrate all the guest memory (all pages
> that are allocated).
> 2. In the subsequent rounds, we migrate only the pages that were
> modified since the previous round started.
> 3. In the final round, we suspend the guest, then migrate the
> remaining pages that were modified since the previous round, together
> with the guest's internal state (vCPUs, emulated and virtualized
> devices).
>
> To detect the pages that were modified between rounds, we propose an
> additional dirty bit (virtualization dirty bit) for each memory page.
> This bit would be set every time the page's dirty bit is set.
> However, this virtualization dirty bit is reset only when the page is
> migrated.
>
> The proposed implementation is split into two parts:
> 1. The first one, the warm migration, is just a wrapper around the
> suspend/resume feature which, instead of saving the suspended state
> on disk, sends it via the network to the destination.
> 2. The second part, the live migration, uses the layer previously
> presented, but sends the guest's memory in rounds, as described
> above.
>
> The migration process works as follows:
> 1. we identify:
>  - VM_NAME - the name of the virtual machine which will be migrated
>  - SRC_IP - the IP address of the source host
>  - DST_IP - the IP address of the destination host
>  - DST_PORT - the port we want to use for migration (default is
>    24983)
> 2. we start a virtual machine on the destination host that will wait
> for a migration. Here, we must specify SRC_IP (and the port we want
> to open for migration, default is 24983).
> e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
> 3. using bhyvectl on the source host, we start the migration process.
> e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
>
> A full tutorial on this can be found here:
> https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve
>
> For sending the migration request to a virtual machine, we use the
> same thread/socket that is used for suspend.
> For receiving a migration request, we used a similar approach to the
> resume process.
>
> As some of you may remember seeing similar emails from us on the
> freebsd-virtualization list, I'll present a brief history of this
> project:
> The first part of the project was the suspend/resume implementation
> which landed in bhyve in 2020, under the BHYVE_SNAPSHOT guard
> (https://reviews.freebsd.org/D19495).
> After that, we focused on two tracks:
> 1. adding various suspend/resume features (multiple device support -
> https://reviews.freebsd.org/D26387, Capsicum support -
> https://reviews.freebsd.org/D30471, having a uniform file format -
> https://reviews.freebsd.org/D29262; during the bhyve bi-weekly calls,
> we concluded that JSON was the most suitable format at that time) so
> we can remove the #ifdef BHYVE_SNAPSHOT guard.
> 2. implementing the migration feature for bhyve. Since this one
> relies on the save/restore, but does not modify its behaviour, we
> considered we can go in parallel with both tracks.
> We had various presentations in the FreeBSD community on these
> topics: AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020,
> AsiaBSDCon2023.
>
> The first patches for warm and live migration were opened in 2021:
> https://reviews.freebsd.org/D28270,
> https://reviews.freebsd.org/D30954. However, the general feedback on
> these was that the patches are too big to be reviewed, so we should
> split them into smaller chunks (this was also true for some of the
> suspend/resume improvements). Thus, we split them into smaller parts.
> Also, as things changed in bhyve (i.e., Capsicum support for
> suspend/resume was added this year), we rebased and updated our
> reviews.
>
> Thank you,
> Elena
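
The round-based pre-copy procedure described in the quoted proposal can be
illustrated with a small simulation. This is only a sketch under stated
assumptions, not the code from the bhyve reviews: the function name
precopy_migrate, the list-of-pages memory model, and the stop conditions
max_rounds and threshold are all hypothetical (the proposal does not say
when the final round is triggered). The set named dirty plays the role of
the per-page virtualization dirty bit: it is set on every guest write and
cleared only when the page is sent.

```python
def precopy_migrate(memory, guest_writes, max_rounds=5, threshold=2):
    """Simulate pre-copy live migration (illustrative model only).

    memory       -- list of page contents on the source (mutated by writes)
    guest_writes -- guest_writes[i] is a list of (page, value) writes the
                    guest performs while round i is in flight (hypothetical
                    workload description)
    max_rounds, threshold -- assumed stop conditions, not from the proposal
    Returns (destination_memory, pages_sent_per_round).
    """
    dest = [None] * len(memory)
    # Initial round: every allocated page counts as dirty.
    dirty = set(range(len(memory)))
    sent_per_round = []

    for rnd in range(max_rounds):
        # Send the pages marked dirty and clear their bits.
        to_send = sorted(dirty)
        dirty = set()
        for page in to_send:
            dest[page] = memory[page]
        sent_per_round.append(len(to_send))

        # The guest keeps running; for simplicity its writes are applied
        # after the round's copy. Each write sets the page's dirty bit.
        writes = guest_writes[rnd] if rnd < len(guest_writes) else []
        for page, value in writes:
            memory[page] = value
            dirty.add(page)

        # Few enough dirty pages left: go to the final round.
        if len(dirty) <= threshold:
            break

    # Final round: the guest is suspended, so no further writes occur.
    final = sorted(dirty)
    for page in final:
        dest[page] = memory[page]
    sent_per_round.append(len(final))
    return dest, sent_per_round


if __name__ == "__main__":
    mem = list("abcdefgh")  # 8 guest pages
    writes = [[(0, "X"), (1, "Y"), (2, "Z")], [(1, "Q")]]
    dest, sent = precopy_migrate(mem, writes, threshold=1)
    print(sent)         # pages sent in each round, final round last
    print(dest == mem)  # destination matches the source after migration
```

In this model, calling the function with max_rounds=0 skips the live rounds
entirely and sends all pages in the final, suspended round, which
corresponds to the warm migration case.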