From: Elena Mihailescu
Date: Fri, 23 Jun 2023 13:00:32 +0300
Subject: Warm and Live Migration Implementation for bhyve
To: freebsd-virtualization@freebsd.org
Cc: Mihai Carabas, Matthew Grooms
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

Hello,

This mail presents the migration feature we have implemented for bhyve. Any feedback from the community is much appreciated.

We have opened a stack of reviews on Phabricator (https://reviews.freebsd.org/D34717) that splits the code into smaller parts so it can be reviewed more easily. A brief history of the implementation can be found at the bottom of this email.

The migration mechanism we propose must transfer two main components in order to move a virtual machine from one host to another:
1. the guest's state (vCPUs, emulated and virtualized devices)
2. the guest's memory

For the first part, we rely on the suspend/resume feature. We call the same functions as the ones used by suspend/resume, but instead of saving the data in files, we send it over the network.

The most time-consuming aspect of migration is transmitting the guest memory. The UPB team has implemented two options to accomplish this:
1. Warm Migration: The guest execution is suspended on the source host while the memory is sent to the destination host. This method is less complex but may cause extended downtime.
2. Live Migration: The guest continues to execute on the source host while the memory is transmitted to the destination host. This method is more complex but offers reduced downtime.

The proposed live migration procedure (pre-copy live migration) migrates the memory in rounds:
1. In the initial round, we migrate all the guest memory (all pages that are allocated).
2. In the subsequent rounds, we migrate only the pages that were modified since the previous round started.
3. In the final round, we suspend the guest, then migrate the remaining pages that were modified since the previous round, together with the guest's internal state (vCPUs, emulated and virtualized devices).

To detect the pages that were modified between rounds, we propose an additional dirty bit (virtualization dirty bit) for each memory page. This bit is set every time the page's dirty bit is set. Unlike the regular dirty bit, however, the virtualization dirty bit is reset only when the page is migrated.

The proposed implementation is split in two parts:
1. The first one, the warm migration, is just a wrapper around the suspend/resume feature which, instead of saving the suspended state on disk, sends it over the network to the destination.
2. The second part, the live migration, uses the layer previously presented, but sends the guest's memory in rounds, as described above.

The migration process works as follows:
1. we identify:
   - VM_NAME - the name of the virtual machine which will be migrated
   - SRC_IP - the IP address of the source host
   - DST_IP - the IP address of the destination host
   - DST_PORT - the port we want to use for migration (default is 24983)
2. we start a virtual machine on the destination host that will wait for a migration. Here, we must specify SRC_IP (and the port we want to open for migration, default is 24983).
   e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
3. using bhyvectl on the source host, we start the migration process.
   e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm

A full tutorial on this can be found here:
https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve

For sending the migration request to a virtual machine, we use the same thread/socket that is used for suspend. For receiving a migration request, we used a similar approach to the resume process.
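
To make the round-based procedure and the virtualization dirty bit described above more concrete, here is a small user-space simulation in Python. It is only a sketch and not the bhyve code: guest memory is modelled as a dict, the virtualization dirty bits as a set of page numbers, and the names precopy_migrate, guest_write, max_live_rounds and stop_threshold are hypothetical. In particular, the rule for deciding when to enter the final round is an assumption made for this example.

```python
def guest_write(memory, virt_dirty, page, value):
    """Model a guest store: update the page and set its virtualization dirty bit."""
    memory[page] = value
    virt_dirty.add(page)


def precopy_migrate(memory, guest_activity, max_live_rounds=4, stop_threshold=1):
    """Simulate pre-copy migration of `memory` (page number -> content).

    guest_activity(round_no) returns the (page, value) writes the guest
    performs while round `round_no` is in flight. The stop condition
    (max_live_rounds, stop_threshold) is an assumption of this sketch.
    Returns the destination memory and the number of pages sent per round.
    """
    virt_dirty = set(memory)  # initial round: every allocated page is sent
    dest = {}
    sent_per_round = []

    for round_no in range(max_live_rounds):
        # Pick up the pages dirtied since the previous round started and
        # reset their bits, since they are being migrated now.
        to_send = sorted(virt_dirty)
        virt_dirty.clear()
        for page in to_send:
            dest[page] = memory[page]
        sent_per_round.append(len(to_send))

        # The guest keeps running on the source and dirties pages again.
        for page, value in guest_activity(round_no):
            guest_write(memory, virt_dirty, page, value)

        if len(virt_dirty) <= stop_threshold:
            break

    # Final round: the guest is suspended, so no new writes can occur.
    # The remaining dirty pages go out; the vCPU and device state would follow.
    remaining = sorted(virt_dirty)
    virt_dirty.clear()
    for page in remaining:
        dest[page] = memory[page]
    sent_per_round.append(len(remaining))
    return dest, sent_per_round


def demo_activity(round_no):
    # Hypothetical workload: three pages written in round 0, one in round 1.
    writes = {0: [(1, 11), (2, 22), (3, 33)], 1: [(2, 222)]}
    return writes.get(round_no, [])


memory = {page: 0 for page in range(8)}
dest, sent = precopy_migrate(memory, demo_activity)
print(sent)  # [8, 3, 1]
```

Because a page's bit is cleared only when that page is sent, a page written again during a round is picked up by the next round, and the suspended final round leaves the destination identical to the source. In this model, warm migration corresponds to skipping the live rounds and sending everything while the guest is suspended.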

As some of you may remember seeing similar emails from us on the freebsd-virtualization list, I'll present a brief history of this project:

The first part of the project was the suspend/resume implementation, which landed in bhyve in 2020 under the BHYVE_SNAPSHOT guard (https://reviews.freebsd.org/D19495).

After that, we focused on two tracks:
1. adding various suspend/resume features so we can remove the #ifdef BHYVE_SNAPSHOT guard:
   - multiple device support - https://reviews.freebsd.org/D26387
   - Capsicum support - https://reviews.freebsd.org/D30471
   - a uniform file format - https://reviews.freebsd.org/D29262 (during the bhyve bi-weekly calls at that time, we concluded that JSON was the most suitable format)
2. implementing the migration feature for bhyve. Since this one relies on save/restore but does not modify its behaviour, we considered that we could work on both tracks in parallel.

We gave various presentations in the FreeBSD community on these topics: AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020, AsiaBSDCon2023.

The first patches for warm and live migration were opened in 2021: https://reviews.freebsd.org/D28270, https://reviews.freebsd.org/D30954. However, the general feedback on these was that the patches were too big to be reviewed, so we should split them into smaller chunks (this was also true for some of the suspend/resume improvements). Thus, we split them into smaller parts. Also, as things changed in bhyve (e.g., Capsicum support for suspend/resume was added this year), we rebased and updated our reviews.

Thank you,
Elena