From: Elena Mihailescu
Date: Mon, 30 Jul 2018 14:56:27 +0300
Subject: bhyve internals related to FreeBSD memory mechanisms
To: freebsd-virtualization@freebsd.org, freebsd-amd64@freebsd.org, freebsd-hackers@freebsd.org
Cc: Mihai Carabas, Darius Mihai

Hello,

I'll start by giving some context before asking my questions.

We are currently implementing a live migration feature for bhyve. To do that, we want to mark the guest memory copy-on-write. As previously discussed on the freebsd-virtualization list [1], the vm_entry structure that contains the guest memory needs to have the MAP_ENTRY_COW and MAP_ENTRY_NEEDS_COPY flags set.

The migration mechanism is a pre-copy live migration technique that works in "rounds", as described below:

- in the first round, after starting the procedure and checking the compatibility of the two systems, the entire guest memory is sent to the destination
- in the second round, after that transfer is completed, only the differences (pages that were written/dirtied since the first round started) are sent to the destination
...
- in the n-th round, only the differences between this round and round (n - 1) are sent.
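
To make the rounds above concrete, here is a minimal user-space sketch in C of such a pre-copy loop. It is not bhyve code and only illustrates the idea under stated assumptions: the guest size (NPAGES), the per-page dirty flags, send_page(), simulate_guest_writes() and the stop threshold are all hypothetical stand-ins. In the real implementation the dirty set would come from the COW mechanism, and the final step (done with the guest stopped) would also transfer the CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES     16   /* hypothetical guest size, in pages */
#define MAX_ROUNDS 10   /* upper bound on the number of rounds */

/* Hypothetical migration state: one dirty flag per guest page. */
struct migration {
    bool   dirty[NPAGES];
    size_t pages_sent;          /* total pages transferred so far */
};

/* Stand-in for transferring one page to the destination. */
static void send_page(struct migration *m, size_t pfn)
{
    (void)pfn;
    m->pages_sent++;
}

static size_t count_dirty(const struct migration *m)
{
    size_t n = 0;
    for (size_t pfn = 0; pfn < NPAGES; pfn++)
        n += m->dirty[pfn] ? 1 : 0;
    return n;
}

/* Send every dirty page and clear its flag. */
static void send_dirty_pages(struct migration *m)
{
    for (size_t pfn = 0; pfn < NPAGES; pfn++) {
        if (m->dirty[pfn]) {
            m->dirty[pfn] = false;
            send_page(m, pfn);
        }
    }
}

/* Stand-in for the running guest: it dirties fewer pages each round. */
static void simulate_guest_writes(struct migration *m, int round)
{
    size_t n = NPAGES >> round;
    for (size_t pfn = 0; pfn < n; pfn++)
        m->dirty[pfn] = true;
}

/* Runs the migration; returns the total number of rounds used. */
static int precopy_migrate(struct migration *m, size_t threshold)
{
    int rounds = 0;

    /* Round 1 sends the whole memory, so start with every page dirty. */
    for (size_t pfn = 0; pfn < NPAGES; pfn++)
        m->dirty[pfn] = true;

    /* Live rounds: the guest keeps running while pages are sent. */
    while (rounds < MAX_ROUNDS - 1) {
        send_dirty_pages(m);
        rounds++;
        simulate_guest_writes(m, rounds);
        if (count_dirty(m) <= threshold)
            break;              /* few enough pages left: stop the guest */
    }

    /* Last round: guest stopped, send what is left (CPU state not modeled). */
    send_dirty_pages(m);
    assert(count_dirty(m) == 0);
    return rounds + 1;
}

/* Example use: struct migration m = {0}; precopy_migrate(&m, 1); */
```

With these made-up numbers and a threshold of one page, the sketch runs four live rounds plus the final stop-and-copy round and sends 16 + 8 + 4 + 2 + 1 = 31 pages in total.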

Before the last round, the guest will be stopped, and the remaining memory will be sent along with the CPU state.

We need the COW mechanism to determine the changes between rounds. As for the number of rounds, there can be at most 10 (this value was set without running any benchmarks, just to limit potential overhead). The final number of rounds will be decided later, based on test results. The number of new objects on the source will also not be very high, since the number of rounds is limited. After the migration completes, the guest on the source system will be destroyed, and the newly created objects will no longer be needed (they will be discarded).

----

By inspecting and debugging the bhyve code, we could see that the object that describes the guest memory is pointed to by two vm_entry structures from two different vmspace structures (we are currently using a 512MB bhyve guest; with more physical memory assigned to the virtual machine, there could be more objects):

- the first vmspace is the one that describes the host's virtual memory
- the second is a separate vmspace structure created by the hypervisor when creating the virtual machine.

I have several questions about FreeBSD's memory management and bhyve's internals, because I have not been able to work out the answers by myself.

The first one: does anyone know whether the object that describes the guest memory is referenced only by these two vm_entry structures, or by other entries as well? We could not determine whether this is the case.

As far as I can tell, COW can be set only on vm_entry structures. Is there a way to set as copy-on-write just certain pages, or maybe just the object and not its vm_entry structure? I want to know if there is a finer-grained mechanism to set only parts of the memory as COW.

We need a finer granularity when setting pages as copy-on-write because we encountered some issues:

- virtio mechanisms work through a memory region shared between host and guest, and while the guest memory state is being transferred, the pages involved in the virtio communication do not need to be set COW.
- if we mark the vm_entry that contains the guest memory as COW from the host vmspace, the virtio devices eventually crash the guest (some assertions about IOV and operation types fail). We know that we should not set that memory as copy-on-write, because it is not the way the guest sees its physical memory, but
- if we set the vm_entry that contains the object with the guest memory as COW from the dedicated vmspace created for the guest, the virtio devices no longer fail assertions, but after some time the guest filesystem appears to be corrupted. Usually, we can start the guest normally after entering single-user mode and running fsck. Sometimes, we need to reinstall the virtual machine.

Also, after setting the vm_entry from the guest's dedicated vmspace as COW, the two vm_entry structures have different views of the guest memory:

- the vm_entry from the guest's dedicated vmspace points to a new object (after a first write access, of course)
- the vm_entry from the host vmspace that contains the guest memory points to the old object, which now has the newly created object as its backing object.

Another question is whether it is OK to "change" the object in the host's vm_entry to point to the backing object. In that case, the two entries would again point to the same object. This might require remapping/redoing the references held by the old object so that they point to the new object.

[1] http://freebsd.1045724.x6.nabble.com/Inspect-pages-created-after-a-vm-object-is-marked-as-copy-on-write-td6266552.html

Thank you,
Elena