Subject: Re: bhyve uses all available memory during IO-intensive operations
To: "K. Macy"
Cc: "freebsd-virtualization@freebsd.org"
From: Allan Jude
Message-ID: <571ab0b4-ec6c-2bc4-438b-d3dce35cd775@freebsd.org>
Date: Sat, 2 Dec 2017 23:53:02 -0500

On 2017-12-02 20:21, K. Macy wrote:
> On Sat, Dec 2, 2017 at 5:16 PM, Allan Jude wrote:
>> On 12/02/2017 00:23, Dustin Wenz wrote:
>>> I have noticed significant storage amplification for my zvols; that
>>> could very well be the reason. I would like to know more about why
>>> it happens.
>>>
>>> Since the volblocksize is 512 bytes, I certainly expect extra CPU
>>> overhead (and maybe an extra 1k or so worth of checksums for each
>>> 128k block in the VM), but how do you get a 10x expansion in stored
>>> data?
>>>
>>> What is the recommended zvol block size for a FreeBSD/ZFS guest?
>>> Perhaps 4k, to match the most common mass storage sector size?
>>>
>>> - .Dustin
>>>
>>>> On Dec 1, 2017, at 9:18 PM, K. Macy wrote:
>>>>
>>>> One thing to watch out for with chyves if your virtual disk is more
>>>> than 20G is the fact that it uses 512 byte blocks for the zvols it
>>>> creates. I ended up using 1.4TB while only half filling a 250G
>>>> zvol. Chyves is quick and easy, but it's not exactly production
>>>> ready.
>>>>
>>>> -M
>>>>
>>>>> On Thu, Nov 30, 2017 at 3:15 PM, Dustin Wenz wrote:
>>>>> I'm using chyves on FreeBSD 11.1-RELEASE to manage a few VMs (the
>>>>> guest OS is also FreeBSD 11.1). Their sole purpose is to house
>>>>> some medium-sized Postgres databases (100-200GB). The host system
>>>>> has 64GB of real memory and 112GB of swap.
>>>>> I have configured each guest to use only 16GB of memory, yet
>>>>> while doing my initial database imports in the VMs, bhyve will
>>>>> quickly grow to use all available system memory and then be killed
>>>>> by the kernel:
>>>>>
>>>>> kernel: swap_pager: I/O error - pageout failed; blkno 1735, size 4096, error 12
>>>>> kernel: swap_pager: I/O error - pageout failed; blkno 1610, size 4096, error 12
>>>>> kernel: swap_pager: I/O error - pageout failed; blkno 1763, size 4096, error 12
>>>>> kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
>>>>>
>>>>> The OOM condition seems related to doing moderate IO within the
>>>>> VM, though nothing within the VM itself shows high memory usage.
>>>>> This is the chyves config for one of them:
>>>>>
>>>>> bargs                      -A -H -P -S
>>>>> bhyve_disk_type            virtio-blk
>>>>> bhyve_net_type             virtio-net
>>>>> bhyveload_flags
>>>>> chyves_guest_version       0300
>>>>> cpu                        4
>>>>> creation                   Created on Mon Oct 23 16:17:04 CDT 2017 by chyves v0.2.0 2016/09/11 using __create()
>>>>> loader                     bhyveload
>>>>> net_ifaces                 tap51
>>>>> os                         default
>>>>> ram                        16G
>>>>> rcboot                     0
>>>>> revert_to_snapshot
>>>>> revert_to_snapshot_method  off
>>>>> serial                     nmdm51
>>>>> template                   no
>>>>> uuid                       8495a130-b837-11e7-b092-0025909a8b56
>>>>>
>>>>> I've also tried using different bhyve_disk_types, with no
>>>>> improvement. How is it that bhyve can use far more memory than I'm
>>>>> specifying?
>>>>>
>>>>> - .Dustin
>>
>> Storage amplification usually has to do with ZFS RAID-Z padding. If
>> your zvol block size does not make sense with your disk sector size
>> and RAID-Z level, you can get pretty silly numbers.
>
> That's not what I'm talking about here. If your volblocksize is too
> small, you end up using (vastly) more space for indirect blocks than
> for data blocks.
>
> -M

In addition, if you have, say, 4k sectors and RAID-Z2, every allocation
of 4k or less requires 12k of disk space: one data sector plus two
parity sectors. Allocations of 8k are worse in this case, since all
allocations must be made in units of 1+p sectors, where p is the parity
level. Allocating 8kb of data (2x 4k data sectors) plus 2x 4k parity
sectors gives 4 sectors, which rounds up to the next multiple of 3,
i.e. 6 sectors. That means 8kb of data took 8kb for data + 8kb for
parity + 8kb for padding = 24kb of space.

If you were using RAID-Z1, the same 8kb would have taken 16kb (8kb
data, 4kb parity, 4kb padding, since allocations are rounded up to
multiples of 2 sectors). Or, if you used a 16kb volblocksize on the
zvol: 4 data sectors + 2 parity sectors = 6 sectors, which is already a
multiple of 3, so no padding is required. (A small worked calculation
of this rule follows below.)

-- 
Allan Jude
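
To make the padding arithmetic above easy to check, here is a minimal
Python sketch of the rule Allan describes (data sectors plus parity
sectors, with the allocation rounded up to a multiple of 1+p sectors).
The function name, the 4k sector size and the six-disk vdev width are
assumptions for illustration, not values taken from the thread:

    import math

    def raidz_alloc_bytes(logical_bytes, sector=4096, parity=2, ndisks=6):
        """Rough space taken by one ZFS block on RAID-Z, per the rule above.

        Assumptions: each stripe row holds up to (ndisks - parity) data
        sectors and carries `parity` parity sectors; the final allocation
        is rounded up to a multiple of (1 + parity) sectors.
        """
        data = math.ceil(logical_bytes / sector)
        rows = math.ceil(data / (ndisks - parity))
        total = data + rows * parity
        total += -total % (1 + parity)   # pad to a multiple of 1+p
        return total * sector

    # The cases worked through above, with 4k sectors:
    print(raidz_alloc_bytes(4 * 1024))            # RAID-Z2, 4k block  -> 12288 (12k)
    print(raidz_alloc_bytes(8 * 1024))            # RAID-Z2, 8k block  -> 24576 (24k)
    print(raidz_alloc_bytes(8 * 1024, parity=1))  # RAID-Z1, 8k block  -> 16384 (16k)
    print(raidz_alloc_bytes(16 * 1024))           # RAID-Z2, 16k block -> 24576 (24k, no padding)

Running it prints 12288, 24576, 16384 and 24576, matching the figures
above; note that the 16k block allocates the same 24k as the 8k block
but holds twice the data, which is why a larger volblocksize helps on
RAID-Z2.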
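
The same simplified rule also suggests why the 512-byte volblocksize
that chyves uses can be so painful. This sketch sweeps volblocksize
values on an assumed 4k-sector, six-disk RAID-Z2 vdev (again an
assumption about the pool layout); it only models parity and padding,
not the indirect-block overhead K. Macy points out, which comes on top:

    import math

    SECTOR = 4096  # assumed 4k-sector (ashift=12) pool

    def raidz_alloc_bytes(logical_bytes, parity=2, ndisks=6):
        # Same simplified RAID-Z rule as in the previous sketch.
        data = math.ceil(logical_bytes / SECTOR)
        rows = math.ceil(data / (ndisks - parity))
        total = data + rows * parity
        total += -total % (1 + parity)
        return total * SECTOR

    # Allocation multiplier (space allocated / logical data) by volblocksize:
    for vbs in (512, 4096, 8192, 16384, 65536, 131072):
        alloc = raidz_alloc_bytes(vbs)
        print(f"volblocksize {vbs:>6}: {alloc:>7} bytes allocated ({alloc / vbs:.1f}x)")

Under these assumptions every 512-byte block still allocates three 4k
sectors, a 24x blow-up before indirect blocks are counted, while 16k
and larger blocks settle at the expected 1.5x for RAID-Z2. That is the
kind of mechanism that can produce the 10x expansion asked about above;
the exact factor depends on the pool's ashift, width and parity level.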