Date:      Thu, 27 Aug 2015 15:53:42 -0400
From:      "Chad J. Milios" <milios@ccsys.com>
To:        Paul Vixie <paul@redbarn.org>
Cc:        Matt Churchyard <matt.churchyard@userve.net>, Vick Khera <vivek@khera.org>, allanjude@freebsd.org, "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>, freebsd-fs@freebsd.org
Subject:   Re: Options for zfs inside a VM backed by zfs on the host
Message-ID:  <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com>
In-Reply-To: <55DF46F5.4070406@redbarn.org>
References:  <CALd%2BdcfJ%2BT-f5gk_pim39BSF7nhBqHC3ab7dXgW8fH43VvvhvA@mail.gmail.com> <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org>

> On Aug 27, 2015, at 10:46 AM, Allan Jude <allanjude@freebsd.org> wrote:
>
> On 2015-08-27 02:10, Marcus Reid wrote:
>> On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
>>> I'm running FreeBSD inside a VM that is providing the virtual disks
>>> backed by several ZFS zvols on the host. I want to run ZFS on the VM
>>> itself too for simplified management and backup purposes.
>>>
>>> The question I have is on the VM guest, do I really need to run a
>>> raid-z or mirror or can I just use a single virtual disk (or even a
>>> stripe)? Given that the underlying storage for the virtual disk is a
>>> zvol on a raid-z there should not really be too much worry for data
>>> corruption, I would think. It would be equivalent to using a hardware
>>> raid for each component of my zfs pool.
>>>
>>> Opinions? Preferably well-reasoned ones. :)
>>
>> This is a frustrating situation, because none of the options that I can
>> think of look particularly appealing.  Single-vdev pools would be the
>> best option; your redundancy is already taken care of by the host's
>> pool.  The overhead of checksumming, etc. twice is probably not super
>> bad.  However, having the ARC eating up lots of memory twice seems
>> pretty bletcherous.  You can probably do some tuning to reduce that, but
>> I never liked tuning the ARC much.
>>
>> All the nice features ZFS brings to the table are hard to give up once
>> you get used to having them around, so I understand your quandary.
>>
>> Marcus
>
> You can just:
>
> zfs set primarycache=metadata poolname
>
> And it will only cache metadata in the ARC inside the VM, and avoid
> caching data blocks, which will be cached outside the VM. You could even
> turn the primarycache off entirely.
>
> --
> Allan Jude

> On Aug 27, 2015, at 1:20 PM, Paul Vixie <paul@redbarn.org> wrote:
>
> let me ask a related question: i'm using FFS in the guest, zvol on the
> host. should i be telling my guest kernel to not bother with an FFS
> buffer cache at all, or to use a smaller one, or what?


Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there
are really no simple answers. You must consider your use case and the host
and vm hardware/software configuration, perform meaningful benchmarks and,
if you care about data integrity, run thorough tests of the likely failure
modes (all far more easily said than done). I'm curious to hear more about
your use case(s) and setups so as to offer better insight on what
alternatives may make more or less sense for you. Performance needs? Are
you striving for lower individual latency or higher combined throughput?
How critical are integrity and availability? What is your preferred backup
routine? Do you handle that in the guest or the host? Want features like
dedup and/or L2ARC in the mix? (Then everything bears reconsideration;
just about triple your research and testing efforts.)

Sorry, I'm really not trying to scare anyone away from ZFS. It is awesome
and capable of providing amazing solutions with very reliable and sensible
behavior if handled with due respect, fear, monitoring and upkeep. :)

There are cases to be made for caching [meta-]data in the child or in the
parent, checksumming in the child/parent/both, and compressing in the
child or parent. I believe `gstat` along with your custom-made benchmark
or test load will greatly help guide you.
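For instance, on the host you might watch just the zvols backing the guest
while the guest runs your benchmark (the pool/zvol path below is only an
example; see gstat(8) for the flags):

    # host: show only busy providers, filter to the guest's zvols,
    # refresh every second
    gstat -a -I 1s -f 'zvol/tank/vm'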

ZFS on ZFS seems to be a hardly studied, seldom reported, never documented,
tedious exercise. Prepare for accelerated greying and balding of your hair.
The parent's volblocksize, the child's ashift, their alignment, and
interactions involving raidz stripes (if used) can lead to problems ranging
from slightly decreased performance and storage efficiency to pathological
write amplification within ZFS, with performance and responsiveness
crashing and sinking to the bottom of the ocean. Some datasets can become
veritable black holes to vfs system calls. You may see ZFS reporting
elusive errors, deadlocking, or panicking in the child or parent
altogether. With diligence, though, stable and performant setups can be
discovered for many production situations.
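To make those knobs concrete (the pool, zvol, and device names here are
purely illustrative, and the sysctl is the one in recent FreeBSD):

    # host: volblocksize can only be chosen at creation time, so pick it
    # deliberately and verify it later
    zfs create -V 32G -o volblocksize=8k tank/vm/disk0
    zfs get volblocksize tank/vm/disk0

    # guest: steer the child pool's ashift before creating it
    # (2^12 = 4k sectors)
    sysctl vfs.zfs.min_auto_ashift=12
    zpool create guestpool vtbd1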

For example, for a zpool (whether used by a VM or not; locally, thru
iscsi, ggate[cd], or whatever) atop a zvol which sits on a parent zpool
with no redundancy, I would set primarycache=metadata checksum=off
compression=off for the zvol(s) on the host(s), and for the most part just
use the same zpool settings and sysctl tunings in the VM (or child zpool,
whatever role it may conduct) that I would otherwise use on bare cpu and
bare drives (defaults + compression=lz4 atime=off). However, that simple
case is likely not yours.

With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol, I use
checksums on the parent zvol, and compression too if the child doesn't
support it (as ntfs can), but I still cache only metadata on the host and
let the child vm/fs cache the real data.
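So for a zvol backing, say, a UFS guest disk, that looks something like
this (the zvol name is made up):

    # host: keep integrity and compression here, since the child
    # filesystem provides neither, but leave data caching to the guest
    zfs set primarycache=metadata tank/vm/ufs-disk0
    zfs set checksum=on tank/vm/ufs-disk0
    zfs set compression=lz4 tank/vm/ufs-disk0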

My use case involves charging customers for their memory use, so
admittedly that is one motivating factor, LOL. Plus, I certainly don't
want one rude VM marching through the host ARC, unfairly evicting and
starving the other polite neighbors.

A VM's swap space becomes another consideration. I treat it like any other
'dumb' filesystem, with compression and checksumming done by the parent,
but recent versions of many operating systems may be paging out only
already-compressed data, so investigate your guest OS. I've found lz4's
claims of an almost-no-penalty early abort to be vastly overstated when
dealing with zvols, small block sizes and high throughput, so if you can
be certain you'll be dealing with only compressed data then turn
compression off. For the virtual memory pagers in most current-day OSes,
though, set compression on the swap's backing zvol to lz4.
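For a dedicated swap zvol that comes out to roughly this (hypothetical
name again):

    # host: parent handles integrity and, usually, compression for the
    # guest's swap; still no data caching in the host ARC
    zfs set checksum=on tank/vm/swap0
    zfs set compression=lz4 tank/vm/swap0
    zfs set primarycache=metadata tank/vm/swap0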

Another factor is the ZIL. One VM can hoard your synchronous write
performance. Solutions are beyond the scope of this already-too-long email
:) but I'd be happy to elaborate if queried.
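(For the curious, the usual levers are a dedicated log vdev on the host
pool and the per-dataset sync property; the device and zvol names below
are hypothetical, and sync=disabled gives up crash-consistency guarantees
for that zvol:)

    # host: give synchronous writes their own fast device
    zpool add tank log gpt/slog0

    # or tune per zvol how sync writes are honored
    zfs set sync=standard tank/vm/disk0    # the default
    zfs set sync=disabled tank/vm/swap0    # swap need not survive a crash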

And then there's always netbooting guests from NFS mounts served by the
host and giving the guest no virtual disks; don't forget to consider that
option.
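In ZFS terms that can be as simple as exporting a per-guest root dataset
over NFS (the dataset name and export options are just an illustration;
the NFS server still has to be enabled in rc.conf):

    # host: a root filesystem for a diskless guest, shared via NFS
    zfs create tank/guests/web1-root
    zfs set sharenfs="-maproot=root -network=10.0.0.0/24" tank/guests/web1-root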

Hope this provokes some fruitful ideas for you. Glad to philosophize about
ZFS setups with y'all :)

-chad


