Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 Aug 2015 19:47:22 -0400
From:      Tenzin Lhakhang <tenzin.lhakhang@gmail.com>
To:        "Chad J. Milios" <milios@ccsys.com>
Cc:        Paul Vixie <paul@redbarn.org>, freebsd-fs@freebsd.org, Vick Khera <vivek@khera.org>,  Matt Churchyard <matt.churchyard@userve.net>,  "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>, allanjude@freebsd.org
Subject:   Re: Options for zfs inside a VM backed by zfs on the host
Message-ID:  <CALcn87yArcBs0ybrZBBxaxDU0y6s=wM8di0RmaSCJCgOjUHq9w@mail.gmail.com>
In-Reply-To: <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com>
References:  <CALd%2BdcfJ%2BT-f5gk_pim39BSF7nhBqHC3ab7dXgW8fH43VvvhvA@mail.gmail.com> <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org> <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com>

next in thread | previous in thread | raw e-mail | index | archive | help
That was a really awesome read!  The idea of turning metadata on at the
backend zpool and then data on the VM was interesting, I will give that a
try. Please can you elaborate more on the ZILs and synchronous writes by
VMs.. that seems like a great topic.
-
I am right now exploring the question: are SSD ZILs necessary in an all SSD
pool? and then the question of NVMe SSD ZILs onto of an all SSD pool.  My
guess at the moment is that SSD ZILs are not necessary at all in an SSD
pool during intensive IO.  I've been told that ZILs are always there to
help you, but when your pool aggregate IOPs is greater than the a ZIL, it
doesn't seem to make sense.. Or is it the latency of writing to a single
disk vs striping across your "fast" vdevs?

Thanks,
Tenzin

On Thu, Aug 27, 2015 at 3:53 PM, Chad J. Milios <milios@ccsys.com> wrote:

> > On Aug 27, 2015, at 10:46 AM, Allan Jude <allanjude@freebsd.org> wrote:
> >
> > On 2015-08-27 02:10, Marcus Reid wrote:
> >> On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
> >>> I'm running FreeBSD inside a VM that is providing the virtual disks
> backed
> >>> by several ZFS zvols on the host. I want to run ZFS on the VM itself
> too
> >>> for simplified management and backup purposes.
> >>>
> >>> The question I have is on the VM guest, do I really need to run a
> raid-z or
> >>> mirror or can I just use a single virtual disk (or even a stripe)?
> Given
> >>> that the underlying storage for the virtual disk is a zvol on a raid-=
z
> >>> there should not really be too much worry for data corruption, I woul=
d
> >>> think. It would be equivalent to using a hardware raid for each
> component
> >>> of my zfs pool.
> >>>
> >>> Opinions? Preferably well-reasoned ones. :)
> >>
> >> This is a frustrating situation, because none of the options that I ca=
n
> >> think of look particularly appealing.  Single-vdev pools would be the
> >> best option, your redundancy is already taken care of by the host's
> >> pool.  The overhead of checksumming, etc. twice is probably not super
> >> bad.  However, having the ARC eating up lots of memory twice seems
> >> pretty bletcherous.  You can probably do some tuning to reduce that, b=
ut
> >> I never liked tuning the ARC much.
> >>
> >> All the nice features ZFS brings to the table is hard to give up once
> >> you get used to having them around, so I understand your quandry.
> >>
> >> Marcus
> >
> > You can just:
> >
> > zfs set primarycache=3Dmetadata poolname
> >
> > And it will only cache metadata in the ARC inside the VM, and avoid
> > caching data blocks, which will be cached outside the VM. You could eve=
n
> > turn the primarycache off entirely.
> >
> > --
> > Allan Jude
>
> > On Aug 27, 2015, at 1:20 PM, Paul Vixie <paul@redbarn.org> wrote:
> >
> > let me ask a related question: i'm using FFS in the guest, zvol on the
> > host. should i be telling my guest kernel to not bother with an FFS
> > buffer cache at all, or to use a smaller one, or what?
>
>
> Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there
> are really no simple answers. You must consider your use case, the host a=
nd
> vm hardware/software configuration, perform meaningful benchmarks and, if
> you care about data integrity, thorough tests of the likely failure modes
> (all far more easily said than done). I=E2=80=99m curious to hear more ab=
out your
> use case(s) and setups so as to offer better insight on what alternatives
> may make more/less sense for you. Performance needs? Are you striving for
> lower individual latency or higher combined throughput? How critical are
> integrity and availability? How do you prefer your backup routine? Do you
> handle that in guest or host? Want features like dedup and/or L2ARC up in
> the mix? (Then everything bears reconsideration, just about triple your
> research and testing efforts.)
>
> Sorry, I=E2=80=99m really not trying to scare anyone away from ZFS. It is=
 awesome
> and capable of providing amazing solutions with very reliable and sensibl=
e
> behavior if handled with due respect, fear, monitoring and upkeep. :)
>
> There are cases to be made for caching [meta-]data in the child, in the
> parent, checksumming in the child/parent/both, compressing in the
> child/parent. I believe `gstat` along with your custom-made benchmark or
> test load will greatly help guide you.
>
> ZFS on ZFS seems to be a hardly studied, seldom reported, never
> documented, tedious exercise. Prepare for accelerated greying and balding
> of your hair. The parent's volblocksize, child's ashift, alignment,
> interactions involving raidz stripes (if used) can lead to problems from
> slightly decreased performance and storage efficiency to pathological wri=
te
> amplification within ZFS, performance and responsiveness crashing and
> sinking to the bottom of the ocean. Some datasets can become veritable
> black holes to vfs system calls. You may see ZFS reporting elusive errors=
,
> deadlocking or panicing in the child or parent altogether. With diligence
> though, stable and performant setups can be discovered for many productio=
n
> situations.
>
> For example, for a zpool (whether used by a VM or not, locally, thru
> iscsi, ggate[cd], or whatever) atop zvol which sits on parent zpool with =
no
> redundancy, I would set primarycache=3Dmetadata checksum=3Doff compressio=
n=3Doff
> for the zvol(s) on the host(s) and for the most part just use the same
> zpool settings and sysctl tunings in the VM (or child zpool, whatever rol=
e
> it may conduct) that i would otherwise use on bare cpu and bare drives
> (defaults + compression=3Dlz4 atime=3Doff). However, that simple case is =
likely
> not yours.
>
> With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol i use
> checksums on the parent zvol, and compression too if the child doesn=E2=
=80=99t
> support it (as ntfs can), but still caching only metadata on the host and
> letting the child vm/fs cache real data.
>
> My use case involves charging customers for their memory use so admittedl=
y
> that is one motivating factor, LOL. Plus, i certainly don=E2=80=99t want =
one rude
> VM marching through host ARC unfairly evacuating and starving the other
> polite neighbors.
>
> VM=E2=80=99s swap space becomes another consideration and I treat it like=
 any
> other =E2=80=98dumb=E2=80=99 filesystem with compression and checksumming=
 done by the
> parent but recent versions of many operating systems may be paging out on=
ly
> already compressed data, so investigate your guest OS. I=E2=80=99ve found=
 lz4=E2=80=99s
> claims of an almost-no-penalty early-abort to be vastly overstated when
> dealing with zvols, small block sizes and high throughput so if you can b=
e
> certain you=E2=80=99ll be dealing with only compressed data then turn it =
off. For
> the virtual memory pagers in most current-day OS=E2=80=99s though set com=
pression
> on the swap=E2=80=99s backing zvol to lz4.
>
> Another factor is the ZIL. One VM can hoard your synchronous write
> performance. Solutions are beyond the scope of this already-too-long emai=
l
> :) but I=E2=80=99d be happy to elaborate if queried.
>
> And then there=E2=80=99s always netbooting guests from NFS mounts served =
by the
> host and giving the guest no virtual disks, don=E2=80=99t forget to consi=
der that
> option.
>
> Hope this provokes some fruitful ideas for you. Glad to philosophize abou=
t
> ZFS setups with ya=E2=80=99ll :)
>
> -chad
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CALcn87yArcBs0ybrZBBxaxDU0y6s=wM8di0RmaSCJCgOjUHq9w>