Date:      Mon, 30 Jan 2012 12:30:23 -0800
From:      Dennis Glatting <freebsd@pki2.com>
To:        Peter Maloney <peter.maloney@brockmann-consult.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS sync / ZIL clarification
Message-ID:  <1327955423.22960.0.camel@btw.pki2.com>
In-Reply-To: <4F264B27.6060502@brockmann-consult.de>
References:  <op.v8vqsgqq34t2sn@me-pc> <4F264B27.6060502@brockmann-consult.de>

On Mon, 2012-01-30 at 08:47 +0100, Peter Maloney wrote:
> On 01/30/2012 05:30 AM, Mark Felder wrote:
> > I believe I was told something misleading a few weeks ago and I'd like
> > to have this officially clarified.
> >
> > NFS on ZFS is horrible unless you have sync = disabled. 
> With ESXi = true
> with others = depends on your definition of horrible
> 
> > I was told this was effectively disabling the ZIL, which is of course
> > naughty. Now I stumbled upon this tonight:
> >
> True only for the specific dataset you specify,
> e.g.:
> zfs set sync=disabled tank/esxi
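> 
> To check the current value and put it back later (a minimal sketch;
> tank/esxi is just the example dataset from above):
> 
>   zfs get sync tank/esxi            # show the value and its source
>   zfs set sync=standard tank/esxi   # restore the default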
> 
> >> Just for the archives... sync=disabled won't disable the ZIL,
> >> it'll disable waiting for a disk-flush on fsync etc.
> Same thing... "waiting for a disk-flush" is the only time the ZIL is
> used, from what I understand.
> 
> >> With a battery-backed controller cache, those flushes should go to
> >> cache, and be pretty much free. You end up tossing away something
> >> for nothing.
> False, I guess. It would be nice, but how do you battery-back your
> RAM, which ZFS uses as a write cache? (If you know something I don't
> know, please share.)
> >
> > Is this accurate?
> 
> sync=disabled caused data corruption for me. So you need battery-backed
> cache... unfortunately, the cache we are talking about is in RAM, not
> your IO controller. So put a UPS on there, and you are safe except when
> you get a kernel panic (which is what happened to cause my corruption).
> But if you get something like the Gigabyte iRAM or the Acard ANS-9010
> <http://www.acard.com.tw/english/fb01-product.jsp?prod_no=ANS-9010&type1_title=%20Solid%20State%20Drive&idno_no=270>,
> set it as your ZIL, and leave sync=standard, you should be safer. (I
> don't know if the iRAM works in FreeBSD, but someone
> <http://christopher-technicalmusings.blogspot.com/2011/06/speeding-up-freebsds-nfs-on-zfs-for-esx.html>
> told me he uses the ANS-9010.)
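> 
> For reference, pointing the ZIL at a dedicated device is just (a
> sketch; /dev/ada2 is a made-up device name, use whatever the
> iRAM/ANS-9010 shows up as):
> 
>   zpool add tank log /dev/ada2   # add a separate intent log device
>   zpool status tank              # it should appear under "logs"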
> 
> And NFS with ZFS is not horrible, except with ESXi's built-in NFS
> client that it uses for datastores. (The same someone who said he uses
> the ANS-9010 also provides a 'patch' for the FreeBSD NFS server that
> disables ESXi's stupid behavior without disabling sync entirely, but it
> possibly also disables it for other clients that use it responsibly [a
> database, perhaps].)
> 
> Here
> <http://www.citi.umich.edu/projects/nfs-perf/results/cel/write-throughput.html>
> is a fantastic study about NFS; I don't know if this study resulted in
> patches now in use, or how old it is [the newest reference is 2002, so
> at most 10 years old]. In my experience, the write caching in use today
> still sucks. If I run async with sync=disabled, I still see a huge
> improvement (20% on large files, up to 100% for smaller files <200MB)
> using an ESXi virtual disk (with ext4 doing write caching) compared to
> NFS directly.
> 
> 
> Here begins the rant about ESXi, which may be off topic:
> 

ESXi 3.5, 4.0, 4.1, 5.0, or all of the above?


> ESXi goes 7 MB/s with an SSD ZIL at 100% load, and 80 MB/s with a
> ramdisk ZIL at 100% load (pathetic!). Something I can't reproduce (I
> thought it was just a normal Linux client with "-o sync" over 10 Gbps
> ethernet) got over 70 MB/s with the ZIL at 70-90% load. Other clients
> set to "-o sync,noatime,..." or "-o noatime,..." put the ZIL at only
> 0-5% load at random, but go faster than 100 MB/s. I didn't test
> "async", and without "sync" they seem to go the same speed. Setting
> sync=disabled always goes around 100 MB/s, and drops the load on the
> ZIL to 0%.
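> 
> For clarity, the Linux client mounts I mean look roughly like this
> (server:/tank/esxi and /mnt/store are placeholders):
> 
>   mount -t nfs -o sync,noatime server:/tank/esxi /mnt/store
>   mount -t nfs -o noatime server:/tank/esxi /mnt/store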
> 
> The thing I can't reproduce might have been only possible on a pool that
> I created with FreeBSD 8.2-RELEASE and then upgraded, which I no longer
> have. Or maybe it was with "sync" without "noatime".
> 
> I am going to test with 9000 MTU, and if it is not much faster, I am
> giving up on NFS. My original plan was to use ESXi with a ZFS datastore
> with a replicated backup. That works terribly using the ESXi NFS client.
> Netbooting the OSes to bypass the ESXi client works much better, but
> still not good enough for many servers. NFS is poorly implemented, with
> terrible write caching on the client side. Now my plan is to use FreeBSD
> with VirtualBox and ZFS all in one system, and send replication
> snapshots from there. I wanted to use ESXi, but I guess I can't.
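> 
> The MTU change itself is one line per interface; e.g. on FreeBSD, with
> a hypothetical ix0 10 GbE interface:
> 
>   ifconfig ix0 mtu 9000
> 
> plus the same jumbo-frame setting on the client and the switch.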
> 
> And the worst thing about ESXi is that if you have 1 client going
> 7 MB/s, the second client has to share that 7 MB/s, and non-ESXi
> clients will still go horribly slow. If you have 10 non-ESXi clients,
> each one is still limited to around 100 MB/s individually (again, I
> only tested this with 1500 MTU so far), but together they can write
> much more.
> 
> Just now I tested 2 clients writing 100+100 MB/s (reported by GNU dd),
> and 3 clients writing 50+60+60 MB/s (reported by GNU dd).
> Output from "zpool iostat 5" (columns: alloc, free, read ops/s, write
> ops/s, read bandwidth, write bandwidth):
> two clients:
> tank        38.7T  4.76T      0  1.78K  25.5K   206M (matches 100+100)
> three clients:
> tank        38.7T  4.76T      1  2.44K   205K   245M (does not match 60+60+50)
> 
> (one client is a Linux netboot, and the others are using the Linux NFS
> client)
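> 
> Each client's write test was plain GNU dd, roughly (the path is a
> placeholder):
> 
>   dd if=/dev/zero of=/mnt/store/test.bin bs=1M count=10000
> 
> with the MB/s figure taken from dd's summary line.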
> 
> But I am not an 'official', so this cannot be considered 'officially
> clarified' ;)
> 
> 