Date: Mon, 30 Jan 2012 12:30:23 -0800
From: Dennis Glatting <freebsd@pki2.com>
To: Peter Maloney <peter.maloney@brockmann-consult.de>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS sync / ZIL clarification
Message-ID: <1327955423.22960.0.camel@btw.pki2.com>
In-Reply-To: <4F264B27.6060502@brockmann-consult.de>
References: <op.v8vqsgqq34t2sn@me-pc> <4F264B27.6060502@brockmann-consult.de>
On Mon, 2012-01-30 at 08:47 +0100, Peter Maloney wrote:
> On 01/30/2012 05:30 AM, Mark Felder wrote:
> > I believe I was told something misleading a few weeks ago and I'd
> > like to have this officially clarified.
> >
> > NFS on ZFS is horrible unless you have sync = disabled.
>
> With ESXi = true.
> With others = depends on your definition of horrible.
>
> > I was told this was effectively disabling the ZIL, which is of
> > course naughty. Now I stumbled upon this tonight:
>
> True only for the specific dataset you specified, e.g.:
>
> zfs set sync=disabled tank/esxi
>
> >> Just for the archives... sync=disabled won't disable the ZIL,
> >> it'll disable waiting for a disk flush on fsync etc.
>
> Same thing... "waiting for a disk flush" is the only time the ZIL is
> used, from what I understand.
>
> >> With a battery-backed controller cache, those flushes should go
> >> to cache, and be pretty much free. You end up tossing away
> >> something for nothing.
>
> False, I guess. It would be nice, but how do you battery-back your
> RAM, which ZFS uses as a write cache? (If you know something I don't
> know, please share.)
>
> > Is this accurate?
>
> sync=disabled caused data corruption for me. So you need to have a
> battery-backed cache... unfortunately, the cache we are talking
> about is in RAM, not in your I/O controller. So put a UPS on there,
> and you are safe except when you get a kernel panic (which is what
> happened to cause my corruption). But if you get something like the
> Gigabyte iRAM or the Acard ANS-9010
> <http://www.acard.com.tw/english/fb01-product.jsp?prod_no=ANS-9010&type1_title=%20Solid%20State%20Drive&idno_no=270>,
> set it as your ZIL, and leave sync=standard, you should be safer. (I
> don't know if the iRAM works in FreeBSD, but someone
> <http://christopher-technicalmusings.blogspot.com/2011/06/speeding-up-freebsds-nfs-on-zfs-for-esx.html>
> told me he uses the ANS-9010.)
>
> And NFS with ZFS is not horrible, except with the built-in NFS
> client that ESXi uses for datastores. (The same someone who said he
> uses the ANS-9010 also provides a 'patch' for the FreeBSD NFS server
> that disables ESXi's stupid behavior without disabling sync
> entirely, but it also possibly disables it for others that use it
> responsibly [a database, perhaps].)
>
> Here
> <http://www.citi.umich.edu/projects/nfs-perf/results/cel/write-throughput.html>
> is a fantastic study about NFS; I don't know whether it resulted in
> patches now in use, or how old it is [the newest reference is 2002,
> so at most 10 years old]. In my experience, the write caching in use
> today still sucks. If I run async with sync=disabled, I still see a
> huge improvement (20% on large files, up to 100% for smaller files
> under 200 MB) using an ESXi virtual disk (with ext4 doing write
> caching) compared to NFS directly.
>
> Here begins the rant about ESXi, which may be off topic:
> ESXi 3.5, 4.0, 4.1, 5.0, or all of the above?
>
> ESXi goes 7 MB/s with an SSD ZIL at 100% load, and 80 MB/s with a
> ramdisk ZIL at 100% load (pathetic!);
> something I can't reproduce (I thought it was just a normal Linux
> client with "-o sync" over 10 Gbps Ethernet) got over 70 MB/s with
> the ZIL at 70-90% load;
> other clients set to "-o sync,noatime,..." or "-o noatime,..." keep
> the ZIL at only a sporadic 0-5% load, yet go faster than 100 MB/s. I
> didn't test "async", and without "sync" they seem to go the same
> speed.
> Setting sync=disabled always goes around 100 MB/s, and drops the
> load on the ZIL to 0%.
>
> The thing I can't reproduce might have been possible only on a pool
> that I created with FreeBSD 8.2-RELEASE and then upgraded, which I
> no longer have. Or maybe it was with "sync" but without "noatime".
>
> I am going to test with a 9000 MTU, and if it is not much faster, I
> am giving up on NFS. My original plan was to use ESXi with a ZFS
> datastore and a replicated backup. That works terribly using the
> ESXi NFS client. Netbooting the OSes to bypass the ESXi client works
> much better, but still not well enough for many servers. NFS is
> poorly implemented, with terrible write caching on the client side.
> Now my plan is to use FreeBSD with VirtualBox and ZFS all in one
> system, and send replication snapshots from there. I wanted to use
> ESXi, but I guess I can't.
>
> And the worst thing about ESXi is that if you have one client going
> 7 MB/s, a second client has to share that 7 MB/s, and non-ESXi
> clients will still go horribly slow. With 10 non-ESXi clients, each
> one is limited to around 100 MB/s (again, I have only tested this
> with a 1500 MTU so far), but together they can write much more.
>
> Just now I tested 2 clients writing 100+100 MB/s (reported by GNU
> dd), and 3 clients writing 50+60+60 MB/s (reported by GNU dd).
> Output from "zpool iostat 5":
>
> two clients:
>   tank  38.7T  4.76T      0  1.78K  25.5K   206M   (matches 100+100)
> three clients:
>   tank  38.7T  4.76T      1  2.44K   205K   245M   (does not match
>                                                     60+60+50)
>
> (One client is a Linux netboot, and the others are using the Linux
> NFS client.)
>
> But I am not an 'official', so this cannot be considered 'officially
> clarified' ;)
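
For the archives, here is what the per-dataset sync setting discussed
above looks like in practice. A minimal sketch: the pool and dataset
names are the ones from the thread, and the log device name is a
placeholder:

    # Check the current sync policy (standard | always | disabled)
    zfs get sync tank/esxi

    # Disable synchronous-write semantics for this dataset only;
    # every other dataset in the pool keeps its own setting
    zfs set sync=disabled tank/esxi

    # Revert to the default behavior
    zfs set sync=standard tank/esxi

    # The safer alternative discussed above: keep sync=standard and
    # put the ZIL on a dedicated log device (SLOG) instead.
    # "gpt/slog0" is a placeholder for an actual fast device.
    zpool add tank log gpt/slog0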
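
The client-side mount options mentioned above, roughly as they would
be passed on a Linux NFS client; the server name, export path, and
mount point are placeholders:

    # Force synchronous writes on the client (the slow case above)
    mount -t nfs -o sync,noatime server:/tank/esxi /mnt/nfs

    # Default asynchronous client behavior, atime updates disabled
    mount -t nfs -o noatime server:/tank/esxi /mnt/nfs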
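
And a sketch of the kind of throughput test quoted above: GNU dd on
each client while watching the pool on the server. The file path and
size are placeholders:

    # On each client: write 2 GiB and report throughput. conv=fsync
    # flushes at the end, so the number reflects data that reached
    # the server rather than data buffered in client RAM.
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=2048 conv=fsync

    # On the server: per-pool throughput in 5-second intervals
    zpool iostat tank 5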