From: Dennis Glatting
To: Peter Maloney
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS sync / ZIL clarification
Date: Mon, 30 Jan 2012 12:30:23 -0800
Message-ID: <1327955423.22960.0.camel@btw.pki2.com>
In-Reply-To: <4F264B27.6060502@brockmann-consult.de>

On Mon, 2012-01-30 at 08:47 +0100, Peter Maloney wrote:
> On 01/30/2012 05:30 AM, Mark Felder wrote:
> > I believe I was told something misleading a few weeks ago and I'd like
> > to have this officially clarified.
> >
> > NFS on ZFS is horrible unless you have sync = disabled.
> With ESXi = true
> with others = depends on your definition of horrible
>
> > I was told this was effectively disabling the ZIL, which is of course
> > naughty.
> > Now I stumbled upon this tonight:
> >
> true only for the specific dataset you specified
> e.g.
> zfs set sync=disabled tank/esxi
>
> >> Just for the archives... sync=disabled won't disable the
> >> zil, it'll disable waiting for a disk-flush on fsync etc.
> Same thing... "waiting for a disk-flush" is the only time the ZIL is
> used, from what I understand.
>
> >> With a battery-backed controller cache, those flushes should go to
> >> cache, and be pretty much free. You end up tossing away something for
> >> nothing.
> False I guess. Would be nice, but how do you battery back your RAM,
> which ZFS uses as a write cache? (If you know something I don't know,
> please share.)
>
> >
> > Is this accurate?
>
> sync=disabled caused data corruption for me. So you need to have battery
> backed cache... unfortunately, the cache we are talking about is in RAM,
> not your IO controller. So put a UPS on there, and you are safe except
> when you get a kernel panic (which is what happened to cause my
> corruption). But if you get something like the Gigabyte iRAM or the
> Acard ANS-9010, set it as your ZIL, and leave sync=standard, you should
> be safer. (I don't know if the iRAM works in FreeBSD, but someone told
> me he uses the ANS-9010)
>
> And NFS with ZFS is not horrible, except with ESXi's built in NFS client
> it uses for datastores. (the same someone that said he uses the
> ANS-9010 also provides a 'patch' for the FreeBSD NFS server that
> disables ESXi's stupid behavior, without disabling sync entirely, but
> also possibly disables it for others that use it responsibly [a database
> perhaps])
>
> here is a fantastic study about NFS; dunno if this study resulted in
> patches now in use or not, or how old it is [newest reference is 2002,
> so at most 10 years old]. In my experience, the write caching in use
> today still sucks.
> If I run async with sync=disabled, I can still see a huge
> improvement (20% on large files, up to 100% for smaller files <200MB)
> using an ESXi virtual disk (with ext4 doing write caching) compared to
> NFS directly.
>
>
> Here begins the rant about ESXi, which may be off topic:
>

ESXi 3.5, 4.0, 4.1, 5.0, or all of the above?

> ESXi goes 7 MB/s with an SSD ZIL at 100% load, and 80 MB/s with a
> ramdisk ZIL at 100% load (pathetic!),
> something I can't reproduce (thought it was just a normal Linux client
> with "-o sync" over 10 Gbps ethernet) got over 70 MB/s with the ZIL at
> 70-90% load,
> and other clients set to "-o sync,noatime,..." or "-o noatime,..." with
> the ZIL only randomly 0-5% load, but go faster than 100 MB/s. I didn't
> test "async", and without "sync", they seem to go the same speed.
> setting sync=disabled always goes around 100 MB/s, and changes the load
> on the ZIL to 0%.
>
> The thing I can't reproduce might have been only possible on a pool that
> I created with FreeBSD 8.2-RELEASE and then upgraded, which I no longer
> have. Or maybe it was with "sync" without "noatime".
>
> I am going to test with 9000 MTU, and if it is not much faster, I am
> giving up on NFS. My original plan was to use ESXi with a ZFS datastore
> with a replicated backup. That works terribly using the ESXi NFS client.
> Netbooting the OSses to bypass the ESXi client works much better, but
> still not good enough for many servers. NFS is poorly implemented, with
> terrible write caching on the client side. Now my plan is to use FreeBSD
> with VirtualBox and ZFS all in one system, and send replication
> snapshots from there. I wanted to use ESXi, but I guess I can't.
>
> And the worst thing about ESXi, is if you have 1 client going 7MB/s, the
> second client has to share that 7MB/s, and non-ESXi clients will still
> go horribly slow.
> If you have 10 non-ESXi clients going at 100 MB/s,
> each one is limited to around 100 MB/s (again I only tested this with
> 1500 MTU so far), but together they can write much more.
>
> Just now I tested 2 clients writing 100+100 MB/s (reported by GNU dd),
> and 3 writing 50+60+60 MB/s (reported by GNU dd)
> Output from "zpool iostat 5":
> two clients:
> tank        38.7T  4.76T      0  1.78K  25.5K   206M  (matches 100+100)
> three clients:
> tank        38.7T  4.76T      1  2.44K   205K   245M  (does not match 60+60+50)
>
> (one client is a Linux netboot, and the others are using the Linux NFS
> client)
>
> But I am not an 'official', so this cannot be considered 'officially
> clarified' ;)
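
For reference, the "matches" / "does not match" remark on the zpool iostat
figures quoted above can be checked with a few lines of shell. This is only
a rough sketch: `compare` is a hypothetical helper, the numbers are the ones
quoted in the message, and it assumes the MB/s reported by GNU dd and the
"M" column of zpool iostat can be compared directly, which ignores any
difference in units or sampling interval.

```shell
#!/bin/sh
# Compare the summed per-client dd rates with the pool write bandwidth
# shown by "zpool iostat 5". Usage: compare LABEL POOL_RATE CLIENT_RATE...
# (hypothetical helper; all rates are whole MB/s taken from the message)
compare() {
    label=$1
    pool=$2
    shift 2
    sum=0
    for rate in "$@"; do
        sum=$((sum + rate))
    done
    # Integer percentage by which the pool figure exceeds the client sum.
    excess=$(( (pool - sum) * 100 / sum ))
    echo "$label: clients=${sum} MB/s pool=${pool} MB/s excess=${excess}%"
}

compare "two clients" 206 100 100
compare "three clients" 245 50 60 60
```

With two clients the pool figure is within a few percent of the dd total;
with three clients it is far higher, which is the mismatch noted above.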