From: Peter Maloney
Date: Mon, 30 Jan 2012 08:47:51 +0100
To: freebsd-fs@freebsd.org
Subject: Re: ZFS sync / ZIL clarification
Message-ID: <4F264B27.6060502@brockmann-consult.de>

On 01/30/2012 05:30 AM, Mark Felder wrote:
> I believe I was told something misleading a few weeks ago and I'd like
> to have this officially clarified.
>
> NFS on ZFS is horrible unless you have sync = disabled.

With ESXi: true.
With others: depends on your definition of horrible.

> I was told this was effectively disabling the ZIL, which is of course
> naughty. Now I stumbled upon this tonight:
>

True, but only for the specific dataset you specified, e.g.

    zfs set sync=disabled tank/esxi

>> Just for the archives... sync=disabled won't disable the ZIL,
>> it'll disable waiting for a disk-flush on fsync etc.

Same thing... "waiting for a disk-flush" is the only time the ZIL is
used, from what I understand.

>> With a battery-backed controller cache, those flushes should go to
>> cache, and be pretty much free. You end up tossing away something for
>> nothing.

False, I guess. Would be nice, but how do you battery-back your RAM,
which ZFS uses as a write cache? (If you know something I don't know,
please share.)

>
> Is this accurate?

sync=disabled caused data corruption for me.

So you need to have a battery-backed cache... unfortunately, the cache we
are talking about is in RAM, not in your IO controller. So put a UPS on
there, and you are safe except when you get a kernel panic (which is
what caused my corruption). But if you get something like the Gigabyte
iRAM or the Acard ANS-9010, set it as your ZIL, and leave sync=standard,
you should be safer. (I don't know if the iRAM works in FreeBSD, but
someone told me he uses the ANS-9010.)

And NFS with ZFS is not horrible, except with the built-in NFS client
that ESXi uses for datastores. (The same someone who said he uses the
ANS-9010 also provides a 'patch' for the FreeBSD NFS server that disables
ESXi's stupid behavior without disabling sync entirely, but it possibly
also disables sync for other clients that use it responsibly [a database,
perhaps].)

here is a fantastic study about NFS; I don't know whether this study
resulted in patches now in use, or how old it is [the newest reference is
from 2002, so at most 10 years old]. In my experience, the write caching
in use today still sucks.
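The safer setup described above (a dedicated log device for the pool, with
sync=standard left on the dataset) could be sketched roughly as follows.
This is only an illustration under assumptions: the pool and dataset names
are the ones used in this thread, the device node /dev/ada4 is hypothetical,
and the run helper is made up here so that the commands are printed rather
than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (needs root and a real pool).
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: hypothetical helper that echoes or executes a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

POOL="tank"            # pool name as used in this thread
DATASET="tank/esxi"    # dataset name as used in this thread
LOG_DEV="/dev/ada4"    # ASSUMPTION: device node of the RAM-backed disk

# Show the current sync setting; the property is per dataset.
run zfs get sync "$DATASET"

# Keep normal synchronous semantics instead of sync=disabled.
run zfs set sync=standard "$DATASET"

# Add the device to the pool as a dedicated log (ZIL) device.
run zpool add "$POOL" log "$LOG_DEV"
```

Because sync is a dataset property, other datasets in the same pool keep
whatever setting they already have.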
If I run async with sync=disabled, I can still see a huge improvement
(20% on large files, up to 100% for smaller files <200 MB) using an ESXi
virtual disk (with ext4 doing write caching) compared to NFS directly.

Here begins the rant about ESXi, which may be off topic:

ESXi goes 7 MB/s with an SSD ZIL at 100% load, and 80 MB/s with a
ramdisk ZIL at 100% load (pathetic!).

Something I can't reproduce (I thought it was just a normal Linux client
with "-o sync" over 10 Gbps ethernet) got over 70 MB/s with the ZIL at
70-90% load.

Other clients set to "-o sync,noatime,..." or "-o noatime,..." put the
ZIL at only 0-5% load, seemingly at random, but go faster than 100 MB/s.
I didn't test "async"; with or without "sync", they seem to go the same
speed.

Setting sync=disabled always goes around 100 MB/s, and changes the load
on the ZIL to 0%.

The thing I can't reproduce might only have been possible on a pool that
I created with FreeBSD 8.2-RELEASE and then upgraded, which I no longer
have. Or maybe it was with "sync" without "noatime".

I am going to test with 9000 MTU, and if it is not much faster, I am
giving up on NFS. My original plan was to use ESXi with a ZFS datastore
and a replicated backup. That works terribly with the ESXi NFS client.
Netbooting the OSes to bypass the ESXi client works much better, but
still not well enough for many servers. NFS is poorly implemented, with
terrible write caching on the client side. Now my plan is to use FreeBSD
with VirtualBox and ZFS all in one system, and send replication
snapshots from there. I wanted to use ESXi, but I guess I can't.

And the worst thing about ESXi is that if you have 1 client going
7 MB/s, the second client has to share that 7 MB/s, and non-ESXi clients
will still go horribly slow. If you have 10 non-ESXi clients going at
100 MB/s, each one is limited to around 100 MB/s (again, I have only
tested this with 1500 MTU so far), but together they can write much more.
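For anyone who wants to repeat a comparison like the "-o sync,noatime"
versus "-o noatime" one above, a minimal sketch of a client-side write test
follows. It is an assumption about how such a test could be run, not the
exact commands behind the numbers in this mail: the server name filer, the
export /tank/test, the mount point /mnt/nfstest and the helper write_test
are all hypothetical, and the dd flags are those of GNU dd.

```shell
#!/bin/sh
# ASSUMPTION: hypothetical server "filer", export /tank/test and
# mount point /mnt/nfstest. Mount variants to compare (as root on a
# Linux client; left as comments because they need a real NFS server):
#   mount -t nfs -o sync,noatime filer:/tank/test /mnt/nfstest
#   mount -t nfs -o noatime      filer:/tank/test /mnt/nfstest

# write_test DIR SIZE_MB: write SIZE_MB MiB of zeros into DIR and print
# the last line of dd's output (GNU dd reports bytes, time and rate there).
write_test() {
    dir="$1"
    size_mb="$2"
    # conv=fsync makes dd flush the file to stable storage before it
    # exits, so the reported rate includes committing the data.
    dd if=/dev/zero of="$dir/ddtest.bin" bs=1M count="$size_mb" conv=fsync 2>&1 | tail -n 1
}

# Example (after mounting): write_test /mnt/nfstest 1024
```

Note that zeros compress extremely well, so if compression is enabled on
the dataset the reported rate will be inflated; a file of incompressible
data would be a fairer source in that case.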
Just now I tested 2 clients writing 100+100 MB/s (reported by GNU dd),
and 3 clients writing 50+60+60 MB/s (reported by GNU dd).

Output from "zpool iostat 5":

two clients:
    tank        38.7T  4.76T      0  1.78K  25.5K   206M
    (matches 100+100)

three clients:
    tank        38.7T  4.76T      1  2.44K   205K   245M
    (does not match 60+60+50)

(One client is a Linux netboot, and the others are using the Linux NFS
client.)

But I am not an 'official', so this cannot be considered 'officially
clarified' ;)

-- 
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de