From: Rick Macklem
Date: Sun, 6 Nov 2011 19:47:16 -0500 (EST)
To: Josh Paetzel
Cc: Josh Paetzel, freebsd-fs@freebsd.org, zkirsch@freebsd.org, Ronald Klop
Message-ID: <1093662212.1257099.1320626836299.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4EB6B4E9.1000804@tcbug.org>
Subject: Re: [RFC] Should vfs.nfsrv.async be implemented for new NFS server?

Josh Paetzel wrote:
> On 11/06/11 07:34, Rick Macklem wrote:
> > Ronald Klop wrote:
> >> On Sun, 06 Nov 2011 02:18:05 +0100, Rick Macklem wrote:
> >>
> >>> Hi,
> >>>
> >>> Josh Paetzel pointed out that vfs.nfsrv.async doesn't exist
> >>> for the new NFS server.
> >>>
> >>> I don't think I had spotted this before, but when I looked I
> >>> saw that, when vfs.nfsrv.async is set non-zero in the old server,
> >>> it returns FILESYNC (which means the write has been committed to
> >>> non-volatile storage) even when it hasn't actually done that.
> >>>
> >>> This can improve performance, but has some negative implications:
> >>> - If the server crashes before the write is committed to
> >>>   non-volatile storage, the file modification will be lost.
> >>>   (When a server replies UNSTABLE to a write, the client holds
> >>>   onto the data in its cache and does the write again if the
> >>>   server crashes/reboots before the client does a Commit RPC
> >>>   for the file. However, a reply of FILESYNC tells the client
> >>>   it can forget about the write, because it is done.)
> >>> - Because of the above, replying FILESYNC when the data is not
> >>>   yet committed to non-volatile (also referred to as stable)
> >>>   storage, this is a violation of RFC1813.
> >>
> >> Just out of curiosity. Why would lying about FILESYNC improve
> >> performance over UNSTABLE? The server does the same work. Only the
> >> client holds data longer in memory. I only see impact if the client
> >> has just a little bit of memory.
> >>
> >> Ronald.
> > Well, I'm not sure I have an answer. Josh noted that it makes a big
> > difference for them. Maybe he can elaborate?
> >
> > I'll test it out and report back in the next week or so.
>
> In 8.x, setting the async sysctl was the difference between
> 80-100MB/sec and 800 MB/sec (Yes, MegaBytes!) using a variety of
> different clients, including the VMWare ESXi 4.x client, Xen 5.6
> client, various linux clients and the FreeBSD client. I'll note that
> 800MB/sec is getting close to the underlying filesystem performance,
> so it's likely that the gate to performance is in the filesystem in
> that case. 80-100MB/sec is basically gigE performance.
>
Just wondering...are these tests writing a file larger than the
buffer cache can hold?

rick

> I can make hardware available if anyone is curious at poking at this,
> we have the ability to set up tests with quad gigE LACP, 10 gigE, and
> numerous clients.
>
> > One additional effect is that the client in head must do a
> > synchronous write (with FILESYNC and waiting for the RPC reply)
> > before it can modify a non-continuous region of the same buffer
> > with respect to the old dirty byte region. (This happens
> > frequently during builds, done mostly by the loader, I think?)
> > If the server replies FILESYNC, then the old dirty byte region is
> > done (ie. no longer a dirty byte region) so the client doesn't
> > have to do the synchronous write described above.
> > I hope that the experimental patch I made available a few days ago,
> > along with work jhb@ is doing will eventually fix this for the
> > FreeBSD client, but it won't be in head anytime soon (and who knows
> > what other clients do?).
> >
> > rick
> >
>
> --
> Thanks,
>
> Josh Paetzel
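
For readers following the thread, here is a rough sketch of the UNSTABLE versus FILESYNC behaviour described in the quoted text. It is a toy model in Python, not FreeBSD code: the names ToyServer, ToyClient, lie_about_stability and lost_after_crash are all made up for illustration, the three reply values follow the stable_how names in RFC 1813, and a real client detects a server reboot through the write verifier, which is left out here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stable(Enum):
    """Reply values, named after stable_how in RFC 1813."""
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


@dataclass
class ToyServer:
    # Hypothetical flag standing in for vfs.nfsrv.async being non-zero.
    lie_about_stability: bool = False
    volatile: dict = field(default_factory=dict)  # lost on a crash
    stable: dict = field(default_factory=dict)    # survives a crash

    def write(self, offset, data):
        # The data only reaches volatile memory in either mode.
        self.volatile[offset] = data
        if self.lie_about_stability:
            return Stable.FILE_SYNC  # claims stable storage, untruthfully
        return Stable.UNSTABLE

    def commit(self):
        # Commit RPC: push everything to stable storage.
        self.stable.update(self.volatile)
        self.volatile.clear()

    def crash(self):
        self.volatile.clear()


@dataclass
class ToyClient:
    server: ToyServer
    uncommitted: dict = field(default_factory=dict)

    def write(self, offset, data):
        reply = self.server.write(offset, data)
        if reply is Stable.UNSTABLE:
            # Keep the data so it can be rewritten after a server reboot.
            self.uncommitted[offset] = data
        # On FILE_SYNC the client forgets the write: it believes it is done.

    def recover_after_server_reboot(self):
        # Resend every write the server never committed.
        for offset, data in self.uncommitted.items():
            self.server.write(offset, data)

    def commit(self):
        self.server.commit()
        self.uncommitted.clear()


def lost_after_crash(lie):
    """Return True if a write is lost when the server crashes before Commit."""
    server = ToyServer(lie_about_stability=lie)
    client = ToyClient(server)
    client.write(0, b"hello")
    server.crash()
    client.recover_after_server_reboot()
    client.commit()
    return server.stable.get(0) is None
```

In this model, lost_after_crash(False) reports that the data survives because the client still holds it and resends it, while lost_after_crash(True) reports that it is lost, which is the crash scenario the first bullet in the quoted message warns about. The sketch says nothing about throughput; the performance question raised in the thread is still open.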