From owner-freebsd-fs@FreeBSD.ORG  Mon Nov  7 00:47:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AD76C106564A;
	Mon,  7 Nov 2011 00:47:17 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 318D78FC14;
	Mon,  7 Nov 2011 00:47:16 +0000 (UTC)
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 06 Nov 2011 19:47:16 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4D4B1B3F42;
	Sun,  6 Nov 2011 19:47:16 -0500 (EST)
Date: Sun, 6 Nov 2011 19:47:16 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Josh Paetzel <josh@tcbug.org>
Message-ID: <1093662212.1257099.1320626836299.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4EB6B4E9.1000804@tcbug.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: Josh Paetzel <jpaetzel@freebsd.org>, freebsd-fs@freebsd.org,
	zkirsch@freebsd.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject: Re: [RFC] Should vfs.nfsrv.async be implemented for new NFS server?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Nov 2011 00:47:17 -0000

Josh Paetzel wrote:
> On 11/06/11 07:34, Rick Macklem wrote:
> > Ronald Klop wrote:
> >> On Sun, 06 Nov 2011 02:18:05 +0100, Rick Macklem
> >> <rmacklem@uoguelph.ca>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Josh Paetzel pointed out that vfs.nfsrv.async doesn't exist
> >>> for the new NFS server.
> >>>
> >>> I don't think I had spotted this before, but when I looked I
> >>> saw that, when vfs.nfsrv.async is set non-zero in the old server,
> >>> it returns FILESYNC (which means the write has been committed to
> >>> non-volatile storage) even when it hasn't actually done that.
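The reply behaviour described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the actual FreeBSD server code. The `StableHow` values follow the NFSv3 `stable_how` enumeration in RFC 1813. `handle_write`, `async_sysctl` (standing in for `vfs.nfsrv.async` being non-zero) and the `fsync` callable (standing in for flushing the file to stable storage) are hypothetical names.

```python
from enum import IntEnum
from typing import Callable


class StableHow(IntEnum):
    """NFSv3 stable_how values (RFC 1813, WRITE3args.stable / WRITE3resok.committed)."""
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


def handle_write(requested: StableHow, async_sysctl: bool,
                 fsync: Callable[[], None]) -> StableHow:
    """Return the 'committed' level a server reports for one WRITE.

    Hypothetical model: async_sysctl mimics vfs.nfsrv.async != 0 in the old
    server; fsync stands in for committing the file to stable storage.
    """
    if async_sysctl:
        # Claim FILE_SYNC without flushing anything: this is the case
        # that does not conform to RFC 1813.
        return StableHow.FILE_SYNC
    if requested == StableHow.UNSTABLE:
        # Data may stay in the server's cache; the client must COMMIT later.
        return StableHow.UNSTABLE
    # DATA_SYNC or FILE_SYNC requested: flush the whole file, then report
    # FILE_SYNC (at least as stable as what was asked for).
    fsync()
    return StableHow.FILE_SYNC
```

For example, `handle_write(StableHow.UNSTABLE, True, lambda: None)` reports `FILE_SYNC` even though nothing was flushed.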
> >>>
> >>> This can improve performance, but has some negative implications:
> >>> - If the server crashes before the write is committed to
> >>>   non-volatile storage, the file modification will be lost.
> >>>   (When a server replies UNSTABLE to a write, the client holds
> >>>    onto the data in its cache and does the write again if the
> >>>    server crashes/reboots before the client does a Commit RPC
> >>>    for the file. However, a reply of FILESYNC tells the client
> >>>    it can forget about the write, because it is done.)
> >>> - Because of the above, replying FILESYNC when the data is not
> >>>   yet committed to non-volatile (also referred to as stable)
> >>>   storage is a violation of RFC 1813.
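The client side of the list above can be sketched as follows, under simplifying assumptions. `ClientFile` and its methods are hypothetical names, not the FreeBSD client's API. In RFC 1813 a client notices a server reboot through a changed write verifier; here that detection is reduced to a single method call.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

UNSTABLE, FILE_SYNC = 0, 2  # NFSv3 stable_how values (RFC 1813)


@dataclass
class ClientFile:
    """Hypothetical model of what an NFSv3 client remembers for one file."""
    # (offset, data) pairs the server acknowledged only as UNSTABLE.
    uncommitted: List[Tuple[int, bytes]] = field(default_factory=list)

    def write_reply(self, offset: int, data: bytes, committed: int) -> None:
        if committed == UNSTABLE:
            # The server may lose this on a crash, so keep a copy to resend.
            self.uncommitted.append((offset, data))
        # FILE_SYNC: the server says the data is on stable storage, so the
        # client forgets it. If the server lied, nothing can recover it.

    def commit_succeeded(self) -> None:
        # A successful COMMIT RPC means everything is now on stable storage.
        self.uncommitted.clear()

    def server_rebooted(self) -> List[Tuple[int, bytes]]:
        """Writes the client must send again after a server crash/reboot."""
        return list(self.uncommitted)
```

Only the UNSTABLE write survives a simulated reboot. A write acknowledged as FILE_SYNC is gone from the client, which is exactly why a false FILESYNC reply can lose data.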
> >>
> >> Just out of curiosity: why would lying about FILESYNC improve
> >> performance over UNSTABLE? The server does the same work; only the
> >> client holds the data in memory longer. I only see an impact if the
> >> client has very little memory.
> >>
> >> Ronald.
> > Well, I'm not sure I have an answer. Josh noted that it makes a big
> > difference for them. Maybe he can elaborate?
> >
> 
> I'll test it out and report back in the next week or so.
> 
> In 8.x, setting the async sysctl was the difference between 80-100 MB/sec
> and 800 MB/sec (yes, megabytes!) using a variety of different clients,
> including the VMware ESXi 4.x client, the Xen 5.6 client, various Linux
> clients and the FreeBSD client. I'll note that 800 MB/sec is getting
> close to the underlying filesystem performance, so it's likely that the
> filesystem is the bottleneck in that case. 80-100 MB/sec is basically
> gigE performance.
> 
Just wondering: are these tests writing a file larger than the buffer
cache can hold?

rick
> I can make hardware available if anyone is curious to poke at this; we
> have the ability to set up tests with quad gigE LACP, 10 gigE, and
> numerous clients.
> 
> > One additional effect is that the client in head must do a synchronous
> > write (with FILESYNC, waiting for the RPC reply) before it can modify
> > a region of the same buffer that is not contiguous with the old dirty
> > byte region. (This happens frequently during builds, done mostly by
> > the loader, I think?)
> > If the server replies FILESYNC, then the old dirty byte region is done
> > (i.e. no longer a dirty byte region), so the client doesn't have to do
> > the synchronous write described above.
> > I hope that the experimental patch I made available a few days ago,
> > along with work jhb@ is doing, will eventually fix this for the FreeBSD
> > client, but it won't be in head anytime soon (and who knows what
> > other clients do?).
> >
> > rick
> >
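The dirty-byte-region constraint described above can be sketched as a toy model. It assumes each client buffer tracks exactly one contiguous dirty range. `Buffer`, `client_write`, `write_rpc_done` and the `sync_write` callable are hypothetical names, not the FreeBSD buffer cache code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Buffer:
    """Hypothetical model of one client buffer with a single dirty byte range."""
    dirty: Optional[Tuple[int, int]] = None  # [start, end) not yet known to be stable


def client_write(buf: Buffer, start: int, end: int,
                 sync_write: Callable[[int, int], None]) -> None:
    """Record an application write of bytes [start, end) into buf."""
    if buf.dirty is None:
        buf.dirty = (start, end)
        return
    old_start, old_end = buf.dirty
    if end < old_start or start > old_end:
        # Not contiguous with the old range: push the old range out with a
        # synchronous FILESYNC write and wait for the reply before reusing it.
        sync_write(old_start, old_end)
        buf.dirty = (start, end)
    else:
        # Overlapping or adjacent: grow the single dirty range.
        buf.dirty = (min(start, old_start), max(end, old_end))


def write_rpc_done(buf: Buffer, committed_file_sync: bool) -> None:
    """Called when an asynchronous WRITE RPC for the dirty range completes."""
    if committed_file_sync:
        # A FILESYNC reply means the range is done: nothing left to track.
        buf.dirty = None
    # An UNSTABLE reply leaves the range tracked until it is committed.
```

In this model, once a FILESYNC reply has cleared the range, a later non-contiguous write simply starts a new range without the synchronous round trip. That is the effect a FILESYNC-replying server has on the client.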
> 
> 
> --
> Thanks,
> 
> Josh Paetzel