From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 15 22:58:15 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C57281065678;
	Mon, 15 Aug 2011 22:58:15 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 6DBAC8FC1A;
	Mon, 15 Aug 2011 22:58:15 +0000 (UTC)
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 15 Aug 2011 18:58:14 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 85B90B4010;
	Mon, 15 Aug 2011 18:58:14 -0400 (EDT)
Date: Mon, 15 Aug 2011 18:58:14 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201108151343.14655.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, onwahe@gmail.com
Subject: Re: NFS calculation of max commit size
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 22:58:16 -0000

John Baldwin wrote:
> On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote:
> > A recent PR (kern/159351) noted that the following
> > calculation results in a divide-by-zero when
> > desiredvnodes < 1000.
> >
> > 	nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> >
> > Just fixing the divide-by-zero is easy enough, but I'm not
> > sure what this calculation is trying to do. Making it a fraction
> > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of
> > bytes of uncommitted data in the NFS client's buffer cache blocks,
> > if I understand it correctly), but why divide it by
> >
> >                 (desiredvnodes / 1000) ??
> >
> > Maybe thinking that fewer vnodes means sharing it with fewer
> > other file systems or ???
> >
> > Anyhow, it seems to me that the formula is bogus for small
> > values of desiredvnodes (for example desiredvnodes == 1500
> > implies nm_wcommitsize == hibufspace, which sounds too large
> > to me).
> >
> > I'm thinking that putting an upper limit of 10% of hibufspace
> > might make sense. ie. Change the above to:
> >
> > 	if (desiredvnodes >= 11000)
> > 		nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> > 	else
> > 		nmp->nm_wcommitsize = hibufspace / 10;
> >
> > Anyone have comments or insight into this calculation?
> >
> > rick
> > ps: jhb, I hope you don't mind. I emailed you first and then
> >     thought others might have some ideas, too.
> 
> Oh no, this is fine. A broader discussion is probably warranted. I honestly
> don't know what the goal is. I do think it is an attempt to share with other
> file systems, but I'm not sure how desiredvnodes / 1000 is useful for that.
> It also seems that we can end up setting this woefully low as well. That is,
> I wonder if we need a minimum of 10% of hibufspace so that it can scale
> between 10% and 90% of hibufspace (but I'm not sure what you would use to
> pick the scaling factor sanely). To my mind what you really want to do is
> something like 'hibufspace / (number of active mounts)', but that will not
> really work correctly unless we recalculate the value on each mount and
> unmount operation.
> 
> --
> John Baldwin
Btw, this was done by r147280 6.5 years ago, so the formula doesn't seem
to be causing a lot of grief. Also of some interest is the fact that
wcommitsize appears to have been settable on a per-mount-point basis until
mount_nfs(8) was converted to nmount(2). (There is no nmount option to set it.)

Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects
how much write-behind happens. This, in turn, affects how bursty (is this a real
word? hopefully you get what I mean) the write traffic to the server is.
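
To make that effect concrete, here is a toy model in C. It is not taken from
the client code: the struct, the function names and the single per-mount
counter are all assumptions made for illustration, and the real accounting
is likely more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-mount state; this is not struct nfsmount. */
struct mount_sketch {
	int64_t wcommitsize;	/* limit on uncommitted bytes */
	int64_t uncommitted;	/* bytes written but not yet committed */
};

/*
 * Account for a write of len bytes and report whether the model would
 * force it to be synchronous.
 */
static bool
write_is_sync(struct mount_sketch *m, int64_t len)
{
	if (m->uncommitted + len > m->wcommitsize) {
		/* Over the limit: no write-behind for this request. */
		return (true);
	}
	m->uncommitted += len;	/* write-behind: data stays uncommitted */
	return (false);
}

/* In this model a commit clears everything that was pending. */
static void
commit_done(struct mount_sketch *m)
{
	m->uncommitted = 0;
}
```

With a small limit the model flips to synchronous writes almost immediately,
and with a large one it lets a lot of data pile up before anything is forced
out, which is the burstiness I mean.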

What I'm not sure about is what happens when multiple mounts use up the entire
buffer cache with write-behind data. I'll try a little experiment to see if I
can find that out. (If making it large isn't detrimental, then I tend to
agree that the current formula sets nm_wcommitsize very small.)
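
For what it's worth, the 'hibufspace / (number of active mounts)' idea from
the quoted text could be sketched roughly as below. This is only an
illustration under stated assumptions: the table, its fixed size and all of
the names are hypothetical, and nothing like it exists in the tree as far as
I know. It does show why every mount's limit has to be touched on each mount
and unmount.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MOUNTS_SKETCH 8	/* arbitrary bound for the illustration */

/* Hypothetical bookkeeping shared by all NFS mounts. */
struct share_table {
	int64_t hibufspace;
	size_t nmounts;
	int64_t wcommitsize[MAX_MOUNTS_SKETCH];
};

/* Give every active mount an equal share of hibufspace. */
static void
recalc_shares(struct share_table *t)
{
	/* The loop body never runs when nmounts is 0, so no divide-by-zero. */
	for (size_t i = 0; i < t->nmounts; i++)
		t->wcommitsize[i] = t->hibufspace / (int64_t)t->nmounts;
}

/* Called on mount; returns -1 if the table is full. */
static int
mount_added(struct share_table *t)
{
	if (t->nmounts >= MAX_MOUNTS_SKETCH)
		return (-1);
	t->nmounts++;
	recalc_shares(t);
	return (0);
}

/* Called on unmount; the remaining mounts get larger shares. */
static void
mount_removed(struct share_table *t)
{
	if (t->nmounts > 0) {
		t->nmounts--;
		recalc_shares(t);
	}
}
```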

Since "desiredvnodes" will seldom be less than 1000, I'm not going to
rush to a solution.
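
For reference, the capped calculation proposed in the quoted text can be
written as a stand-alone function. This is just a sketch: the function name
is made up, and the int64_t/int parameter types are assumptions rather than
the kernel's actual declarations. Clamping the divisor to at least 10 gives
the same results as the "desiredvnodes >= 11000" test and also avoids the
divide-by-zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the in-tree code: compute the write-commit
 * limit from hibufspace and desiredvnodes with a 10% upper bound.
 */
static int64_t
wcommitsize_sketch(int64_t hibufspace, int desiredvnodes)
{
	/* Integer division: this is 0 when desiredvnodes < 1000. */
	int divisor = desiredvnodes / 1000;

	/*
	 * A divisor below 10 would either divide by zero or yield more
	 * than 10% of hibufspace, so clamp it.
	 */
	if (divisor < 10)
		divisor = 10;
	return (hibufspace / divisor);
}
```

So desiredvnodes == 500 and desiredvnodes == 1500 both give hibufspace / 10,
while desiredvnodes == 100000 still gives hibufspace / 100 as before.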

Anyone who has insight into what this formula should be, please let us know.

rick