Date: Tue, 16 Aug 2011 18:28:06 -0700
From: Jeremy Chadwick
To: John Baldwin
Cc: freebsd-fs@freebsd.org, onwahe@gmail.com
Subject: Re: NFS calculation of max commit size
Message-ID: <20110817012806.GA29555@icarus.home.lan>
References: <201108151343.14655.jhb@freebsd.org>
 <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca>
 <20110816022554.GA6018@icarus.home.lan>
 <201108160931.35626.jhb@freebsd.org>
In-Reply-To: <201108160931.35626.jhb@freebsd.org>

On Tue, Aug 16, 2011 at 09:31:35AM -0400, John Baldwin wrote:
> On Monday, August 15, 2011 10:25:54 pm Jeremy Chadwick wrote:
> > On Mon, Aug 15, 2011 at 06:58:14PM -0400, Rick Macklem wrote:
> > > John Baldwin wrote:
> > > > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote:
> > > > > A recent PR (kern/159351) noted that the following
> > > > > calculation results in a divide-by-zero when
> > > > > desiredvnodes < 1000.
> > > > >
> > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> > > > >
> > > > > Just fixing the divide-by-zero is easy enough, but I'm not
> > > > > sure what this calculation is trying to do. Making it a fraction
> > > > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of
> > > > > bytes of uncommitted data in the NFS client's buffer cache blocks,
> > > > > if I understand it correctly), but why divide it by
> > > > >
> > > > > (desiredvnodes / 1000) ??
> > > > >
> > > > > Maybe thinking that fewer vnodes means sharing it with fewer
> > > > > other file systems or ???
> > > > >
> > > > > Anyhow, it seems to me that the formula is bogus for small
> > > > > values of desiredvnodes (for example desiredvnodes == 1500
> > > > > implies nm_wcommitsize == hibufspace, which sounds too large
> > > > > to me).
> > > > >
> > > > > I'm thinking that putting an upper limit of 10% of hibufspace
> > > > > might make sense. ie. Change the above to:
> > > > >
> > > > > if (desiredvnodes >= 11000)
> > > > >     nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> > > > > else
> > > > >     nmp->nm_wcommitsize = hibufspace / 10;
> > > > >
> > > > > Anyone have comments or insight into this calculation?
> > > > >
> > > > > rick
> > > > > ps: jhb, I hope you don't mind. I emailed you first and then
> > > > > thought others might have some ideas, too.
> > > >
> > > > Oh no, this is fine. A broader discussion is probably warranted. I
> > > > honestly don't know what the goal is. I do think it is an attempt to
> > > > share with other file systems, but I'm not sure how
> > > > desiredvnodes / 1000 is useful for that.
> > > > It also seems that we can end up setting this woefully low as well.
> > > > That is, I wonder if we need a minimum of 10% of hibufspace so that
> > > > it can scale between 10% and 90% of hibufspace (but I'm not sure what
> > > > you would use to pick the scaling factor sanely). To my mind what you
> > > > really want to do is something like
> > > > 'hibufspace / (number of active mounts)', but that will not really
> > > > work correctly unless we recalculate the value on each mount and
> > > > unmount operation.
> > > >
> > > > --
> > > > John Baldwin
> > > Btw, this was done by r147280 6.5 years ago, so the formula doesn't seem
> > > to be causing a lot of grief. Also of some interest is the fact that
> > > wcommitsize appears to have been settable on a per-mount-point basis until
> > > mount_nfs(8) was converted to nmount(2). { There is no nmount option to set it. }
> > >
> > > Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects
> > > how much write behind happens. This, in turn, affects how bursty (is this a real
> > > word? hopefully you get what I mean?) the write traffic to the server is.
> > >
> > > What I'm not sure about is what happens when multiple mounts use up the entire
> > > buffer cache with write behinds. I'll try a little experiment to see if I
> > > can find that out. (If making it large isn't detrimental, then I tend to
> > > agree that the above sets nm_wcommitsize very small.)
> > >
> > > Since "desiredvnodes" will seldom be less than 1000, I'm not going to
> > > rush to a solution.
> > >
> > > Anyone who has insight into what this formula should be, please let us know.
> >
> > The commit message tries to explain it, but it's more than just a
> > one-line change.
> >
> > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsclient/nfs_vfsops.c#rev1.177
> >
> > There's also an associated PR:
> >
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=79208
>
> The commit added the limit which is sensible, but it doesn't explain the logic
> for how the limit is computed (that is, why it uses desiredvnodes / 1000).

Understood -- what I was getting at was that the individuals responsible
for the commit (several people reviewed it) could be contacted and
asked directly. :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
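
For reference, the guard Rick proposed earlier in the thread can be sketched
as a standalone function. This is only an illustration, not the committed
fix: the function name wcommitsize_sketch is hypothetical, and the plain
long/int parameters stand in for the kernel's hibufspace and desiredvnodes
variables.

```c
/*
 * Sketch of the proposed nm_wcommitsize calculation (hypothetical helper,
 * not kernel code). The original expression
 *     hibufspace / (desiredvnodes / 1000)
 * divides by zero when desiredvnodes < 1000, because the integer division
 * desiredvnodes / 1000 yields 0.
 */
static long
wcommitsize_sketch(long hibufspace, int desiredvnodes)
{
	/*
	 * At 11000 vnodes or more the divisor is at least 11, so the
	 * result stays below 10% of hibufspace and is never a
	 * divide-by-zero.
	 */
	if (desiredvnodes >= 11000)
		return (hibufspace / (desiredvnodes / 1000));

	/* Otherwise cap the value at 10% of hibufspace. */
	return (hibufspace / 10);
}
```

With hibufspace == 1000000, this gives 100000 for desiredvnodes of 500 or
1500 (the two problem cases raised above) and 10000 for desiredvnodes ==
100000, matching the original formula wherever that formula already yields
less than 10%.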