Date: Mon, 15 Aug 2011 18:58:14 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-fs@freebsd.org, onwahe@gmail.com
Subject: Re: NFS calculation of max commit size
Message-ID: <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201108151343.14655.jhb@freebsd.org>
John Baldwin wrote:
> On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote:
> > A recent PR (kern/159351) noted that the following
> > calculation results in a divide-by-zero when
> > desiredvnodes < 1000.
> >
> >     nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> >
> > Just fixing the divide-by-zero is easy enough, but I'm not
> > sure what this calculation is trying to do. Making it a fraction
> > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of
> > bytes of uncommitted data in the NFS client's buffer cache blocks,
> > if I understand it correctly), but why divide it by
> >
> >     (desiredvnodes / 1000) ??
> >
> > Maybe thinking that fewer vnodes means sharing it with fewer
> > other file systems or ???
> >
> > Anyhow, it seems to me that the formulae is bogus for small
> > values of desiredvnodes (for example desiredvnodes == 1500
> > implies nm_wcommitsize == hibufspace, which sounds too large
> > to me).
> >
> > I'm thinking that putting an upper limit of 10% of hibufspace
> > might make sense. ie. Change the above to:
> >
> >     if (desiredvnodes >= 11000)
> >         nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> >     else
> >         nmp->nm_wcommitsize = hibufspace / 10;
> >
> > Anyone have comments or insight into this calculation?
> >
> > rick
> > ps: jhb, I hope you don't mind. I emailed you first and then
> > thought others might have some ideas, too.
>
> Oh no, this is fine. A broader discussion is probably warranted. I
> honestly don't know what the goal is. I do think it is an attempt to
> share with other file systems, but I'm not sure how desiredvnodes /
> 1000 is useful for that. It also seems that we can end up setting
> this woefully low as well. That is, I wonder if we need a minimum of
> 10% of hibufspace so that it can scale between 10% and 90% of
> hibufspace (but I'm not sure what you would use to pick the scaling
> factor sanely).
> To my mind what you really want to do is something like
> 'hibufspace / (number of active mounts)', but that will not really
> work correctly unless we recalculate the value on each mount and
> unmount operation.
>
> --
> John Baldwin

Btw, this was done by r147280 6.5 years ago, so the formula doesn't
seem to be causing a lot of grief.

Also of some interest is the fact that wcommitsize appears to have
been settable on a per-mount-point basis until mount_nfs(8) was
converted to nmount(2). { There is no nmount option to set it. }

Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it
affects how much write behind happens. This, in turn, affects how
bursty (is this a real word? hopefully you get what I mean) the write
traffic to the server is.

What I'm not sure about is what happens when multiple mounts use up
the entire buffer cache with write behinds. I'll try a little
experiment to see if I can find that out. (If making it large isn't
detrimental, then I tend to agree that the formula above can set
nm_wcommitsize very small.)

Since "desiredvnodes" will seldom be less than 1000, I'm not going to
rush to a solution. Anyone who has insight into what this formula
should be, please let us know.

rick