Date: Fri, 6 Nov 2015 10:07:58 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Kirk McKusick <mckusick@mckusick.com>, fs@freebsd.org Subject: Re: an easy (?) question on namecache sizing Message-ID: <20151106094022.N3939@besplex.bde.org> In-Reply-To: <20151105205255.GL2257@kib.kiev.ua> References: <20151105195648.GK2257@kib.kiev.ua> <201511052025.tA5KPcLF066724@chez.mckusick.com> <20151105205255.GL2257@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 5 Nov 2015, Konstantin Belousov wrote: > On Thu, Nov 05, 2015 at 12:25:38PM -0800, Kirk McKusick wrote: >>> Date: Thu, 5 Nov 2015 21:56:48 +0200 >>> From: Konstantin Belousov <kostikbel@gmail.com> >>> To: Kirk McKusick <mckusick@mckusick.com> >>> Subject: Re: an easy (?) question on namecache sizing >>> Cc: fs@freebsd.org >>> >>> On Thu, Nov 05, 2015 at 10:56:55AM -0800, Kirk McKusick wrote: >>>> >>>> I propose that we update wantfreevnodes in sysctl_update_desiredvnodes() >>>> so that it tracks the change in desiredvnodes: >>>> >>>> Index: /sys/kern/vfs_subr.c >>>> =================================================================== >>>> --- /sys/kern/vfs_subr.c (revision 290387) >>>> +++ /sys/kern/vfs_subr.c (working copy) >>>> @@ -293,6 +293,7 @@ >>>> if (old_desiredvnodes != desiredvnodes) { >>>> + wantfreevnodes = desiredvnodes / 4; >>>> vfs_hash_changesize(desiredvnodes); >>>> cache_changesize(desiredvnodes); >>>> } >>>> return (0); >>>> } >>>> >>>> Otherwise bumping up desiredvnodes will be less effective than expected. >>>> >>>> I see that Bruce has also suggested this change in his more extensive >>>> revisions. >>> >>> I think the idea is right, but the implementation is not. Just changing >>> wantfreevnodes after desirevnodes was reduced, creates a window where an >>> other thread could see small value for desiredvnodes, but large value >>> for wantfreevnodes. Then, e.g. vlrureclaim() would go wild. IMO it should >>> ensure that the observable values are non-contradictory. I found that this isn't a problem, provided some care is taken to work around bogus variable types in expressions like (desiredvnodes - wantfreevnodes). (These variables are bogusly unsigned, so expressions like this can give unsign extension bugs by having huge unsigned values instead of small negative ones. The variables are also bogusly long, so they they can have preposterous values which overflow when in expressions like vnrlu_free(freevnodes - wantfreevnodes) since vnlru_free()'s arg type is not consistently bogus. Their longness is hard to change since it is part of the sysctl ABI.) The code is quite robust to inconsistent and preposterous values. vlrureclaim() explicitly fixes up the preposterous user value of desiredvnodes <= 0 (it must be precisely 0 then, and the compiler should warn about the '<' part), but most fixups occur implicitly. It has always been possible to set wantfreevnodes to a much larger value than desiredvnodes. Nothing bad happens, but I found it convenient that a change like the above made wantfreevnodes reasonable when making large changes to desiredvnodes. >> Does moving the setting of wantfreevnodes before the cache size changes >> (as redone above) close the window enough? The vlrureclaim() function >> operates slowly enough that a brief period of inconsistency seems >> unimportant. Changing desiredvnodes happens very rarely. And at the moment >> we are not correcting wantfreevnodes at all. Or am I missing some key point? What happens is that there can be a long period of inconsistency before vlrureclaim() runs. It then makes everything consistent. You already depend on this by not bothering to call vlrureclaim() from the sysctl. There are also some unlocked accesses to the variables (including in the sysctl?), so we depend on wrong values of the variables due to races only giving transient inconsistencies. (The long types also ask for races :-).) Perfect locking for all of this would be painful. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151106094022.N3939>