Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 6 Nov 2015 10:07:58 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Kirk McKusick <mckusick@mckusick.com>, fs@freebsd.org
Subject:   Re: an easy (?) question on namecache sizing
Message-ID:  <20151106094022.N3939@besplex.bde.org>
In-Reply-To: <20151105205255.GL2257@kib.kiev.ua>
References:  <20151105195648.GK2257@kib.kiev.ua> <201511052025.tA5KPcLF066724@chez.mckusick.com> <20151105205255.GL2257@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 5 Nov 2015, Konstantin Belousov wrote:

> On Thu, Nov 05, 2015 at 12:25:38PM -0800, Kirk McKusick wrote:
>>> Date: Thu, 5 Nov 2015 21:56:48 +0200
>>> From: Konstantin Belousov <kostikbel@gmail.com>
>>> To: Kirk McKusick <mckusick@mckusick.com>
>>> Subject: Re: an easy (?) question on namecache sizing
>>> Cc: fs@freebsd.org
>>>
>>> On Thu, Nov 05, 2015 at 10:56:55AM -0800, Kirk McKusick wrote:
>>>>
>>>> I propose that we update wantfreevnodes in sysctl_update_desiredvnodes()
>>>> so that it tracks the change in desiredvnodes:
>>>>
>>>> Index: /sys/kern/vfs_subr.c
>>>> ===================================================================
>>>> --- /sys/kern/vfs_subr.c	(revision 290387)
>>>> +++ /sys/kern/vfs_subr.c	(working copy)
>>>> @@ -293,6 +293,7 @@
>>>>  	if (old_desiredvnodes != desiredvnodes) {
>>>> +		wantfreevnodes = desiredvnodes / 4;
>>>>  		vfs_hash_changesize(desiredvnodes);
>>>>  		cache_changesize(desiredvnodes);
>>>>  	}
>>>>  	return (0);
>>>>  }
>>>>
>>>> Otherwise bumping up desiredvnodes will be less effective than expected.
>>>>
>>>> I see that Bruce has also suggested this change in his more extensive
>>>> revisions.
>>>
>>> I think the idea is right, but the implementation is not. Just changing
>>> wantfreevnodes after desirevnodes was reduced, creates a window where an
>>> other thread could see small value for desiredvnodes, but large value
>>> for wantfreevnodes. Then, e.g. vlrureclaim() would go wild. IMO it should
>>> ensure that the observable values are non-contradictory.

I found that this isn't a problem, provided some care is taken to work
around bogus variable types in expressions like
(desiredvnodes - wantfreevnodes).

   (These variables are bogusly unsigned, so expressions like this can
   give unsign extension bugs by having huge unsigned values instead of
   small negative ones.  The variables are also bogusly long, so they
   they can have preposterous values which overflow when in expressions
   like vnrlu_free(freevnodes - wantfreevnodes) since vnlru_free()'s
   arg type is not consistently bogus.  Their longness is hard to change
   since it is part of the sysctl ABI.)

The code is quite robust to inconsistent and preposterous values.
vlrureclaim() explicitly fixes up the preposterous user value of
desiredvnodes <= 0 (it must be precisely 0 then, and the compiler
should warn about the '<' part), but most fixups occur implicitly.  It
has always been possible to set wantfreevnodes to a much larger value
than desiredvnodes.  Nothing bad happens, but I found it convenient
that a change like the above made wantfreevnodes reasonable when making
large changes to desiredvnodes.

>> Does moving the setting of wantfreevnodes before the cache size changes
>> (as redone above) close the window enough? The vlrureclaim() function
>> operates slowly enough that a brief period of inconsistency seems
>> unimportant. Changing desiredvnodes happens very rarely. And at the moment
>> we are not correcting wantfreevnodes at all. Or am I missing some key point?

What happens is that there can be a long period of inconsistency before
vlrureclaim() runs.  It then makes everything consistent.  You already
depend on this by not bothering to call vlrureclaim() from the sysctl.
There are also some unlocked accesses to the variables (including in the
sysctl?), so we depend on wrong values of the variables due to races
only giving transient inconsistencies.  (The long types also ask for
races :-).)  Perfect locking for all of this would be painful.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151106094022.N3939>