Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 16 Nov 2013 19:24:32 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r258174 - head/sys/kern
Message-ID:  <20131116183038.C1344@besplex.bde.org>
In-Reply-To: <201311151529.rAFFTrTN062522@svn.freebsd.org>
References:  <201311151529.rAFFTrTN062522@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 15 Nov 2013, John Baldwin wrote:

> Log:
>  Don't allow vfs.lorunningspace or vfs.hirunningspace to be set such
>  that lorunningspace is greater than hirunningspace as the system
>  performs terribly if it is mistuned in this fashion.

Ugh, almost all writeable sysctls have foot-shooting capabilities, and
this one is especially not in need of anti-foot-shooting if it only
gives terrible performance.  Foot-shooting with sysctl gives things
like panics if you are lucky.

It is a feature that sysctls can be changed independently through
possibly invalid intermediate states.  This allows quick changes
without having to think much about the intermediate states if you
know that the invalid ones won't do any unrecoverable damage.  I
admit to being careful when changing the nearby sysctls for dirty
buffers only through valid and not very suboptimal intermediate
states because I don't know how harmful invalid intermediate states
are.  In my version of bge, I use the better design of scattered
independent non-range-checked sysctls that mostly aren't sent to
the hardware immediately; then another sysctl (or down/up) syncs
the changes to the hardware.  The valid range of each depends on
the others in a complicated way that is too hard for the system
to know.

> Modified: head/sys/kern/vfs_bio.c
> ==============================================================================
> --- head/sys/kern/vfs_bio.c	Fri Nov 15 15:14:07 2013	(r258173)
> +++ head/sys/kern/vfs_bio.c	Fri Nov 15 15:29:53 2013	(r258174)
> @@ -118,6 +118,7 @@ static int flushbufqueues(int, int);
> static void buf_daemon(void);
> static void bremfreel(struct buf *bp);
> static __inline void bd_wakeup(void);
> +static int sysctl_runningspace(SYSCTL_HANDLER_ARGS);
> #if defined(COMPAT_FREEBSD4) || defined(COMPAT_FREEBSD5) || \
>     defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD7)
> static int sysctl_bufspace(SYSCTL_HANDLER_ARGS);
> @@ -167,10 +168,12 @@ static int bufdefragcnt;
> SYSCTL_INT(_vfs, OID_AUTO, bufdefragcnt, CTLFLAG_RW, &bufdefragcnt, 0,
>     "Number of times we have had to repeat buffer allocation to defragment");
> static long lorunningspace;
> -SYSCTL_LONG(_vfs, OID_AUTO, lorunningspace, CTLFLAG_RW, &lorunningspace, 0,
> +SYSCTL_PROC(_vfs, OID_AUTO, lorunningspace, CTLTYPE_LONG | CTLFLAG_MPSAFE |
> +    CTLFLAG_RW, &lorunningspace, 0, sysctl_runningspace, "L",
>     "Minimum preferred space used for in-progress I/O");
> static long hirunningspace;
> -SYSCTL_LONG(_vfs, OID_AUTO, hirunningspace, CTLFLAG_RW, &hirunningspace, 0,
> +SYSCTL_PROC(_vfs, OID_AUTO, hirunningspace, CTLTYPE_LONG | CTLFLAG_MPSAFE |
> +    CTLFLAG_RW, &hirunningspace, 0, sysctl_runningspace, "L",
>     "Maximum amount of space to use for in-progress I/O");
> int dirtybufferflushes;
> SYSCTL_INT(_vfs, OID_AUTO, dirtybufferflushes, CTLFLAG_RW, &dirtybufferflushes,
> @@ -332,6 +335,34 @@ const char *buf_wmesg = BUF_WMESG;
> #define VFS_BIO_NEED_FREE	0x04	/* wait for free bufs, hi hysteresis */
> #define VFS_BIO_NEED_BUFSPACE	0x08	/* wait for buf space, lo hysteresis */
>
> +static int
> +sysctl_runningspace(SYSCTL_HANDLER_ARGS)
> +{
> +	long value;
> +	int error;
> +
> +	value = *(long *)arg1;
> +	error = sysctl_handle_long(oidp, &value, 0, req);
> +	if (error != 0 || req->newptr == NULL)
> +		return (error);
> +	mtx_lock(&rbreqlock);
> +	if (arg1 == &hirunningspace) {
> +		if (value < lorunningspace)
> +			error = EINVAL;
> +		else
> +			hirunningspace = value;
> +	} else {
> +		KASSERT(arg1 == &lorunningspace,
> +		    ("%s: unknown arg1", __func__));

Style bug.  vfs_bio.c doesn't use the __func__ obfuscation anywhere else.
It had 65 KASSERT()'s.  About half of these don't even print their
function name.  The fix is not to consistently use the obfuscation to
print the function name.  See an old bikeshed.

> 66 times.

> +		if (value > hirunningspace)
> +			error = EINVAL;
> +		else
> +			lorunningspace = value;
> +	}

Terrible performance is probably still possible by setting the watermarks
to equal values.  But it would be just a bug to prevent the user setting
that, since it is a special case of a small gap between the watermarks.
The system should not enforce any particular gap, since it doesn't know
the correct gap and the reason for existence of these sysctls is to let
the user tell it the correct gap.

> +	mtx_unlock(&rbreqlock);
> +	return (error);
> +}
> +
> #if defined(COMPAT_FREEBSD4) || defined(COMPAT_FREEBSD5) || \
>     defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD7)
> static int

FreeBSD has too many sysctls (3672 on freefall now), and this much code
for each of them would take about as much object space as an entire
non-bloated kernel (1-2MB).  Much larger and complicated code would be
required to prevent all invalid intermediate states.  It is almost better
to write a 10MB GUI application for each combination of related sysctls.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131116183038.C1344>