Date: Sat, 16 Nov 2013 19:24:32 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: John Baldwin <jhb@FreeBSD.org> Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r258174 - head/sys/kern Message-ID: <20131116183038.C1344@besplex.bde.org> In-Reply-To: <201311151529.rAFFTrTN062522@svn.freebsd.org> References: <201311151529.rAFFTrTN062522@svn.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 15 Nov 2013, John Baldwin wrote: > Log: > Don't allow vfs.lorunningspace or vfs.hirunningspace to be set such > that lorunningspace is greater than hirunningspace as the system > performs terribly if it is mistuned in this fashion. Ugh, almost all writeable sysctls have foot-shooting capabilities, and this one is especially not in need of anti-foot-shooting if it only gives terrible performance. Foot-shooting with sysctl gives things like panics if you are lucky. It is a feature that sysctls can be changed independently through possibly invalid intermediate states. This allows quick changes without having to think much about the intermediate states if you know that the invalid ones won't do any unrecoverable damage. I admit to being careful when changing the nearby sysctls for dirty buffers only through valid and not very suboptimal intermediate states because I don't know how harmful invalid intermediate states are. In my version of bge, I use the better design of scattered independent non-range-checked sysctls that mostly aren't sent to the hardware immediately; then another sysctl (or down/up) syncs the changes to the hardware. The valid range of each depends on the others in a complicated way that is too hard for the system to know. > Modified: head/sys/kern/vfs_bio.c > ============================================================================== > --- head/sys/kern/vfs_bio.c Fri Nov 15 15:14:07 2013 (r258173) > +++ head/sys/kern/vfs_bio.c Fri Nov 15 15:29:53 2013 (r258174) > @@ -118,6 +118,7 @@ static int flushbufqueues(int, int); > static void buf_daemon(void); > static void bremfreel(struct buf *bp); > static __inline void bd_wakeup(void); > +static int sysctl_runningspace(SYSCTL_HANDLER_ARGS); > #if defined(COMPAT_FREEBSD4) || defined(COMPAT_FREEBSD5) || \ > defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD7) > static int sysctl_bufspace(SYSCTL_HANDLER_ARGS); > @@ -167,10 +168,12 @@ static int bufdefragcnt; > SYSCTL_INT(_vfs, OID_AUTO, bufdefragcnt, CTLFLAG_RW, &bufdefragcnt, 0, > "Number of times we have had to repeat buffer allocation to defragment"); > static long lorunningspace; > -SYSCTL_LONG(_vfs, OID_AUTO, lorunningspace, CTLFLAG_RW, &lorunningspace, 0, > +SYSCTL_PROC(_vfs, OID_AUTO, lorunningspace, CTLTYPE_LONG | CTLFLAG_MPSAFE | > + CTLFLAG_RW, &lorunningspace, 0, sysctl_runningspace, "L", > "Minimum preferred space used for in-progress I/O"); > static long hirunningspace; > -SYSCTL_LONG(_vfs, OID_AUTO, hirunningspace, CTLFLAG_RW, &hirunningspace, 0, > +SYSCTL_PROC(_vfs, OID_AUTO, hirunningspace, CTLTYPE_LONG | CTLFLAG_MPSAFE | > + CTLFLAG_RW, &hirunningspace, 0, sysctl_runningspace, "L", > "Maximum amount of space to use for in-progress I/O"); > int dirtybufferflushes; > SYSCTL_INT(_vfs, OID_AUTO, dirtybufferflushes, CTLFLAG_RW, &dirtybufferflushes, > @@ -332,6 +335,34 @@ const char *buf_wmesg = BUF_WMESG; > #define VFS_BIO_NEED_FREE 0x04 /* wait for free bufs, hi hysteresis */ > #define VFS_BIO_NEED_BUFSPACE 0x08 /* wait for buf space, lo hysteresis */ > > +static int > +sysctl_runningspace(SYSCTL_HANDLER_ARGS) > +{ > + long value; > + int error; > + > + value = *(long *)arg1; > + error = sysctl_handle_long(oidp, &value, 0, req); > + if (error != 0 || req->newptr == NULL) > + return (error); > + mtx_lock(&rbreqlock); > + if (arg1 == &hirunningspace) { > + if (value < lorunningspace) > + error = EINVAL; > + else > + hirunningspace = value; > + } else { > + KASSERT(arg1 == &lorunningspace, > + ("%s: unknown arg1", __func__)); Style bug. vfs_bio.c doesn't use the __func__ obfuscation anywhere else. It had 65 KASSERT()'s. About half of these don't even print their function name. The fix is not to consistently use the obfuscation to print the function name. See an old bikeshed. > 66 times. > + if (value > hirunningspace) > + error = EINVAL; > + else > + lorunningspace = value; > + } Terrible performance is probably still possible by setting the watermarks to equal values. But it would be just a bug to prevent the user setting that, since it is a special case of a small gap between the watermarks. The system should not enforce any particular gap, since it doesn't know the correct gap and the reason for existence of these sysctls is to let the user tell it the correct gap. > + mtx_unlock(&rbreqlock); > + return (error); > +} > + > #if defined(COMPAT_FREEBSD4) || defined(COMPAT_FREEBSD5) || \ > defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD7) > static int FreeBSD has too many sysctls (3672 on freefall now), and this much code for each of them would take about as much object space as an entire non-bloated kernel (1-2MB). Much larger and complicated code would be required to prevent all invalid intermediate states. It is almost better to write a 10MB GUI application for each combination of related sysctls. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131116183038.C1344>