Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 2 May 2017 15:17:11 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Alan Somers <asomers@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r317061 - in head: libexec/rpc.rstatd sys/amd64/amd64 sys/amd64/include sys/arm/arm sys/arm/include sys/arm64/include sys/cddl/contrib/opensolaris/uts/common/fs/zfs sys/compat/linprocfs...
Message-ID:  <20170502121711.GE1622@kib.kiev.ua>
In-Reply-To: <20170502203703.I1176@besplex.bde.org>
References:  <201704171734.v3HHYlf5022945@repo.freebsd.org> <CAOtMX2jdNj0du0ZuUKPr16iHK_YeNVzf-nDvwC-MuFM003VVAg@mail.gmail.com> <20170419130428.J956@besplex.bde.org> <20170430201324.GV1622@kib.kiev.ua> <20170501163725.U972@besplex.bde.org> <20170502095527.GB1622@kib.kiev.ua> <20170502203703.I1176@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, May 02, 2017 at 09:25:40PM +1000, Bruce Evans wrote:
> On Tue, 2 May 2017, Konstantin Belousov wrote:
> 
> > On Mon, May 01, 2017 at 06:25:11PM +1000, Bruce Evans wrote:
> >> XX Index: subr_counter.c
> >> XX ===================================================================
> >> XX --- subr_counter.c	(revision 317604)
> >> XX +++ subr_counter.c	(working copy)
> >> XX @@ -78,11 +78,15 @@
> >> XX  sysctl_handle_counter_u64(SYSCTL_HANDLER_ARGS)
> >> XX  {
> >> XX  	uint64_t out;
> >> XX +	uint32_t out32;
> >> XX  	int error;
> >> XX
> >> XX  	out = counter_u64_fetch(*(counter_u64_t *)arg1);
> >> XX
> >> XX -	error = SYSCTL_OUT(req, &out, sizeof(uint64_t));
> >> XX +	if (req->oldlen == 4 && (out32 = out) == out)
> >> XX +		error = SYSCTL_OUT(req, &out32, sizeof(out32));
> >> XX +	else
> >> XX +		error = SYSCTL_OUT(req, &out, sizeof(out));
> >> XX
> >> XX  	if (error || !req->newptr)
> >> XX  		return (error);
> >>
> >> This does the minimum necessary to fix the current problem.
> >>
> >> This works until the counters become too large for a u_int.  There
> >> could be an option to get truncation with no error, but that would
> >> require an API change, so applications should be required to do the
> >> truncation for themself if that is what they want.
> >
> > Yes, this is approximately what I consider to be enough fix, but with
> > some details which I do not like and which did not worked well on my
> > test machine with enough memory to generate lot of increments.
> >
> > Below is what I consider to be good enough.  I will proceed with this
> > version unless very strong objections appear.
> 
> I prefer my version.  It is the same as for sysctl_bufspace(), except
> for simplifications and bug fixes that are easier because the types
> are unsigned.
> 
> Your version is especially broken for the case of a lot of increments.
You consider the 'differentiators' as the main users, but I only want
systat and top to not abort, I have no intent of making them work
for updated counters if overflown.  Instead, I want something human-visible
so that person looking at raw counters values using old tools detected the 
failure.

> 
> Clamping breaks many cases in systat -v which just needs unsigned counters
> to calculate small differences.  Not many counters overflow faster than
> the refresh interval in systat or top.  These cases used to appear to
> work perfectly for most counters when the kernel did blind truncations.
> Clamping breaks the API.  Not returing ENOMEM breaks the API.
> 
> In my version, this is handled as specified by the API by truncating and
> returning ENOMEM.  Then:
> - top sees the ENOMEM and mishandles it by aborting
> - systat sees the ENOMEM (and also sees a short count, if it has the newer
>    breakage of hard-coding 64-bit counters instead of u_int ones, and is
>    run on a kernel with u_int counters) and mishandles it better by
>    printing an error message but not aborting.  Thus it appears to work
>    just as well as when the kernel did blind truncation, except for the
>    error message.
ENOMEM is, of course, the situation which I want to avoid.

> 
> Clamping would be difficult for the application to handle even if an
> error is returned.  This would be like the situation for clamping for
> strtoul(), but worse since it changes the API.  Many applications don't
> understand how to handle errno for strtoul(), but they do do a bogus
> check of the clamped value.  For sysctl(), clamping is not part of
> the API now, so applications shouldn't check for the clamped values.
> Ones that work with deltas like systat -v would see the counters stop
> incrementing when the clamped value is reached, while truncated values
> allow them to continue to appear to work perfectly.
> 
> > diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
> > index 5f4cd46ab1e..93d5161de89 100644
> > --- a/sys/vm/vm_meter.c
> > +++ b/sys/vm/vm_meter.c
> > @@ -266,8 +266,29 @@ static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,
> > 	"VM meter vm stats");
> > SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");
> >
> > +static int
> > +sysctl_handle_vmstat(SYSCTL_HANDLER_ARGS)
> > +{
> > +	uint64_t out;
> > +#ifdef COMPAT_FREEBSD11
> > +	uint32_t out1;
> > +#endif
> 
> The extra wrapper and the ifdef limit the damage from the API change.
> Otherwise, I don't like them.
> 
> I prefer my variable name out32 to the nondescript out1.  ('out' should
> also be renamed out64, but I wanted to keep the diffs small.)
Changed to the out32 name.

> 
> > +
> > +	out = counter_u64_fetch(*(counter_u64_t *)arg1);
> > +#ifdef COMPAT_FREEBSD11
> > +	if (req->oldlen == sizeof(uint32_t)) {
> 
> sizeof(uint32_t) is an obfuscated spelling of '4'.  The size is already
> hard-coded in '32'.  This depends on sizeof(char) being 1.  It is, and
> won't change.  POSIX started specifying this in 2001, and the mere
> existence of uint8_t probably requires 8-bit chars since u_char must
> be the smallest type of an object.  This depends on the size of a u_int
> being 32 bits/4 bytes in the old ABI.  This won't change, and is already
> hard-coded as 32, as it must be because sizeof(u_int) may change.
I prefer the sizeof over 4.

> 
> > +		out1 = 0xffffffff;
> > +		if (out < out1)
> > +			out1 = out;
> 
> This does the clamping.  Clamping is probably better expressed as:
> 
>  		out1 = MIN(out, 0xffffffff)
Changed as well.

> 
> > +		return (SYSCTL_OUT(req, &out1, sizeof(uint32_t)));
> 
> Here sizeof(uint32_t) is an obfuscated spelling of sizeof(out1).  I think
> the sizeof() is justified in this case, to limit the hard-coding to of
> sizes to 1 place.  Truncation by assigment to the destination variable also
> helps limit this
>     (sysctl_bufspace() could also be improved by truncation by assignment.
>     Its integers are signed, so the result is implementation-defined, but
>     we know that it is benign 2's complement and don't really care what it
>     is provided it is reasonable.  This is better than laboriously checking
>     the lower limit or neglecting to check it).
> Here the limit 0xffffffff also depends on the size.
Yes, there is no UINT32_MAX, and never will.

> 
> > +	}
> > +#endif
> > +	return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
> > +}
> > +
> > #define	VM_STATS(parent, var, descr) \
> > -    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
> > +    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
> > +    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);
> 
> I see a new problem, shared with my version.  The sysctl data says that this
> sysctl is pure 64 bits, but it can return 32 bits.  The sysctl data just
> can't describe the variation.  I guess this is not much a problem, since it
> takes magic to read the sysctl data from the kernel and not many programs
> except sysctl know how to do it.  There programs will see the pure 64
> bits and only attempt to use 64 bit types.

It can return 32bit only on improper use, which we allow for ABI compat.
So I do not see this as a problem either.

diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
index 5f4cd46ab1e..67f24609b5a 100644
--- a/sys/vm/vm_meter.c
+++ b/sys/vm/vm_meter.c
@@ -266,8 +266,27 @@ static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,
 	"VM meter vm stats");
 SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");
 
+static int
+sysctl_handle_vmstat(SYSCTL_HANDLER_ARGS)
+{
+	uint64_t out;
+#ifdef COMPAT_FREEBSD11
+	uint32_t out32;
+#endif
+
+	out = counter_u64_fetch(*(counter_u64_t *)arg1);
+#ifdef COMPAT_FREEBSD11
+	if (req->oldlen == sizeof(uint32_t)) {
+		out32 = MIN(out, 0xffffffff);
+		return (SYSCTL_OUT(req, &out32, sizeof(uint32_t)));
+	}
+#endif
+	return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
+}
+
 #define	VM_STATS(parent, var, descr) \
-    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
+    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
+    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);
 #define	VM_STATS_VM(var, descr)		VM_STATS(_vm_stats_vm, var, descr)
 #define	VM_STATS_SYS(var, descr)	VM_STATS(_vm_stats_sys, var, descr)
 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170502121711.GE1622>