Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 2 May 2017 23:31:21 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Bruce Evans <brde@optusnet.com.au>, Alan Somers <asomers@freebsd.org>,  Gleb Smirnoff <glebius@freebsd.org>,  "src-committers@freebsd.org" <src-committers@freebsd.org>,  "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>,  "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r317061 - in head: libexec/rpc.rstatd sys/amd64/amd64 sys/amd64/include sys/arm/arm sys/arm/include sys/arm64/include sys/cddl/contrib/opensolaris/uts/common/fs/zfs sys/compat/linprocfs...
Message-ID:  <20170502223324.P1508@besplex.bde.org>
In-Reply-To: <20170502121711.GE1622@kib.kiev.ua>
References:  <201704171734.v3HHYlf5022945@repo.freebsd.org> <CAOtMX2jdNj0du0ZuUKPr16iHK_YeNVzf-nDvwC-MuFM003VVAg@mail.gmail.com> <20170419130428.J956@besplex.bde.org> <20170430201324.GV1622@kib.kiev.ua> <20170501163725.U972@besplex.bde.org> <20170502095527.GB1622@kib.kiev.ua> <20170502203703.I1176@besplex.bde.org> <20170502121711.GE1622@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 2 May 2017, Konstantin Belousov wrote:

> On Tue, May 02, 2017 at 09:25:40PM +1000, Bruce Evans wrote:
>> On Tue, 2 May 2017, Konstantin Belousov wrote:
>>
>>> On Mon, May 01, 2017 at 06:25:11PM +1000, Bruce Evans wrote:
>>>> XX Index: subr_counter.c
>>>> XX ===================================================================
>>>> XX --- subr_counter.c	(revision 317604)
>>>> XX +++ subr_counter.c	(working copy)
>>>> XX @@ -78,11 +78,15 @@
>>>> XX  sysctl_handle_counter_u64(SYSCTL_HANDLER_ARGS)
>>>> XX  {
>>>> XX  	uint64_t out;
>>>> XX +	uint32_t out32;
>>>> XX  	int error;
>>>> XX
>>>> XX  	out = counter_u64_fetch(*(counter_u64_t *)arg1);
>>>> XX
>>>> XX -	error = SYSCTL_OUT(req, &out, sizeof(uint64_t));
>>>> XX +	if (req->oldlen == 4 && (out32 = out) == out)
>>>> XX +		error = SYSCTL_OUT(req, &out32, sizeof(out32));
>>>> XX +	else
>>>> XX +		error = SYSCTL_OUT(req, &out, sizeof(out));
>>>> XX
>>>> XX  	if (error || !req->newptr)
>>>> XX  		return (error);
>>>>
>>>> This does the minimum necessary to fix the current problem.
>>>>
>>>> This works until the counters become too large for a u_int.  There
>>>> could be an option to get truncation with no error, but that would
>>>> require an API change, so applications should be required to do the
>>>> truncation for themself if that is what they want.
>>>
>>> Yes, this is approximately what I consider to be enough fix, but with
>>> some details which I do not like and which did not worked well on my
>>> test machine with enough memory to generate lot of increments.
>>>
>>> Below is what I consider to be good enough.  I will proceed with this
>>> version unless very strong objections appear.
>>
>> I prefer my version.  It is the same as for sysctl_bufspace(), except
>> for simplifications and bug fixes that are easier because the types
>> are unsigned.
>>
>> Your version is especially broken for the case of a lot of increments.
> You consider the 'differentiators' as the main users, but I only want
> systat and top to not abort, I have no intent of making them work
> for updated counters if overflown.  Instead, I want something human-visible
> so that person looking at raw counters values using old tools detected the
> failure.

Clamping doesn't work for that either.  Truncating either just works using
differences like it has done for 30 years or so, or gives weirder values
that clamping.

Arguably, the API is that the counters may wrap, and applications must
handle this as well as possible by sampling them more often than they
wrap, so that only the initial values might be wrong.  Applications should
also have attempt to guess when the initial values are wrong, and print
hints about this that are more visible than printing blindly wrapped
value.

>> Clamping breaks many cases in systat -v which just needs unsigned counters
>> to calculate small differences.  Not many counters overflow faster than
>> the refresh interval in systat or top.  These cases used to appear to
>> work perfectly for most counters when the kernel did blind truncations.
>> Clamping breaks the API.  Not returing ENOMEM breaks the API.
>>
>> In my version, this is handled as specified by the API by truncating and
>> returning ENOMEM.  Then:
>> - top sees the ENOMEM and mishandles it by aborting
>> - systat sees the ENOMEM (and also sees a short count, if it has the newer
>>    breakage of hard-coding 64-bit counters instead of u_int ones, and is
>>    run on a kernel with u_int counters) and mishandles it better by
>>    printing an error message but not aborting.  Thus it appears to work
>>    just as well as when the kernel did blind truncation, except for the
>>    error message.
> ENOMEM is, of course, the situation which I want to avoid.

Then you have to return no error, but truncate the value instead of
clamping.  Anything else is incompatible.

To find the buggy applications, it is better to return a new errno, not
ENOMEM.  EOVERFLOW resulted in top aborting with a useful error message.

>>> ...
>>> +	out = counter_u64_fetch(*(counter_u64_t *)arg1);
>>> +#ifdef COMPAT_FREEBSD11
>>> +	if (req->oldlen == sizeof(uint32_t)) {
>>
>> sizeof(uint32_t) is an obfuscated spelling of '4'.  The size is already
>> hard-coded in '32'.  This depends on sizeof(char) being 1.  It is, and
>> won't change.  POSIX started specifying this in 2001, and the mere
>> existence of uint8_t probably requires 8-bit chars since u_char must
>> be the smallest type of an object.  This depends on the size of a u_int
>> being 32 bits/4 bytes in the old ABI.  This won't change, and is already
>> hard-coded as 32, as it must be because sizeof(u_int) may change.
> I prefer the sizeof over 4.

sizeof is really bloated when you handle all the sizes as in my larger
patch.  The basic sizes are 1, 2, 4 and 8 bytes (and possibly a larger
intmax_t), and it is a detail that the sizes in the integer types are
in bits.  However, once we have set up the variables to match the sizes
in bytes, it is not so bad to use sizeof(var) instead of sizeof(typename).
The size is then hard-coded in only 1 place.  That works here too:
use sizeof(out32).

>>> +		out1 = 0xffffffff;
>>> +		if (out < out1)
>>> +			out1 = out;
>>
>> This does the clamping.  Clamping is probably better expressed as:
>>
>>  		out1 = MIN(out, 0xffffffff)
> Changed as well.

I forgot to mention that I don't really like MIN().  It is an unsafe macro
with ugly spelling indicating that it is unsafe.  4.4BSD removed it in the
kernel and tried to replace it by the imin() family of inline functions, but
this API is hard to use because it is not type-generic.  The kernel was
reinfected with MIN() via contrib'ed code using home-made versions of MIN().
uqmin() would work here, but quad_t is another mistake...

>>
>>> +		return (SYSCTL_OUT(req, &out1, sizeof(uint32_t)));

sizeof(var) works even better here when it is used above.

>>
>> Here sizeof(uint32_t) is an obfuscated spelling of sizeof(out1).  I think
>> the sizeof() is justified in this case, to limit the hard-coding to of
>> sizes to 1 place.  Truncation by assigment to the destination variable also
>> helps limit this
>>     (sysctl_bufspace() could also be improved by truncation by assignment.
>>     Its integers are signed, so the result is implementation-defined, but
>>     we know that it is benign 2's complement and don't really care what it
>>     is provided it is reasonable.  This is better than laboriously checking
>>     the lower limit or neglecting to check it).
>> Here the limit 0xffffffff also depends on the size.
> Yes, there is no UINT32_MAX, and never will.

You mean that you don't want to use it here.

Of course, UINT32_MAX does exist, and is quite delicate due to
complications to give it the correct type.  It must have the type of
its default promotion, and this depends on the size of int and long.
So it is defined in an MD file, and has an unnecessary U suffix on
i386.  This contrasts with UINT16_MAX, which is also defined in an MD
file but has type signed int on all arches and must be defined without
a U suffix, since all arches have more than 16 bits in it.  There is
also the UINT32_C() macro which has to append U on x86 since smaller
constants or the same constant in decimal would not be naturally
unsigned.

>>> +	}
>>> +#endif
>>> +	return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
>>> +}
>>> +
>>> #define	VM_STATS(parent, var, descr) \
>>> -    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
>>> +    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
>>> +    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);
>>
>> I see a new problem, shared with my version.  The sysctl data says that this
>> sysctl is pure 64 bits, but it can return 32 bits.  The sysctl data just
>> can't describe the variation.  I guess this is not much a problem, since it
>> takes magic to read the sysctl data from the kernel and not many programs
>> except sysctl know how to do it.  There programs will see the pure 64
>> bits and only attempt to use 64 bit types.
>
> It can return 32bit only on improper use, which we allow for ABI compat.
> So I do not see this as a problem either.

I'm trying to fix the API.

Another bug in the public API is that there is no way to probe for the
sign of a type.  This is the only useful thing in the sysct data.

> diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
> index 5f4cd46ab1e..67f24609b5a 100644
> --- a/sys/vm/vm_meter.c
> +++ b/sys/vm/vm_meter.c
> @@ -266,8 +266,27 @@ static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,
> 	"VM meter vm stats");
> SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");
>
> +static int
> +sysctl_handle_vmstat(SYSCTL_HANDLER_ARGS)
> +{
> +	uint64_t out;
> +#ifdef COMPAT_FREEBSD11
> +	uint32_t out32;
> +#endif
> +
> +	out = counter_u64_fetch(*(counter_u64_t *)arg1);
> +#ifdef COMPAT_FREEBSD11
> +	if (req->oldlen == sizeof(uint32_t)) {
> +		out32 = MIN(out, 0xffffffff);

Incompatible for negative advantages.  It should truncate.

> +		return (SYSCTL_OUT(req, &out32, sizeof(uint32_t)));

It seems to be necessary to drop the error reporting for compatility.
How does sysctl_bufspace() work when it intentionally keeps the error
reporting and even has a comment about this?

> +	}
> +#endif
> +	return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
> +}

This still has the style bug 'sizeof(typename)'.

If you want to use the type name too much, then you can also use it to
spell 0xffffffff as (uint32_t)-1.  Then the size 4 bytes would be
consistently represented as 32 bits.  Going from type variable out32
to its maximum should be spellable as __maxof(out32) but __maxof()
doesn't exist.

> +
> #define	VM_STATS(parent, var, descr) \
> -    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
> +    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
> +    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);
> #define	VM_STATS_VM(var, descr)		VM_STATS(_vm_stats_vm, var, descr)
> #define	VM_STATS_SYS(var, descr)	VM_STATS(_vm_stats_sys, var, descr)

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170502223324.P1508>