Date: Tue, 24 Aug 2010 11:04:58 +0200 From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no> To: Bruce Evans <brde@optusnet.com.au> Cc: Bruce Cran <bruce@cran.org.uk>, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, Bruce Evans <bde@FreeBSD.org>, svn-src-head@FreeBSD.org, Dimitry Andric <dimitry@andric.com> Subject: Re: svn commit: r211304 - head/lib/libutil Message-ID: <86tymkpeit.fsf@ds4.des.no> In-Reply-To: <20100824100917.C22315@delplex.bde.org> (Bruce Evans's message of "Tue, 24 Aug 2010 10:12:34 %2B1000 (EST)") References: <201008141434.o7EEYaSA030301@svn.freebsd.org> <20100815205724.00007e0f@unknown> <86wrrraqk8.fsf@ds4.des.no> <4C7194E6.6080708@andric.com> <86y6bxybxc.fsf@ds4.des.no> <20100824100917.C22315@delplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
[-- Attachment #1 --]
Bruce Evans <brde@optusnet.com.au> writes:
> Dag-Erling Smørgrav <des@des.no> writes:
> > That's awesome! Any way I can convince you to fix expand_number() as
> > well? I got a detailed explanation of what's wrong with it (both before
> > and after my commits) from bde@ (cc:ed) in private correspondence; I can
> > forward it to you if he doesn't mind.
> OK.
>
> I think the final point was that it should go back to supporting signed
> numbers (only), and that means int64_t ones until scientificize^dehumanize^W
> humanize_number() is fixed to support intmax_t ones.
See attachments.
DES
--
Dag-Erling Smørgrav - des@des.no
[-- Attachment #2 --]
Date: Fri, 20 Aug 2010 06:51:47 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= <des@des.no>
Subject: Re: svn commit: r211304 - head/lib/libutil
Message-ID: <20100820043717.O18794@delplex.bde.org>
References: <201008141434.o7EEYaSA030301@svn.freebsd.org>
<20100815213757.M14856@delplex.bde.org>
<20100815235422.K14924@delplex.bde.org>
<20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no>
MIME-Version: 1.0
>> The following places were even more broken to begin with:
>> geom/core/geom.c: this calls expand_number() on an intmax_t *
>> ipfw/dummynet.c: this calls expand_number() on an int64_t *, after starting
>> with a uint64_t * and bogusly casting to int64_t *.
>
> I'll fix these two, do you have further suggestions or a list of other
> instances that need fixing?
>
> BTW, would you prefer fixing geom.c to use uint64_t, or fixing
> expand_number() to use uintmax_t?
I prefer fixing expand_number() to support negative numbers again. It
should support them since it is supposed to be the inverse of
scientificize^Whumanize_number(), which can and does print them in
a useful way for at least free space in df. I don't know of any use
for them in input, but it simplifies at least the documentation to
have an exact inverse.
However, it should probably still return negative numbers via a
uintmax_t, like strtoumax() does, so that yet another BAD API isn't
required. Externally, this requires documenting what expand_number()
actually does, since it uses strtoumax() internally but currently says
nothing except for the false statement that it is the inverse of
scientifize_number(), plus it requires something to determine whether
the number is signed.
Internally, this requires a bit more care with shifting of negative numbers.
A negative number is by definition one with a leading minus sign.
strtoumax() will return these converted to large positive numbers, and
currently any shift of them will give overflow. Instead, they should
be converted to a sign and a positive value before shifting. Pseudocode:
num = strtoumax();
Return if error.
shift = mumble(); /* mult = mumble() (recursive) for dd */
Return if error.
Return if shift == 1 (this gives strtou*()'s/our historical arcane
error handling for negative values, but only for non-shifted ones).
snum = strtoimax();
Return if error (must be range error with snum < 0).
signed = (snum < 0);
if (signed)
num = -(uintmax_t)snum; /* XXX assumes normal machine */
cutlim = (signed ? INTMAX_MAX / ((intmax_t)1 << shift) :
UINTMAX_MAX / ((uintmax_t)1 << shift);
if (num > cutlim)
overflow();
else
num <<= shift;
if (signed)
num = -(intmax_t)num;
My dd get_off_t() gets this wrong, oops. Just using the unsigned version
fails for things like "-1k". The -1 becomes UINTMAX_MAX and any shifing
of this overflows (or if overflow is ignored, gives back -1 on normal
machines). -1k isn't useful in dd -- all that is needed is for things
like 0xFFFFFFFFFFFFFFFF (2**64-1) to work, which they don't if strtoimax()
is used -- with 64-bit intmax_t, it clips this value to 0x7FFFFFFFFFFFFFFF/
ERANGE.
Here is part of the man page for strtoumax() that gives details relevant
to expand_number():
% STRTOUL(3) FreeBSD Library Functions Manual STRTOUL(3)
Bug: there should be a separate man page for each function.
%
% NAME
% strtoul, strtoull, strtoumax, strtouq -- convert a string to an unsigned
% long, unsigned long long, uintmax_t, or u_quad_t integer
%
% LIBRARY
% Standard C Library (libc, -lc)
%
% SYNOPSIS
% #include <stdlib.h>
% #include <limits.h>
Bug: limits.h is not needed for using strtoul() or strtoull().
%
% unsigned long
% strtoul(const char * restrict nptr, char ** restrict endptr, int base);
%
% unsigned long long
% strtoull(const char * restrict nptr, char ** restrict endptr, int base);
%
% #include <inttypes.h>
Bug: not consistently buggy with the above -- no limits.h here.
%
% uintmax_t
% strtoumax(const char * restrict nptr, char ** restrict endptr, int base);
%
% #include <sys/types.h>
% #include <stdlib.h>
% #include <limits.h>
%
% u_quad_t
% strtouq(const char *nptr, char **endptr, int base);
Bugs:
1. sys/types.h is needed in theory but not in practice, since someone
changed the actual declaration of this in stdlib.h to use uint64_t.
2. stdlib.h says that this and strtoq() are to be removed in FreeBSD-6.0,.
%
% DESCRIPTION
% The strtoul() function converts the string in nptr to an unsigned long
% value. The strtoull() function converts the string in nptr to an
% unsigned long long value. The strtoumax() function converts the string
% in nptr to an uintmax_t value. The strtouq() function converts the
% string in nptr to a u_quad_t value. The conversion is done according to
% the given base, which must be between 2 and 36 inclusive, or be the spe-
% cial value 0.
%
% The string may begin with an arbitrary amount of white space (as deter-
% mined by isspace(3)) followed by a single optional `+' or `-' sign. If
% base is zero or 16, the string may then include a ``0x'' prefix, and the
% number will be read in base 16; otherwise, a zero base is taken as 10
% (decimal) unless the next character is `0', in which case it is taken as
% 8 (octal).
The detail about the base (0) and its prefixes, and the sign, are missing
in expand_number.3.
%
% The remainder of the string is converted to an unsigned long value in the
% obvious manner, stopping at the end of the string or at the first charac-
% ter that does not produce a valid digit in the given base.
More details missing.
% (In bases
% above 10, the letter `A' in either upper or lower case represents 10, `B'
% represents 11, and so forth, with `Z' representing 35.)
%
% If endptr is not NULL, strtoul() stores the address of the first invalid
Bug: this applies to all the functions, not just strtoul(), so all of them
or better none of them should be explicitly mentioned here.
% character in *endptr. If there were no digits at all, however, strtoul()
% stores the original value of nptr in *endptr. (Thus, if *nptr is not
% `\0' but **endptr is `\0' on return, the entire string was valid.)
Other details not relevant, but expand_number is missing the detail on the
error handling for trailing garbage.
%
% RETURN VALUES
% The strtoul(), strtoull(), strtoumax() and strtouq() functions return
% either the result of the conversion or, if there was a leading minus
% sign, the negation of the result of the conversion, unless the original
% (non-negated) value would overflow; in the latter case, strtoul() returns
% ULONG_MAX, strtoull() returns ULLONG_MAX, strtoumax() returns
% UINTMAX_MAX, and strtouq() returns ULLONG_MAX. In all cases, errno is
Here are the details for negative values and how strtou*() handle them
Bugs in expand_number.3, ones not already mentioned and a couple
mentioned again.
% EXPAND_NUMBER(3) FreeBSD Library Functions Manual EXPAND_NUMBER(3)
%
% NAME
% expand_number -- format a number from human readable form
Its bogus name is less than reflected in its bogus descrption.
"format a number from human readable form" is exactly the same description
as is given to scientificize_number(), where its only incorrectness is
that it misspells "scientificize" as "humanize". But this is supposed to
be the inverse function.
% DESCRIPTION
% The expand_number() function unformats the buf string and stores a
% unsigned 64-bit quantity at address pointed out by the num argument.
%
% The expand_number() function follows the SI power of two convention.
%
% The prefixes are:
Neither SI nor this function take prefixes. They take suffixes.
%
% Prefix Description Multiplier
Another bogus "prefix".
The bogus [bB] suffix is not mentioned here. Hmm, B would make sense
as a suffix to these suffixes (thus KB could be an alias for K, but
AFAIR scientificize number never produces either plain B or KB, and
the parser only recognizes a single-letter suffix).
% k kilo 1024
scientificize_number() only produces the K suffix, at least in df output.
% ERRORS
% The expand_number() function will fail if:
%
% [EINVAL] The given string contains no digits.
Not quite right. "foo1" contains a digit, but should give EINVAL.
%
% [EINVAL] An unrecognized prefix was given.
Another bogus "prefix". Only the undocumented octal and hex prefixes
are supported, but this "prefix" means "suffix" as usual.
To give an exact inverse of scientificize_number(), the API also needs
to be bug for bug compatible (int64_t not uintmax_t), with no support
for octal or hex (not too bad).
Gak, prefixes (sic) are much more broken than I said above. They are
also named prefixes in scientificize_number() sources, and there are
several variations, most of which are not supported by the "inverse"
function:
% int
% humanize_number(char *buf, size_t len, int64_t bytes,
% const char *suffix, int scale, int flags)
% {
% const char *prefixes, *sep;
The usual confusion.
% int b, i, r, maxscale, s1, s2, sign;
% int64_t divisor, max;
% size_t baselen;
Various style bugs.
%
% assert(buf != NULL);
% assert(suffix != NULL);
% assert(scale >= 0);
%
% if (flags & HN_DIVISOR_1000) {
% /* SI for decimal multiplies */
mulipliers
% divisor = 1000;
% if (flags & HN_B)
% prefixes = "B\0k\0M\0G\0T\0P\0E";
% else
% prefixes = "\0\0k\0M\0G\0T\0P\0E";
Classic SI prefixes, in which k means 1000 (is B really an SI prefix?).
expand_number() has no support for these.
% } else {
% /*
% * binary multiplies
% * XXX IEC 60027-2 recommends Ki, Mi, Gi...
Blech.
% */
% divisor = 1024;
% if (flags & HN_B)
% prefixes = "B\0K\0M\0G\0T\0P\0E";
% else
% prefixes = "\0\0K\0M\0G\0T\0P\0E";
K seems to be the only letter that is non-ambiguous here.
% }
%
% #define SCALE2PREFIX(scale) (&prefixes[(scale) << 1])
scientificize_number.3 duplicates some of the confusion:
% DESCRIPTION
% The humanize_number() function formats the signed 64-bit quantity given
% in number into buffer. A space and then suffix is appended to the end.
% The buffer pointed to by buffer must be at least len bytes long.
%
% If the formatted number (including suffix) would be too long to fit into
% buffer, then divide number by 1024 until it will. In this case, prefix
% suffix with the appropriate SI designator. The humanize_number() func-
% tion follows the traditional computer science conventions rather than the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
% proposed SI power of two convention.
This paragaph hasn't caught up with the extension HN_DIVISOR_1000.
%
% The prefixes are:
The usual confusion.
%
% Prefix Description Multiplier Multiplier 1000x
% k kilo 1024 1000
% M mega 1048576 1000000
% G giga 1073741824 1000000000
% T tera 1099511627776 1000000000000
% P peta 1125899906842624 1000000000000000
% E exa 1152921504606846976 1000000000000000000
The "Multiplier 1000x" column covers the extension.
Even a scientist wouldn't write numbers as 1152921504606846976 or
1000000000000000000. They would use exponential notation.
%
% The len argument must be at least 4 plus the length of suffix, in order
% to ensure a useful result is generated into buffer. To use a specific
% prefix, specify this as scale (multiplier = 1024 ^ scale). This cannot
Not with HN_DIVISOR_1000.
% HN_B Use `B' (bytes) as prefix if the original result
% does not have a prefix.
Now I see some B's (for 0B) in df output.
Conclusions:
- although I want a general number parser, expand_number() cannot
be it since it needs to be the inverse of scientificize_number()
which has too many warts to be inverted by a general number parser
(short of one with billions of options).
- negative numbers are useful after all, and many things become simpler
by restricting to them at all levels. We lose the abilitity to
"expand" UINTMAX_MAX, but scientificize() number never had that anyway.
Bruce
[-- Attachment #3 --]
Date: Sat, 21 Aug 2010 07:06:51 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= <des@des.no>
Subject: Re: svn commit: r211304 - head/lib/libutil
Message-ID: <20100821065444.B1255@besplex.bde.org>
References: <201008141434.o7EEYaSA030301@svn.freebsd.org>
<20100815213757.M14856@delplex.bde.org>
<20100815235422.K14924@delplex.bde.org>
<20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no>
<20100820043717.O18794@delplex.bde.org> <86occxws1b.fsf@ds4.des.no>
MIME-Version: 1.0
>> I prefer fixing expand_number() to support negative numbers again.
>
> "again"? It never did, AFAIK.
%%%
#include <libutil.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
static void
test(const char *s)
{
int64_t num;
num = 0xdead;
if (expand_number(s, &num) != 0)
printf("%s -> error\n", s);
else
printf("%s -> %jd\n", s, (intmax_t)num);
}
int
main(void)
{
test("-1");
test("-1k");
test("-1M");
test("-1G");
test("-1T");
test("-1P");
test("-1E");
return (0);
}
%%%
Output when linked against the old libutil
(-r--r--r-- 1 root wheel 123042 Jun 10 09:02 /usr/lib/libutil.a):
%%%
-1 -> -1
-1k -> -1024
-1M -> -1048576
-1G -> -1073741824
-1T -> -1099511627776
-1P -> -1125899906842624
-1E -> -1152921504606846976
%%%
Output when compiled with current sources: [First, the breakage of
libutil.h had to be fixed. The above no longer compiles since libutil.h
was careful to only declare the types that it uses; it declares int64_t
since it used that, and it doesn't declare uint64_t since it didn't use
that and its typedefs haven't been updated. My first attempt to fix this
broke it by changing the int64_t to uint64_t. This left int64_t undeclared,
but it is still used by humanize_number().]
%%%
-1 -> -1
-1k -> error
-1M -> error
-1G -> error
-1T -> error
-1P -> error
-1E -> error
%%%
Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86tymkpeit.fsf>
