From owner-svn-src-all@FreeBSD.ORG  Tue Aug 24 09:05:01 2010
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9D70A1065698;
	Tue, 24 Aug 2010 09:05:01 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id DAA278FC1F;
	Tue, 24 Aug 2010 09:05:00 +0000 (UTC)
Received: from ds4.des.no (des.no [84.49.246.2])
	by smtp.des.no (Postfix) with ESMTP id 67A8F1FFC33;
	Tue, 24 Aug 2010 09:04:59 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 38CA284557; Tue, 24 Aug 2010 11:04:59 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Bruce Evans <brde@optusnet.com.au>
References: <201008141434.o7EEYaSA030301@svn.freebsd.org>
	<20100815205724.00007e0f@unknown> <86wrrraqk8.fsf@ds4.des.no>
	<4C7194E6.6080708@andric.com> <86y6bxybxc.fsf@ds4.des.no>
	<20100824100917.C22315@delplex.bde.org>
Date: Tue, 24 Aug 2010 11:04:58 +0200
In-Reply-To: <20100824100917.C22315@delplex.bde.org> (Bruce Evans's message of
	"Tue, 24 Aug 2010 10:12:34 +1000 (EST)")
Message-ID: <86tymkpeit.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: Bruce Cran <bruce@cran.org.uk>, src-committers@FreeBSD.org,
	svn-src-all@FreeBSD.org, Bruce Evans <bde@FreeBSD.org>,
	svn-src-head@FreeBSD.org, Dimitry Andric <dimitry@andric.com>
Subject: Re: svn commit: r211304 - head/lib/libutil
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Aug 2010 09:05:01 -0000

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Bruce Evans <brde@optusnet.com.au> writes:
> Dag-Erling Sm=C3=B8rgrav <des@des.no> writes:
> > That's awesome!  Any way I can convince you to fix expand_number() as
> > well?  I got a detailed explanation of what's wrong with it (both before
> > and after my commits) from bde@ (cc:ed) in private correspondence; I can
> > forward it to you if he doesn't mind.
> OK.
>
> I think the final point was that it should go back to supporting signed
> numbers (only), and that means int64_t ones until scientificize^dehumaniz=
e^W
> humanize_number() is fixed to support intmax_t ones.

See attachments.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no


--=-=-=
Content-Type: message/rfc822

Date: Fri, 20 Aug 2010 06:51:47 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= <des@des.no>
Subject: Re: svn commit: r211304 - head/lib/libutil
Message-ID: <20100820043717.O18794@delplex.bde.org>
References: <201008141434.o7EEYaSA030301@svn.freebsd.org>
	<20100815213757.M14856@delplex.bde.org>
	<20100815235422.K14924@delplex.bde.org>
	<20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no>
MIME-Version: 1.0

>> The following places were even more broken to begin with:
>> geom/core/geom.c: this calls expand_number() on an intmax_t *
>> ipfw/dummynet.c: this calls expand_number() on an int64_t *, after starting
>>                  with a uint64_t * and bogusly casting to int64_t *.
>
> I'll fix these two, do you have further suggestions or a list of other
> instances that need fixing?
>
> BTW, would you prefer fixing geom.c to use uint64_t, or fixing
> expand_number() to use uintmax_t?

I prefer fixing expand_number() to support negative numbers again.  It
should support them since it is supposed to be the inverse of
scientificize^Whumanize_number(), which can and does print them in
a useful way for at least free space in df.  I don't know of any use
for them in input, but it simplifies at least the documentation to
have an exact inverse.

However, it should probably still return negative numbers via a
uintmax_t, like strtoumax() does, so that yet another BAD API isn't
required.  Externally, this requires documenting what expand_number()
actually does, since it uses strtoumax() internally but currently says
nothing except for the false statement that it is the inverse of
scientifize_number(), plus it requires something to determine whether
the number is signed.

Internally, this requires a bit more care with shifting of negative numbers.
A negative number is by definition one with a leading minus sign.
strtoumax() will return these converted to large positive numbers, and
currently any shift of them will give overflow.  Instead, they should
be converted to a sign and a positive value before shifting.  Pseudocode:

 	num = strtoumax();
 	Return if error.
 	shift = mumble();	/* mult = mumble() (recursive) for dd */
 	Return if error.
 	Return if shift == 1 (this gives strtou*()'s/our historical arcane
 	error handling for negative values, but only for non-shifted ones).
 	snum = strtoimax();
 	Return if error (must be range error with snum < 0).
 	signed = (snum < 0);
 	if (signed)
 		num = -(uintmax_t)snum;	/* XXX assumes normal machine */
 	cutlim = (signed ? INTMAX_MAX / ((intmax_t)1 << shift) :
 	    UINTMAX_MAX / ((uintmax_t)1 << shift);
 	if (num > cutlim)
 		overflow();
 	else
 		num <<= shift;
 	if (signed)
 		num = -(intmax_t)num;

My dd get_off_t() gets this wrong, oops.  Just using the unsigned version
fails for things like "-1k".  The -1 becomes UINTMAX_MAX and any shifing
of this overflows (or if overflow is ignored, gives back -1 on normal
machines).  -1k isn't useful in dd -- all that is needed is for things
like 0xFFFFFFFFFFFFFFFF (2**64-1) to work, which they don't if strtoimax()
is used -- with 64-bit intmax_t, it clips this value to 0x7FFFFFFFFFFFFFFF/
ERANGE.

Here is part of the man page for strtoumax() that gives details relevant
to expand_number():

% STRTOUL(3)             FreeBSD Library Functions Manual             STRTOUL(3)

Bug: there should be a separate man page for each function.

% 
% NAME
%      strtoul, strtoull, strtoumax, strtouq -- convert a string to an unsigned
%      long, unsigned long long, uintmax_t, or u_quad_t integer
% 
% LIBRARY
%      Standard C Library (libc, -lc)
% 
% SYNOPSIS
%      #include <stdlib.h>
%      #include <limits.h>

Bug: limits.h is not needed for using strtoul() or strtoull().

% 
%      unsigned long
%      strtoul(const char * restrict nptr, char ** restrict endptr, int base);
% 
%      unsigned long long
%      strtoull(const char * restrict nptr, char ** restrict endptr, int base);
% 
%      #include <inttypes.h>

Bug: not consistently buggy with the above -- no limits.h here.

% 
%      uintmax_t
%      strtoumax(const char * restrict nptr, char ** restrict endptr, int base);
% 
%      #include <sys/types.h>
%      #include <stdlib.h>
%      #include <limits.h>
% 
%      u_quad_t
%      strtouq(const char *nptr, char **endptr, int base);

Bugs:
1. sys/types.h is needed in theory but not in practice, since someone
    changed the actual declaration of this in stdlib.h to use uint64_t.
2. stdlib.h says that this and strtoq() are to be removed in FreeBSD-6.0,.

% 
% DESCRIPTION
%      The strtoul() function converts the string in nptr to an unsigned long
%      value.  The strtoull() function converts the string in nptr to an
%      unsigned long long value.  The strtoumax() function converts the string
%      in nptr to an uintmax_t value.  The strtouq() function converts the
%      string in nptr to a u_quad_t value.  The conversion is done according to
%      the given base, which must be between 2 and 36 inclusive, or be the spe-
%      cial value 0.
% 
%      The string may begin with an arbitrary amount of white space (as deter-
%      mined by isspace(3)) followed by a single optional `+' or `-' sign.  If
%      base is zero or 16, the string may then include a ``0x'' prefix, and the
%      number will be read in base 16; otherwise, a zero base is taken as 10
%      (decimal) unless the next character is `0', in which case it is taken as
%      8 (octal).

The detail about the base (0) and its prefixes, and the sign, are missing
in expand_number.3.

% 
%      The remainder of the string is converted to an unsigned long value in the
%      obvious manner, stopping at the end of the string or at the first charac-
%      ter that does not produce a valid digit in the given base.

More details missing.

%      (In bases
%      above 10, the letter `A' in either upper or lower case represents 10, `B'
%      represents 11, and so forth, with `Z' representing 35.)
% 
%      If endptr is not NULL, strtoul() stores the address of the first invalid

Bug: this applies to all the functions, not just strtoul(), so all of them
or better none of them should be explicitly mentioned here.

%      character in *endptr.  If there were no digits at all, however, strtoul()
%      stores the original value of nptr in *endptr.  (Thus, if *nptr is not
%      `\0' but **endptr is `\0' on return, the entire string was valid.)

Other details not relevant, but expand_number is missing the detail on the
error handling for trailing garbage.

% 
% RETURN VALUES
%      The strtoul(), strtoull(), strtoumax() and strtouq() functions return
%      either the result of the conversion or, if there was a leading minus
%      sign, the negation of the result of the conversion, unless the original
%      (non-negated) value would overflow; in the latter case, strtoul() returns
%      ULONG_MAX, strtoull() returns ULLONG_MAX, strtoumax() returns
%      UINTMAX_MAX, and strtouq() returns ULLONG_MAX.  In all cases, errno is

Here are the details for negative values and how strtou*() handle them

Bugs in expand_number.3, ones not already mentioned and a couple
mentioned again.

% EXPAND_NUMBER(3)       FreeBSD Library Functions Manual       EXPAND_NUMBER(3)
% 
% NAME
%      expand_number -- format a number from human readable form

Its bogus name is less than reflected in its bogus descrption. 
"format a number from human readable form" is exactly the same description
as is given to scientificize_number(), where its only incorrectness is
that it misspells "scientificize" as "humanize".  But this is supposed to
be the inverse function.

% DESCRIPTION
%      The expand_number() function unformats the buf string and stores a
%      unsigned 64-bit quantity at address pointed out by the num argument.
% 
%      The expand_number() function follows the SI power of two convention.
% 
%      The prefixes are:

Neither SI nor this function take prefixes.  They take suffixes.

% 
%            Prefix    Description    Multiplier

Another bogus "prefix".

The bogus [bB] suffix is not mentioned here.  Hmm, B would make sense
as a suffix to these suffixes (thus KB could be an alias for K, but
AFAIR scientificize number never produces either plain B or KB, and
the parser only recognizes a single-letter suffix).

%            k         kilo           1024

scientificize_number() only produces the K suffix, at least in df output.

% ERRORS
%      The expand_number() function will fail if:
% 
%      [EINVAL]           The given string contains no digits.

Not quite right.  "foo1" contains a digit, but should give EINVAL.

% 
%      [EINVAL]           An unrecognized prefix was given.

Another bogus "prefix".  Only the undocumented octal and hex prefixes
are supported, but this "prefix" means "suffix" as usual.

To give an exact inverse of scientificize_number(), the API also needs
to be bug for bug compatible (int64_t not uintmax_t), with no support
for octal or hex (not too bad).

Gak, prefixes (sic) are much more broken than I said above.  They are
also named prefixes in scientificize_number() sources, and there are
several variations, most of which are not supported by the "inverse"
function:

% int
% humanize_number(char *buf, size_t len, int64_t bytes,
%     const char *suffix, int scale, int flags)
% {
% 	const char *prefixes, *sep;

The usual confusion.

% 	int	b, i, r, maxscale, s1, s2, sign;
% 	int64_t	divisor, max;
% 	size_t	baselen;

Various style bugs.

% 
% 	assert(buf != NULL);
% 	assert(suffix != NULL);
% 	assert(scale >= 0);
% 
% 	if (flags & HN_DIVISOR_1000) {
% 		/* SI for decimal multiplies */

mulipliers

% 		divisor = 1000;
% 		if (flags & HN_B)
% 			prefixes = "B\0k\0M\0G\0T\0P\0E";
% 		else
% 			prefixes = "\0\0k\0M\0G\0T\0P\0E";

Classic SI prefixes, in which k means 1000 (is B really an SI prefix?).
expand_number() has no support for these.

% 	} else {
% 		/*
% 		 * binary multiplies
% 		 * XXX IEC 60027-2 recommends Ki, Mi, Gi...

Blech.

% 		 */
% 		divisor = 1024;
% 		if (flags & HN_B)
% 			prefixes = "B\0K\0M\0G\0T\0P\0E";
% 		else
% 			prefixes = "\0\0K\0M\0G\0T\0P\0E";

K seems to be the only letter that is non-ambiguous here.

% 	}
% 
% #define	SCALE2PREFIX(scale)	(&prefixes[(scale) << 1])

scientificize_number.3 duplicates some of the confusion:

% DESCRIPTION
%      The humanize_number() function formats the signed 64-bit quantity given
%      in number into buffer.  A space and then suffix is appended to the end.
%      The buffer pointed to by buffer must be at least len bytes long.
% 
%      If the formatted number (including suffix) would be too long to fit into
%      buffer, then divide number by 1024 until it will.  In this case, prefix
%      suffix with the appropriate SI designator.  The humanize_number() func-
%      tion follows the traditional computer science conventions rather than the
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
%      proposed SI power of two convention.

This paragaph hasn't caught up with the extension HN_DIVISOR_1000.

% 
%      The prefixes are:

The usual confusion.

% 
%            Prefix    Description    Multiplier             Multiplier 1000x
%            k         kilo           1024                   1000
%            M         mega           1048576                1000000
%            G         giga           1073741824             1000000000
%            T         tera           1099511627776          1000000000000
%            P         peta           1125899906842624       1000000000000000
%            E         exa            1152921504606846976    1000000000000000000

The "Multiplier 1000x" column covers the extension.

Even a scientist wouldn't write numbers as 1152921504606846976 or
1000000000000000000.  They would use exponential notation.

% 
%      The len argument must be at least 4 plus the length of suffix, in order
%      to ensure a useful result is generated into buffer.  To use a specific
%      prefix, specify this as scale (multiplier = 1024 ^ scale).  This cannot

Not with HN_DIVISOR_1000.

%            HN_B             Use `B' (bytes) as prefix if the original result
%                             does not have a prefix.

Now I see some B's (for 0B) in df output.

Conclusions:
- although I want a general number parser, expand_number() cannot
   be it since it needs to be the inverse of scientificize_number()
   which has too many warts to be inverted by a general number parser
   (short of one with billions of options).
- negative numbers are useful after all, and many things become simpler
   by restricting to them at all levels.  We lose the abilitity to
   "expand" UINTMAX_MAX, but scientificize() number never had that anyway.

Bruce


--=-=-=
Content-Type: message/rfc822

Date: Sat, 21 Aug 2010 07:06:51 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= <des@des.no>
Subject: Re: svn commit: r211304 - head/lib/libutil
Message-ID: <20100821065444.B1255@besplex.bde.org>
References: <201008141434.o7EEYaSA030301@svn.freebsd.org>
	<20100815213757.M14856@delplex.bde.org>
	<20100815235422.K14924@delplex.bde.org>
	<20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no>
	<20100820043717.O18794@delplex.bde.org> <86occxws1b.fsf@ds4.des.no>
MIME-Version: 1.0

>> I prefer fixing expand_number() to support negative numbers again.
>
> "again"?  It never did, AFAIK.

%%%
#include <libutil.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

static void
test(const char *s)
{
 	int64_t num;

 	num = 0xdead;
 	if (expand_number(s, &num) != 0)
 		printf("%s -> error\n", s);
 	else
 		printf("%s -> %jd\n", s, (intmax_t)num);
}

int
main(void)
{
 	test("-1");
 	test("-1k");
 	test("-1M");
 	test("-1G");
 	test("-1T");
 	test("-1P");
 	test("-1E");
 	return (0);
}
%%%

Output when linked against the old libutil
(-r--r--r--  1 root  wheel  123042 Jun 10 09:02 /usr/lib/libutil.a):

%%%
-1 -> -1
-1k -> -1024
-1M -> -1048576
-1G -> -1073741824
-1T -> -1099511627776
-1P -> -1125899906842624
-1E -> -1152921504606846976
%%%

Output when compiled with current sources: [First, the breakage of
libutil.h had to be fixed.  The above no longer compiles since libutil.h
was careful to only declare the types that it uses; it declares int64_t
since it used that, and it doesn't declare uint64_t since it didn't use
that and its typedefs haven't been updated.  My first attempt to fix this
broke it by changing the int64_t to uint64_t.  This left int64_t undeclared,
but it is still used by humanize_number().]

%%%
-1 -> -1
-1k -> error
-1M -> error
-1G -> error
-1T -> error
-1P -> error
-1E -> error
%%%

Bruce


--=-=-=--