From owner-svn-src-all@FreeBSD.ORG Tue Aug 24 09:05:01 2010 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D70A1065698; Tue, 24 Aug 2010 09:05:01 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id DAA278FC1F; Tue, 24 Aug 2010 09:05:00 +0000 (UTC) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id 67A8F1FFC33; Tue, 24 Aug 2010 09:04:59 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id 38CA284557; Tue, 24 Aug 2010 11:04:59 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Bruce Evans References: <201008141434.o7EEYaSA030301@svn.freebsd.org> <20100815205724.00007e0f@unknown> <86wrrraqk8.fsf@ds4.des.no> <4C7194E6.6080708@andric.com> <86y6bxybxc.fsf@ds4.des.no> <20100824100917.C22315@delplex.bde.org> Date: Tue, 24 Aug 2010 11:04:58 +0200 In-Reply-To: <20100824100917.C22315@delplex.bde.org> (Bruce Evans's message of "Tue, 24 Aug 2010 10:12:34 +1000 (EST)") Message-ID: <86tymkpeit.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: Bruce Cran , src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, Bruce Evans , svn-src-head@FreeBSD.org, Dimitry Andric Subject: Re: svn commit: r211304 - head/lib/libutil X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Aug 2010 09:05:01 -0000 --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Bruce Evans writes: > Dag-Erling Sm=C3=B8rgrav writes: > > That's awesome! Any way I can convince you to fix expand_number() as > > well? I got a detailed explanation of what's wrong with it (both before > > and after my commits) from bde@ (cc:ed) in private correspondence; I can > > forward it to you if he doesn't mind. > OK. > > I think the final point was that it should go back to supporting signed > numbers (only), and that means int64_t ones until scientificize^dehumaniz= e^W > humanize_number() is fixed to support intmax_t ones. See attachments. DES --=20 Dag-Erling Sm=C3=B8rgrav - des@des.no --=-=-= Content-Type: message/rfc822 Date: Fri, 20 Aug 2010 06:51:47 +1000 (EST) From: Bruce Evans To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= Subject: Re: svn commit: r211304 - head/lib/libutil Message-ID: <20100820043717.O18794@delplex.bde.org> References: <201008141434.o7EEYaSA030301@svn.freebsd.org> <20100815213757.M14856@delplex.bde.org> <20100815235422.K14924@delplex.bde.org> <20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no> MIME-Version: 1.0 >> The following places were even more broken to begin with: >> geom/core/geom.c: this calls expand_number() on an intmax_t * >> ipfw/dummynet.c: this calls expand_number() on an int64_t *, after starting >> with a uint64_t * and bogusly casting to int64_t *. > > I'll fix these two, do you have further suggestions or a list of other > instances that need fixing? > > BTW, would you prefer fixing geom.c to use uint64_t, or fixing > expand_number() to use uintmax_t? I prefer fixing expand_number() to support negative numbers again. It should support them since it is supposed to be the inverse of scientificize^Whumanize_number(), which can and does print them in a useful way for at least free space in df. I don't know of any use for them in input, but it simplifies at least the documentation to have an exact inverse. However, it should probably still return negative numbers via a uintmax_t, like strtoumax() does, so that yet another BAD API isn't required. Externally, this requires documenting what expand_number() actually does, since it uses strtoumax() internally but currently says nothing except for the false statement that it is the inverse of scientifize_number(), plus it requires something to determine whether the number is signed. Internally, this requires a bit more care with shifting of negative numbers. A negative number is by definition one with a leading minus sign. strtoumax() will return these converted to large positive numbers, and currently any shift of them will give overflow. Instead, they should be converted to a sign and a positive value before shifting. Pseudocode: num = strtoumax(); Return if error. shift = mumble(); /* mult = mumble() (recursive) for dd */ Return if error. Return if shift == 1 (this gives strtou*()'s/our historical arcane error handling for negative values, but only for non-shifted ones). snum = strtoimax(); Return if error (must be range error with snum < 0). signed = (snum < 0); if (signed) num = -(uintmax_t)snum; /* XXX assumes normal machine */ cutlim = (signed ? INTMAX_MAX / ((intmax_t)1 << shift) : UINTMAX_MAX / ((uintmax_t)1 << shift); if (num > cutlim) overflow(); else num <<= shift; if (signed) num = -(intmax_t)num; My dd get_off_t() gets this wrong, oops. Just using the unsigned version fails for things like "-1k". The -1 becomes UINTMAX_MAX and any shifing of this overflows (or if overflow is ignored, gives back -1 on normal machines). -1k isn't useful in dd -- all that is needed is for things like 0xFFFFFFFFFFFFFFFF (2**64-1) to work, which they don't if strtoimax() is used -- with 64-bit intmax_t, it clips this value to 0x7FFFFFFFFFFFFFFF/ ERANGE. Here is part of the man page for strtoumax() that gives details relevant to expand_number(): % STRTOUL(3) FreeBSD Library Functions Manual STRTOUL(3) Bug: there should be a separate man page for each function. % % NAME % strtoul, strtoull, strtoumax, strtouq -- convert a string to an unsigned % long, unsigned long long, uintmax_t, or u_quad_t integer % % LIBRARY % Standard C Library (libc, -lc) % % SYNOPSIS % #include % #include Bug: limits.h is not needed for using strtoul() or strtoull(). % % unsigned long % strtoul(const char * restrict nptr, char ** restrict endptr, int base); % % unsigned long long % strtoull(const char * restrict nptr, char ** restrict endptr, int base); % % #include Bug: not consistently buggy with the above -- no limits.h here. % % uintmax_t % strtoumax(const char * restrict nptr, char ** restrict endptr, int base); % % #include % #include % #include % % u_quad_t % strtouq(const char *nptr, char **endptr, int base); Bugs: 1. sys/types.h is needed in theory but not in practice, since someone changed the actual declaration of this in stdlib.h to use uint64_t. 2. stdlib.h says that this and strtoq() are to be removed in FreeBSD-6.0,. % % DESCRIPTION % The strtoul() function converts the string in nptr to an unsigned long % value. The strtoull() function converts the string in nptr to an % unsigned long long value. The strtoumax() function converts the string % in nptr to an uintmax_t value. The strtouq() function converts the % string in nptr to a u_quad_t value. The conversion is done according to % the given base, which must be between 2 and 36 inclusive, or be the spe- % cial value 0. % % The string may begin with an arbitrary amount of white space (as deter- % mined by isspace(3)) followed by a single optional `+' or `-' sign. If % base is zero or 16, the string may then include a ``0x'' prefix, and the % number will be read in base 16; otherwise, a zero base is taken as 10 % (decimal) unless the next character is `0', in which case it is taken as % 8 (octal). The detail about the base (0) and its prefixes, and the sign, are missing in expand_number.3. % % The remainder of the string is converted to an unsigned long value in the % obvious manner, stopping at the end of the string or at the first charac- % ter that does not produce a valid digit in the given base. More details missing. % (In bases % above 10, the letter `A' in either upper or lower case represents 10, `B' % represents 11, and so forth, with `Z' representing 35.) % % If endptr is not NULL, strtoul() stores the address of the first invalid Bug: this applies to all the functions, not just strtoul(), so all of them or better none of them should be explicitly mentioned here. % character in *endptr. If there were no digits at all, however, strtoul() % stores the original value of nptr in *endptr. (Thus, if *nptr is not % `\0' but **endptr is `\0' on return, the entire string was valid.) Other details not relevant, but expand_number is missing the detail on the error handling for trailing garbage. % % RETURN VALUES % The strtoul(), strtoull(), strtoumax() and strtouq() functions return % either the result of the conversion or, if there was a leading minus % sign, the negation of the result of the conversion, unless the original % (non-negated) value would overflow; in the latter case, strtoul() returns % ULONG_MAX, strtoull() returns ULLONG_MAX, strtoumax() returns % UINTMAX_MAX, and strtouq() returns ULLONG_MAX. In all cases, errno is Here are the details for negative values and how strtou*() handle them Bugs in expand_number.3, ones not already mentioned and a couple mentioned again. % EXPAND_NUMBER(3) FreeBSD Library Functions Manual EXPAND_NUMBER(3) % % NAME % expand_number -- format a number from human readable form Its bogus name is less than reflected in its bogus descrption. "format a number from human readable form" is exactly the same description as is given to scientificize_number(), where its only incorrectness is that it misspells "scientificize" as "humanize". But this is supposed to be the inverse function. % DESCRIPTION % The expand_number() function unformats the buf string and stores a % unsigned 64-bit quantity at address pointed out by the num argument. % % The expand_number() function follows the SI power of two convention. % % The prefixes are: Neither SI nor this function take prefixes. They take suffixes. % % Prefix Description Multiplier Another bogus "prefix". The bogus [bB] suffix is not mentioned here. Hmm, B would make sense as a suffix to these suffixes (thus KB could be an alias for K, but AFAIR scientificize number never produces either plain B or KB, and the parser only recognizes a single-letter suffix). % k kilo 1024 scientificize_number() only produces the K suffix, at least in df output. % ERRORS % The expand_number() function will fail if: % % [EINVAL] The given string contains no digits. Not quite right. "foo1" contains a digit, but should give EINVAL. % % [EINVAL] An unrecognized prefix was given. Another bogus "prefix". Only the undocumented octal and hex prefixes are supported, but this "prefix" means "suffix" as usual. To give an exact inverse of scientificize_number(), the API also needs to be bug for bug compatible (int64_t not uintmax_t), with no support for octal or hex (not too bad). Gak, prefixes (sic) are much more broken than I said above. They are also named prefixes in scientificize_number() sources, and there are several variations, most of which are not supported by the "inverse" function: % int % humanize_number(char *buf, size_t len, int64_t bytes, % const char *suffix, int scale, int flags) % { % const char *prefixes, *sep; The usual confusion. % int b, i, r, maxscale, s1, s2, sign; % int64_t divisor, max; % size_t baselen; Various style bugs. % % assert(buf != NULL); % assert(suffix != NULL); % assert(scale >= 0); % % if (flags & HN_DIVISOR_1000) { % /* SI for decimal multiplies */ mulipliers % divisor = 1000; % if (flags & HN_B) % prefixes = "B\0k\0M\0G\0T\0P\0E"; % else % prefixes = "\0\0k\0M\0G\0T\0P\0E"; Classic SI prefixes, in which k means 1000 (is B really an SI prefix?). expand_number() has no support for these. % } else { % /* % * binary multiplies % * XXX IEC 60027-2 recommends Ki, Mi, Gi... Blech. % */ % divisor = 1024; % if (flags & HN_B) % prefixes = "B\0K\0M\0G\0T\0P\0E"; % else % prefixes = "\0\0K\0M\0G\0T\0P\0E"; K seems to be the only letter that is non-ambiguous here. % } % % #define SCALE2PREFIX(scale) (&prefixes[(scale) << 1]) scientificize_number.3 duplicates some of the confusion: % DESCRIPTION % The humanize_number() function formats the signed 64-bit quantity given % in number into buffer. A space and then suffix is appended to the end. % The buffer pointed to by buffer must be at least len bytes long. % % If the formatted number (including suffix) would be too long to fit into % buffer, then divide number by 1024 until it will. In this case, prefix % suffix with the appropriate SI designator. The humanize_number() func- % tion follows the traditional computer science conventions rather than the ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ % proposed SI power of two convention. This paragaph hasn't caught up with the extension HN_DIVISOR_1000. % % The prefixes are: The usual confusion. % % Prefix Description Multiplier Multiplier 1000x % k kilo 1024 1000 % M mega 1048576 1000000 % G giga 1073741824 1000000000 % T tera 1099511627776 1000000000000 % P peta 1125899906842624 1000000000000000 % E exa 1152921504606846976 1000000000000000000 The "Multiplier 1000x" column covers the extension. Even a scientist wouldn't write numbers as 1152921504606846976 or 1000000000000000000. They would use exponential notation. % % The len argument must be at least 4 plus the length of suffix, in order % to ensure a useful result is generated into buffer. To use a specific % prefix, specify this as scale (multiplier = 1024 ^ scale). This cannot Not with HN_DIVISOR_1000. % HN_B Use `B' (bytes) as prefix if the original result % does not have a prefix. Now I see some B's (for 0B) in df output. Conclusions: - although I want a general number parser, expand_number() cannot be it since it needs to be the inverse of scientificize_number() which has too many warts to be inverted by a general number parser (short of one with billions of options). - negative numbers are useful after all, and many things become simpler by restricting to them at all levels. We lose the abilitity to "expand" UINTMAX_MAX, but scientificize() number never had that anyway. Bruce --=-=-= Content-Type: message/rfc822 Date: Sat, 21 Aug 2010 07:06:51 +1000 (EST) From: Bruce Evans To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= Subject: Re: svn commit: r211304 - head/lib/libutil Message-ID: <20100821065444.B1255@besplex.bde.org> References: <201008141434.o7EEYaSA030301@svn.freebsd.org> <20100815213757.M14856@delplex.bde.org> <20100815235422.K14924@delplex.bde.org> <20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no> <20100820043717.O18794@delplex.bde.org> <86occxws1b.fsf@ds4.des.no> MIME-Version: 1.0 >> I prefer fixing expand_number() to support negative numbers again. > > "again"? It never did, AFAIK. %%% #include #include #include #include static void test(const char *s) { int64_t num; num = 0xdead; if (expand_number(s, &num) != 0) printf("%s -> error\n", s); else printf("%s -> %jd\n", s, (intmax_t)num); } int main(void) { test("-1"); test("-1k"); test("-1M"); test("-1G"); test("-1T"); test("-1P"); test("-1E"); return (0); } %%% Output when linked against the old libutil (-r--r--r-- 1 root wheel 123042 Jun 10 09:02 /usr/lib/libutil.a): %%% -1 -> -1 -1k -> -1024 -1M -> -1048576 -1G -> -1073741824 -1T -> -1099511627776 -1P -> -1125899906842624 -1E -> -1152921504606846976 %%% Output when compiled with current sources: [First, the breakage of libutil.h had to be fixed. The above no longer compiles since libutil.h was careful to only declare the types that it uses; it declares int64_t since it used that, and it doesn't declare uint64_t since it didn't use that and its typedefs haven't been updated. My first attempt to fix this broke it by changing the int64_t to uint64_t. This left int64_t undeclared, but it is still used by humanize_number().] %%% -1 -> -1 -1k -> error -1M -> error -1G -> error -1T -> error -1P -> error -1E -> error %%% Bruce --=-=-=--