From: Bruce Evans
Date: Thu, 30 Jun 2011 09:32:11 +1000 (EST)
To: Stefan Esser
Cc: standards@freebsd.org, Poul-Henning Kamp, Stefan Esser, Bruce Evans,
    Alexander Best
Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures
Message-ID: <20110630073705.P1117@besplex.bde.org>
In-Reply-To: <4E0B1C47.4010201@freebsd.org>

On Wed, 29 Jun 2011, Stefan Esser wrote:

> On 29.06.2011 01:06, Bruce Evans wrote:
>> Other points:
>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>   but it isn't the result that is too large, but the arg that is too large.
>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>   `expr -e 1000000000000000000 \* 10' gives "overflow". This message is
>>   correct, but it is in a different style to strerror() (uncapitalized,
>>   and more concise).
>
> The patch that I sent with my first message fixes this. The message is
> changed from "Result too large" to "Not a valid integer: %s".
>
> ("non-numeric argument" is used in other places and I could adapt to
> that, though I prefer to also see which argument was rejected. But I
> think that "not a valid integer" better describes the situation.)

I prefer "operand too large". A decimal integer operand that is too large
to be represented by intmax_t is not really invalid, but is just too large.
We already have a different message for invalid.

From an old (?) version of the man page:

% Arithmetic operations are performed using signed integer math. If the -e
% flag is specified, arithmetic uses the C intmax_t data type (the largest
                     ^^^^^^^^^^ actual arithmetic, not just parsing; but
                     for expr [-e] 10000000000000000000 we are doing actual
                     arithmetic -- see below
% integral type available), and expr will detect arithmetic overflow and
% return an error indication. If a numeric operand is specified which is

"Numeric" is not defined anywhere in the man page. This is the only use
of it in the man page. It means "decimal integer" and should say precisely
that. The only hint about this in the man page is the statement that "all
integer operands are interpreted in base 10". The fuzziness extends to
error messages saying "non-numeric argument" instead of "operand not a
decimal integer [used in integer context]".

% so large as to overflow conversion to an integer, it is parsed as a
% string instead. If -e is not specified, arithmetic operations and pars-

This specifically says that large operands are parsed as strings.
Strangely, since large operands are only checked for with -e, only -e can
get this right; without -e, large operands are not even detected. However,
this is a bug in the man page -- see below.

% ing of integer arguments will overflow silently according to the rules of
% the C standard, using the long data type.

This says that the -e case is broken, but doesn't override the statement
that large operands are parsed as strings. Since the man page is wrong,
no override is needed.

I originally thought of using "argument" instead of "operand", but got the
better word from the above section of the man page.

> Without "-e" the numeric result is "undefined" and no error is signaled,
> since there was no test whether the conversion succeeded before I added
> it back in 2000.

I first thought that the error reporting must be delayed to when an operand
is used in an expression, even with -e. But it is already delayed, and the
parsing works as specified in POSIX. The parsing is just poorly or
incorrectly documented in the man page.

- The syntax in the man page doesn't seem to mention the degenerate
  expression consisting of a single operand. POSIX specifies this of
  course. The operand can be either <integer> or <string>, where <integer>
  is an optional unary minus followed by digits, and <string> is any
  argument that is not an <integer> and not an operator symbol. Therefore,
  "expr -e 1000000000000000000" is not a syntax error as seems to be
  required by the man page; the arg in it forms a degenerate expression.
  The arg is not a <string> since it is an <integer>. Therefore, the
  expression is numeric (I didn't check that POSIX says this explicitly).
  Therefore, we are justified in applying strtoimax() to all the operands
  in the expression (all 1 of them) and getting a range error.
- The man page is broken in saying that unrepresentable numeric operands
  are parsed as strings instead. Whether an operand is numeric is
  determined by the POSIX syntax, which is purely lexical and doesn't
  depend on representability. So for the degenerate expression with 1
  operand, the type of the expression is determined by the type of the
  operand, which is determined lexically as described above. Similarly for
  parenthesized degenerate expressions.
  For non-degenerate expressions, the types of the operators and of the
  result are again mostly or always determined lexically by the types of
  the operands. For example, '=' means equality of integers if both
  operands are integers, but equality of strings if one or both operands
  is not an integer.
- In all cases, whether an operand is an integer is context-dependent, so
  args must not be classified early. This seems to be done correctly, so
  the code conforms to the POSIX syntax although the man page doesn't.

The syntax is still broken as designed, since it doesn't allow +1 to be an
integer, and it requires octal integers to be misinterpreted as decimal
integers although no reasonable specification of decimal integers allows
them to start with a '0', and it doesn't support non-decimal integers...

>> - POSIX requires brokenness for bases other than 10, but I wonder if an
>>   arg like 0x10 invokes undefined behaviour and thus can be made to
>>   work. (I wanted to use a hex number since I can never remember what
>>   INTMAX_MAX is in decimal and wanted to type it in hex for checking
>>   the range and overflow errors.) Allowing hex args causes fewer
>>   problems than allowing decimal args larger than INT32_MAX, since
>>   they are obviously unportable. Some FreeBSD utilities, e.g., dd,
>>   support hex args and don't worry about POSIX restricting them.
>
> Does POSIX require that expr exits on illegal arguments?

Not sure. It requires an exit status of 2 for invalid expressions. For
the "+" operator, the operands are required to be decimal integers, but the
error handling isn't so clearly specified. For the "&" operator, operand(s)
are allowed to be null.

Anyway, hex numbers can't be put through this gap. Since they are not
decimal integers, they are required to be interpreted as strings in some
contexts. So "expr 0x10 \< 2" gives 1 because the string "0x10" is less
than the string "2" (its first character, "0", sorts before "2").
This conflicts with "expr 16 \< 2" giving 0 since both operands are
integers and of course 16 = 0x10 is not less than 2.

>> - POSIX unfortunately requires args larger than INT32_MAX to be unportable
>>   (to work if longs are longer than 32 bits, else to give undefined (?)
>>   behaviour). For portability there could be a -p switch that limits args
>>   to INT32_MAX even if longs are longer than 32 bits.
>
> Well, undefined behaviour can always be to return the correct result ;-)
>
> I'd be willing to add "-p" (effectively just make "-e" the default that
> can be overridden).

I now don't see any problem with -e. Not even the one for degenerate
expressions that I thought I saw.

POSIX says that shell expressions should be preferred to expr, and for
shell expressions it has a non-null discussion of representability and
overflows. It basically says that only long arithmetic is supported,
without even C's type suffixes which are needed to extend to unsigned long
arithmetic, but extensions are encouraged.

Bruce
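
The lexical rule for integer operands and the 0x10 comparison discussed in
this message can be tried from a shell. What follows is only a minimal
sketch, assuming a POSIX-conforming expr and the C locale for string
collation. The helper names is_expr_integer and lt are hypothetical and
are not part of expr or of any patch in this thread.

```shell
#!/bin/sh
# Sketch only: is_expr_integer and lt are hypothetical helper names.
LC_ALL=C; export LC_ALL     # assumption: C locale for string collation

# Classify an argument lexically, as the POSIX expr grammar does:
# an optional unary minus followed by one or more digits.
# Representability (e.g. fitting in intmax_t) plays no part here.
is_expr_integer() {
    case $1 in
        '' | -) return 1 ;;        # empty argument or a lone minus
        -*) set -- "${1#-}" ;;     # drop one leading minus
    esac
    case $1 in
        *[!0-9]*) return 1 ;;      # any non-digit makes it a string
    esac
    return 0
}

# Print the result of "a < b". expr exits with status 1 when it prints 0,
# so the exit status is deliberately ignored here.
lt() {
    expr "$1" \< "$2" || :
}

lt 16 2      # both operands are integers, so they are compared numerically
lt 0x10 2    # 0x10 is not a decimal integer, so the comparison is on strings
```

With such an expr the two calls should print 0 and then 1, matching the
results quoted above. Note that is_expr_integer rejects +1 and 0x10 but
accepts 10000000000000000000, since the rule is purely lexical.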