From owner-freebsd-standards@FreeBSD.ORG Wed Jun 29 20:21:31 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 624781065670 for ; Wed, 29 Jun 2011 20:21:31 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm17.bullet.mail.bf1.yahoo.com (nm17.bullet.mail.bf1.yahoo.com [98.139.212.176]) by mx1.freebsd.org (Postfix) with SMTP id EB35C8FC13 for ; Wed, 29 Jun 2011 20:21:30 +0000 (UTC) Received: from [98.139.212.148] by nm17.bullet.mail.bf1.yahoo.com with NNFMP; 29 Jun 2011 20:07:45 -0000 Received: from [98.139.211.204] by tm5.bullet.mail.bf1.yahoo.com with NNFMP; 29 Jun 2011 20:07:44 -0000 Received: from [127.0.0.1] by smtp213.mail.bf1.yahoo.com with NNFMP; 29 Jun 2011 20:07:44 -0000 X-Yahoo-Newman-Id: 918647.82623.bm@smtp213.mail.bf1.yahoo.com Received: from [192.168.119.20] (se@81.173.158.168 with plain) by smtp213.mail.bf1.yahoo.com with SMTP; 29 Jun 2011 13:07:44 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: sGJ0h0IVM1nweiP28hoWO89pdFFk1nd4Z8Ev0wEjDjGKCvH YMjSumln2ZrDnEKu0HicXuRQ5CuwsoSzyVByKJFeYekKk3k2t7VHaj3lmPRN 9qwr20HhoRb4cOL2FQPYUdThKfWMHoFBOan5aTCiUeXWdm0bFGOsHkUtHQXP wnCePLgCtDud43lFIhTFBXezw9lOeLOwc8pKBjttwal4jdHJxreQQcymGNVK ZbTdZ0MgHJ4FDe8hzRDmG5xppSElyeADCwJbo3MvCdso.NTO2gHqykSgiR9R yoKttCk5nt991VGkkqIZtcP8.C0LZx87x_JkqOXENmQ1R5EXSTFXjMEC6Uv2 0EgRdv..HVKgJ9rmYNlIZGZkBo0od.1tcLbDl1A-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0B860E.50504@freebsd.org> Date: Wed, 29 Jun 2011 22:07:42 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-standards@freebsd.org References: <4E09AF8E.5010509@freebsd.org> In-Reply-To: <4E09AF8E.5010509@freebsd.org> X-Forwarded-Message-Id: <4E09AF8E.5010509@freebsd.org> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2011 20:21:31 -0000 Since BDE moved the thread over to this list I'm reposting the first message under the assumption, that many list members have not seen it, yet. Replies received in private were in support of the change, except of the warning of possible POSIX compliance issues received from GWollman, who therefore asked to move this thread to the standards list. -------- Original Message -------- Subject: [RFC] Consistent numeric range for "expr" on all architectures Date: Tue, 28 Jun 2011 12:40:14 +0200 From: Stefan Esser To: FreeBSD Developers CC: Bruce Evans , Alexander Best Hi, this point has been on my ToDO list for too long and I'd like to gather opinions whether we should clean up "expr" a bit. Back in 2000, I found that expr silently failed in an installer for a Linux software demo due to restricted range. It was a bug in that installer, but since it was just a few lines of code change, I made "expr", "test" and "/bin/sh" support 64bit range on all architectures. At the same time I added range checks for all calculations as suggested by Bruce Evans. Due to (false, according to BDE) considerations for POSIX compliance, the 64bit code was made conditional on a command line option in 2002. The expr builtin in /bin/sh has been decoupled from the "expr" sources, some time ago, so it has become a separate issue. Given the simple functionality, the behaviour of "expr" is quite complex, depending on CPU architecture, environment and options. The following list shows supported numeric range depending on the CPU: command | ILP32 | I32LP64 | specialties -------------------------------------------------------------------- expr | 32bit | 64bit | wrap around expr -e | 64bit | 64bit | overflow check EXPR_COMPAT=1 expr | 64bit | 64bit | overflow check EXPR_COMPAT=1 expr -e | error | error | "-e" seen as 1st operand (EXPR_COMPAT allows support of "expr -1 + 1", where "-1" is treated as an option by POSIX expr and "expr -- -1 + 1" must be written instead.) Examples on i386: # --- 32 bit range, no range checks: $ expr 2000000000 + 2000000000 -294967296 # --- 64 bit range and overflow checks via "-e" or EXPR_COMAPT: $ expr -e 2000000000 + 2000000000 4000000000 $ EXPR_COMPAT=1 expr 2000000000 + 2000000000 4000000000 $ expr -e 5000000000000000000 + 5000000000000000000 expr: overflow $ echo $? 2 $ EXPR_COMPAT=1 expr -e 1 + 1 expr: syntax error $ echo $? 2 Examples on amd64: # --- always 64 bit range, but wrap around controlled by -e or ENV: $ expr 5000000000000000000 + 5000000000000000000 -8446744073709551616 $ expr -e 5000000000000000000 + 5000000000000000000 expr: overflow $ EXPR_COMPAT=1 expr 5000000000000000000 + 5000000000000000000 expr: overflow Why do I consider the current situation bad: 1) Shell scripts using "expr" have inconsistent behaviour, depending on the word size of the CPU architecture. This makes it possible to develop a shell script on amd64 and be sure it works on i386. 2) You can not just put "expr -e" into shell scripts to get consistent 64bit range over all architectures, since that will fail with a syntax error if EXPR_COMPAT is set in the environment. 3) Range checks are very useful, since silent overflow in shell scripts can lead to data loss or other malfunction that is hard to diagnose. With 64 bit range, overflows are prevented even in cases, where they occur with 32 bit range. 4) The original reason for 64 bit support was a silently failing test in a commercial installer. It happened to fail in such a way, that the result of the computation was ignored under Linux but not in the port I was preparing at that time. 5) The current code is made complex by support of different ranges and overflow behaviour, depending on the options passed and environment. Consistent behavious is not only easier to understand and use, but does even reduce code size and complexity of the sources. Regarding the POSIX compatibility of the proposed change, I received a message from BDE: On Tue, 15 Jun 2010, Bruce Evans wrote: >On Tue, 15 Jun 2010, Stefan Esser wrote: >> I extended expr and test (standalone binaries and /bin/sh builtins) to >> handle 64 bit on i386 a looong time ago (and I also added overflow >> checks to all calculations). But then it was decided, that POSIX demands >> 32 bit range on 32 bit machines, so support for 64 bit range was made >> conditional on a switch (expr -e if memory serves me right ...). So yes, >> 64 bit range is available on all platforms. > > Hmm, POSIX doesn't require this brokenness for the shell, though it > requires practically equivalent brokenness for all operands and option- > arguments generally: it requires "all" integer arithmetic to use > signed long variables and "all" floating point arithmetic to use double > variables, with Standard C semantics. signed long may have any size, > but normally has 31 or 63 value bits as above. It is unclear how a > variable can ever be floating point. Then it contradicts^Woverrides > "all" with certain extensions for the shell. However, these extensions > are almost useless since they mainly make a difference when the signed > long arithmetic would overflow, in which case the behaviour is undefined > (?) so it can DTRT using any extension that it wants. I take this to mean, that the shell should use "signed long" integers, but that it is up to the implementor, what action is taken on overflow and we are free to implement an extension (and enable it by default), which supports 64bit range instead of returning bogus values. My suggestion is to make the following modifications to expr: 1) Always compute with 64 bit range and overflow checks enabled. This means, that the "-e" option is always assumed. 2) Continue to support the "-e" option as a NOP for compatibility with scripts that knew about that FreeBSD extension. This suggestion applies to -CURRENT - I'm afraid a MFC might be considered too much of a POLA. If there are no strong objections, I'd like to commit this change before the code freeze. Of course, I'm willing to adapt the man page at the time of the code commit. The suggested patch just removes casts of operands to "long" that currently exist only to reduce range. All stored operands are 64bit, anyway. This patch if for review purposes only ("diff -w" to ignore indent level, to show that just complexity is removed). Best regards, STefan --- /usr/src/bin/expr/expr.y 2009-10-30 23:02:40.000000000 +0100 +++ ./expr.y 2011-04-28 16:14:52.000000000 +0200 @@ -71,7 +71,6 @@ int yylex(void); int yyparse(void); -static int eflag; char **av; %} @@ -156,10 +155,7 @@ * non-digits MUST NOT be considered integers. strtoimax() will * figure this out for us. */ - if (eflag) (void)strtoimax(s, &ep, 10); - else - (void)strtol(s, &ep, 10); if (*ep != '\0') vp->type = string; @@ -191,13 +187,9 @@ /* vp->type == numeric_string, make it numeric */ errno = 0; - if (eflag) { i = strtoimax(vp->u.s, (char **)NULL, 10); if (errno == ERANGE) - err(ERR_EXIT, NULL); - } else { - i = strtol(vp->u.s, (char **)NULL, 10); - } + errx(ERR_EXIT, "Not a valid integer: %s", vp->u.s); free (vp->u.s); vp->u.i = i; @@ -282,12 +274,11 @@ if (getenv("EXPR_COMPAT") != NULL || check_utility_compat("expr")) { av = argv + 1; - eflag = 1; } else { while ((c = getopt(argc, argv, "e")) != -1) switch (c) { case 'e': - eflag = 1; + /* ignore for backwards compatibility */ break; default: @@ -483,13 +474,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { r = make_integer(a->u.i + b->u.i); if (chk_plus(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i + (long)b->u.i); free_value (a); free_value (b); @@ -520,13 +508,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { r = make_integer(a->u.i - b->u.i); if (chk_minus(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i - (long)b->u.i); free_value (a); free_value (b); @@ -554,13 +539,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { r = make_integer(a->u.i * b->u.i); if (chk_times(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i * (long)b->u.i); free_value (a); free_value (b); @@ -591,13 +573,10 @@ errx(ERR_EXIT, "division by zero"); } - if (eflag) { r = make_integer(a->u.i / b->u.i); if (chk_div(a->u.i, b->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i / (long)b->u.i); free_value (a); free_value (b); @@ -617,11 +596,8 @@ errx(ERR_EXIT, "division by zero"); } - if (eflag) r = make_integer(a->u.i % b->u.i); /* chk_rem necessary ??? */ - else - r = make_integer((long)a->u.i % (long)b->u.i); free_value (a); free_value (b);