From owner-svn-src-head@FreeBSD.ORG Mon Oct 27 16:06:53 2014 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 123AA93B; Mon, 27 Oct 2014 16:06:53 +0000 (UTC) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 87D79BE6; Mon, 27 Oct 2014 16:06:52 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 3358BD433DE; Tue, 28 Oct 2014 02:45:12 +1100 (AEDT) Date: Tue, 28 Oct 2014 02:45:04 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Kurt Jaeger Subject: Re: svn commit: r273734 - head/bin/dd In-Reply-To: <201410271138.s9RBcHrA002447@svn.freebsd.org> Message-ID: <20141028005225.S2013@besplex.bde.org> References: <201410271138.s9RBcHrA002447@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=dMCfxopb c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=qKvkKEtZAAAA:8 a=CyJx2C2pdSMWXbW4vZwA:9 a=irJhU9xJ9eRPbFF4:21 a=CSi6hEsHd1jE7GJe:21 a=-0KXm9qgDs5jN3Rh:21 a=CjuIK1q_8ugA:10 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 16:06:53 -0000 On Mon, 27 Oct 2014, Kurt Jaeger wrote: > Log: > bin/dd: Fix incorrect casting of arguments > > dd(1) casts many of its numeric arguments from uintmax_t to intmax_t > and back again to detect whether or not the original arguments were > negative. This caused wrong behaviour in some boundary cases: > > $ dd if=/dev/zero of=/dev/null count=18446744073709551615 > dd: count cannot be negative > > After the fix: > > $ dd if=/dev/zero of=/dev/null count=18446744073709551615 > dd: count: Result too large Both of these work correctly in my version (with a relatively small patch and no breakage of other cases). (I actually typed large values as -1 and 11111111111111111111111111. -1 means (uintmax_t)-1 although this is undocumented and now broken). > > PR: 191263 > Submitted by: will@worrbase.com > Approved by: cognet@ I couldn't review the PR since I bugzilla doesn't accept mail responses. I didn't fear it was so bad. > Modified: head/bin/dd/args.c > ============================================================================== > --- head/bin/dd/args.c Mon Oct 27 11:21:47 2014 (r273733) > +++ head/bin/dd/args.c Mon Oct 27 11:38:17 2014 (r273734) > @@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$"); > > #include > > +#include > #include > #include > #include > @@ -171,8 +172,7 @@ jcl(char **argv) > */ > if (in.offset > OFF_MAX / (ssize_t)in.dbsz || > out.offset > OFF_MAX / (ssize_t)out.dbsz) > - errx(1, "seek offsets cannot be larger than %jd", > - (intmax_t)OFF_MAX); > + errx(1, "seek offsets cannot be larger than %jd", OFF_MAX); > } This used to be correct. Now it assumes that off_t == intmax_t. Both just happen to be int64_t. This will break when intmax_t is expanded (off_t is unlikely to be expanded). The bug would be detected at compile time now if the type of intmax_t had maximal rank (long long) but int64_t and off_t remain as long on 64-bit arches. C99 doesn't seem to require intmax_t to have maximal rank. This seems to be a bug in C99, but FreeBSD exploits it to avoid using the long long abomination on 6 4-bit arches. > > static int > @@ -186,37 +186,30 @@ c_arg(const void *a, const void *b) > static void > f_bs(char *arg) > { > - uintmax_t res; > > - res = get_num(arg); This used to be correct. Though I don't like unsigned types, the API of read() and write() encourages use of size_t instead of ssize_t for buffer sizes. That was done, but a limit of SSIZE_MAX was applied, partly to avoid problems detecting errors from read() and write() and partly to distinguish overflowing cases. get_num() returns an unsigned type (uintmax_t), so it is suitable for handling the size_t args here. It was used. However, size_t might be smaller than uintmax_t, so it cannot always hold the result of get_num(). So a temporary variable was used to hold the value before checking it. > - if (res < 1 || res > SSIZE_MAX) > - errx(1, "bs must be between 1 and %jd", (intmax_t)SSIZE_MAX); get_num() returns UINTMAX_MAX if the result is too large. This may or may not be an error. But in all arches, UINTMAX_MAX exceeds SSIZE_MAX, so the overflowing case gives the same error as a non-overflowing but too large value. > - in.dbsz = out.dbsz = (size_t)res; > + in.dbsz = out.dbsz = get_num(arg); This breaks the error checking. Blind assignment may corrupt the value before it can be checked. > + if (out.dbsz < 1 || out.dbsz > SSIZE_MAX) > + errx(1, "bs must be between 1 and %jd", SSIZE_MAX); > } Printf format error, as above. > > static void > f_cbs(char *arg) > { > - uintmax_t res; > > - res = get_num(arg); > - if (res < 1 || res > SSIZE_MAX) > - errx(1, "cbs must be between 1 and %jd", (intmax_t)SSIZE_MAX); > - cbsz = (size_t)res; > + cbsz = get_num(arg); > + if (cbsz < 1 || cbsz > SSIZE_MAX) > + errx(1, "cbs must be between 1 and %jd", SSIZE_MAX); > } This just breaks the range checking and the printf format, as above. > > static void > f_count(char *arg) > { > - intmax_t res; > > - res = (intmax_t)get_num(arg); > - if (res < 0) > - errx(1, "count cannot be negative"); This was correct too. Though get_num() returns an unsigned type, it is based on strtoul() which handles negative values as correctly as possible. (The multipliers in get_num() don't handle negative values as correctly as possible.) dd also has a get_off_t() function. This is supposed to handle negative values more carefully, but is so badly implemented that it has many more bugs than blindly casting get_num() to off_t. The above cast is safer since it converts to the signed type corresponding to the return type. In 2's complement, the result is predictable and correct except in unrepresentable cases. E.g., an arg of -1 is returned as UINTMAX_MAX and casting it gives -1 again. My version implements get_off_t() using get_num(). Of course it doesn't blindly convert to an off_t. If does range checking on the uintmax_t before converting: @ static off_t @ get_off_t(const char *val) @ { @ uintmax_t num; @ @ num = get_num(val); @ if (num != (uintmax_t)(off_t)num) @ errx(1, "%s: offset too large", oper); @ return ((off_t)num); @ } > - if (res == 0) > - cpy_cnt = (uintmax_t)-1; > - else > - cpy_cnt = (uintmax_t)res; This function doesn't want an off_t variable, so get_off_t() is unuable for it. Though I don't like unsigned types, it is natural for this one to use one, and this helps avoid proliferation of conversion functions. So I changed the variable type to uintmax_t so that get_num() can be used directly. I also removed the magic -1 case and fixed the broken count = 0 case. A count of 0 should mean 0. Actually, the variable type was already uintmax_t, so using intmax_t and -1 here is more garbage magic than might first appear. I use the following patches for bogus counting (relative to an old version, and possibly not all enclosed here): @diff -u2 args.c~ args.c @--- args.c~ Wed Apr 7 20:20:58 2004 @+++ args.c Wed Apr 7 20:20:59 2004 @@@ -204,13 +206,10 @@ @ f_count(char *arg) @ { @- intmax_t res; @ @- res = (intmax_t)get_num(arg); @- if (res < 0) @- errx(1, "count cannot be negative"); @- if (res == 0) @- cpy_cnt = (uintmax_t)-1; @- else @- cpy_cnt = (uintmax_t)res; @+ cpy_cnt = get_num(arg); @+#if 0 @+ if (cpy_cnt == 0) @+ terminate(0); @+#endif @ } @ @@@ -220,6 +219,4 @@ @ @ files_cnt = get_num(arg); @- if (files_cnt < 1) @- errx(1, "files must be between 1 and %jd", (uintmax_t)-1); @ } @ @diff -u2 dd.c~ dd.c @--- dd.c~ Wed Apr 7 20:20:48 2004 @+++ dd.c Wed Apr 7 20:20:49 2004 @@@ -272,14 +267,6 @@ @ @ for (;;) { @- switch (cpy_cnt) { @- case -1: /* count=0 was specified */ @+ if (cpy_cnt != 0 && st.in_full + st.in_part >= cpy_cnt) @ return; @- case 0: @- break; @- default: @- if (st.in_full + st.in_part >= (uintmax_t)cpy_cnt) @- return; @- break; @- } @ @ /* > + cpy_cnt = get_num(arg); Seems to be correct. > + if (cpy_cnt == SIZE_MAX) > + errc(1, ERANGE, "%s", oper); Bogus. Counts aren't sizes, and no useful restriction on the count can prevent total counts overflowing. Only the non-useful restriction to counts of 0 and 1 prevents overflow. Counts of 2 to sizes of near the maximum to add up to almost twice the maximum. > + if (cpy_cnt == 0) > + cpy_cnt = -1; > } Bogus. -1 really means UINTMAX_MAX (at least, that is what it becomes in cpy_cnt). On 64-bit arches, SIZE_MAX has the same value and the previous check keeps that value as magic. On 32-bit arches, it doesn't even do that. > > static void > @@ -225,7 +218,7 @@ f_files(char *arg) > > files_cnt = get_num(arg); > if (files_cnt < 1) > - errx(1, "files must be between 1 and %jd", (uintmax_t)-1); > + errx(1, "files must be between 1 and %ju", SIZE_MAX); > } The usual printf format error. > > static void > @@ -241,14 +234,11 @@ f_fillchar(char *arg) > static void > f_ibs(char *arg) > { > - uintmax_t res; > > if (!(ddflags & C_BS)) { > - res = get_num(arg); > - if (res < 1 || res > SSIZE_MAX) > - errx(1, "ibs must be between 1 and %jd", > - (intmax_t)SSIZE_MAX); > - in.dbsz = (size_t)res; > + in.dbsz = get_num(arg); > + if (in.dbsz < 1 || in.dbsz > SSIZE_MAX) > + errx(1, "ibs must be between 1 and %ju", SSIZE_MAX); > } > } The usual range checking errors. More than the usual printf format error. SSIZE_MAX has a signed type, but is now printed using an unsigned format. > > @@ -262,14 +252,11 @@ f_if(char *arg) > static void > f_obs(char *arg) > { > - uintmax_t res; > > if (!(ddflags & C_BS)) { > - res = get_num(arg); > - if (res < 1 || res > SSIZE_MAX) > - errx(1, "obs must be between 1 and %jd", > - (intmax_t)SSIZE_MAX); > - out.dbsz = (size_t)res; > + out.dbsz = get_num(arg); > + if (out.dbsz < 1 || out.dbsz > SSIZE_MAX) > + errx(1, "obs must be between 1 and %jd", SSIZE_MAX); > } > } The usual range checking errors. The usual printf format error. > > @@ -378,11 +365,17 @@ get_num(const char *val) > uintmax_t num, mult, prevnum; > char *expr; > > + while (isspace(val[0])) isspace() is undefined on negative characters. > + val++; > + Style bug (extra blank line). > + if (val[0] == '-') > + errx(1, "%s: cannot be negative", oper); There is no particular reason to enforce this. It breaks using get_num() to implement get_off_t(). Any check like this should be external to the function. I don't know of any useful useful use of negative numbers in dd, but I want get_num() to be more generally useful, and used. It can easily handle negative args but either it or the caller would need a sign check like the above to determine if the unsigned result actually is just a representation of a signed result. There is only an ambiguity for signed results above INTMAX_MAX. Another interesting case is an arg of -1*-1. get_num() allowed both the -1's, but turned them into UINTMAX_MAX. The overflow checking is too careful, so -1*-1 is rejected as overflow instead of giving 1. > + More of this style bug. > errno = 0; > - num = strtouq(val, &expr, 0); > + num = strtoull(val, &expr, 0); get_num() returns uintmax_t, but still used strtouq() internally. This changes it to use a loing long abomination instead of the correct function strtoumax(). My fixes for get_num() are relatively small: patch relative to an old version: @ @@ -334,5 +331,5 @@ @ } @ @ -/* @ +/*- @ * Convert an expression of the following forms to a uintmax_t. @ * 1) A positive decimal number. @ @@ -341,6 +338,6 @@ @ * 4) A positive decimal number followed by a m (mult by 1 << 20). @ * 5) A positive decimal number followed by a g (mult by 1 << 30). @ - * 5) A positive decimal number followed by a w (mult by sizeof int). @ - * 6) Two or more positive decimal numbers (with/without [bkmgw]) @ + * 6) A positive decimal number followed by a w (mult by sizeof int). @ + * 7) Two or more positive decimal numbers (with/without [bkmgw]) @ * separated by x (also * for backwards compatibility), specifying @ * the product of the indicated values. I didn't fix the spammed error in the comment. The errors start with "positive". This function accepts negative numbers and returns them as positive numbers. Except this commit breaks that. The errors continue with "decimal". This function actually takes decimal, octal or hexadecimal numbers... @ @@ -349,13 +346,12 @@ @ get_num(const char *val) @ { @ - uintmax_t num, mult, prevnum; @ + uintmax_t mult, num, prevnum; @ char *expr; @ @ errno = 0; @ - num = strtouq(val, &expr, 0); @ - if (errno != 0) /* Overflow or underflow. */ @ + num = strtoumax(val, &expr, 0); @ + if (errno == ERANGE) /* Overflow. */ "Underflow" can only occur for FP expressions. It is not going negative. @ err(1, "%s", oper); @ - @ - if (expr == val) /* No valid digits. */ @ + if (expr == val) /* No digits. */ @ errx(1, "%s: illegal numeric value", oper); @ @ @@ -369,8 +365,8 @@ @ break; @ case 'm': @ - mult = 1 << 20; @ + mult = 1UL << 20; @ break; Don't assume 32-bit ints. @ case 'g': @ - mult = 1 << 30; @ + mult = 1UL << 30; @ break; @ case 'w': @ @@ -378,7 +374,6 @@ @ break; @ default: @ - ; @ + break; @ } @ - @ if (mult != 0) { @ prevnum = num; @ > Modified: head/bin/dd/conv.c > ============================================================================== > --- head/bin/dd/conv.c Mon Oct 27 11:21:47 2014 (r273733) > +++ head/bin/dd/conv.c Mon Oct 27 11:38:17 2014 (r273734) > @@ -133,7 +133,7 @@ block(void) > */ > ch = 0; > for (inp = in.dbp - in.dbcnt, outp = out.dbp; in.dbcnt;) { > - maxlen = MIN(cbsz, in.dbcnt); > + maxlen = MIN(cbsz, (size_t)in.dbcnt); > if ((t = ctab) != NULL) > for (cnt = 0; cnt < maxlen && (ch = *inp++) != '\n'; > ++cnt) > @@ -146,7 +146,7 @@ block(void) > * Check for short record without a newline. Reassemble the > * input block. > */ > - if (ch != '\n' && in.dbcnt < cbsz) { > + if (ch != '\n' && (size_t)in.dbcnt < cbsz) { > (void)memmove(in.db, in.dbp - in.dbcnt, in.dbcnt); > break; > } Now you need casts to recover from changing the mixture of unsigned and signed types. If some block size variables are signed, then all should be, but cbsz is still size_t. > @@ -228,7 +228,7 @@ unblock(void) > * translation has to already be done or we might not recognize the > * spaces. > */ > - for (inp = in.db; in.dbcnt >= cbsz; inp += cbsz, in.dbcnt -= cbsz) { > + for (inp = in.db; (size_t)in.dbcnt >= cbsz; inp += cbsz, in.dbcnt -= cbsz) { > for (t = inp + cbsz - 1; t >= inp && *t == ' '; --t) > ; > if (t >= inp) { > The excessive casts give 2 more style bugs here (expannsion of the line beyond 80 columns and no re-wrapping to fix this). > Modified: head/bin/dd/dd.c [More type poisoning] > ... > Modified: head/bin/dd/dd.h > ============================================================================== > --- head/bin/dd/dd.h Mon Oct 27 11:21:47 2014 (r273733) > +++ head/bin/dd/dd.h Mon Oct 27 11:38:17 2014 (r273734) > @@ -38,10 +38,9 @@ > typedef struct { > u_char *db; /* buffer address */ > u_char *dbp; /* current buffer I/O address */ > - /* XXX ssize_t? */ ISTR asking the author of this comment to clean up some sign-related bugs. I didn't like the results. They including changing minor sign hacks to broken overflow handling in get_off_t(). > - size_t dbcnt; /* current buffer byte count */ > - size_t dbrcnt; /* last read byte count */ > - size_t dbsz; /* block size */ > + ssize_t dbcnt; /* current buffer byte count */ > + ssize_t dbrcnt; /* last read byte count */ > + ssize_t dbsz; /* block size */ > > #define ISCHR 0x01 /* character device (warn on short) */ > #define ISPIPE 0x02 /* pipe-like (see position.c) */ > @@ -57,13 +56,13 @@ typedef struct { > } IO; > > typedef struct { > - uintmax_t in_full; /* # of full input blocks */ > - uintmax_t in_part; /* # of partial input blocks */ > - uintmax_t out_full; /* # of full output blocks */ > - uintmax_t out_part; /* # of partial output blocks */ > - uintmax_t trunc; /* # of truncated records */ > - uintmax_t swab; /* # of odd-length swab blocks */ > - uintmax_t bytes; /* # of bytes written */ All of these were correct. Counters aren't related to sizes, and shouldn't be limited to 32 bits on 32-bit arches. Especally since they used to be 64 bits on 32-bit arches. They were only u_long in 4.4BSD, but went through several rounds of churning in 1999 to reach the above. > + size_t in_full; /* # of full input blocks */ > + size_t in_part; /* # of partial input blocks */ > + size_t out_full; /* # of full output blocks */ > + size_t out_part; /* # of partial output blocks */ > + size_t trunc; /* # of truncated records */ > + size_t swab; /* # of odd-length swab blocks */ > + size_t bytes; /* # of bytes written */ The byte count is most broken. Now it overflows on 32-bit arches after copying just 0.1% or a 4TB disk. > struct timespec start; /* start time of dd */ > } STAT; > > > Modified: head/bin/dd/position.c > ============================================================================== > --- head/bin/dd/position.c Mon Oct 27 11:21:47 2014 (r273733) > +++ head/bin/dd/position.c Mon Oct 27 11:38:17 2014 (r273734) > @@ -178,7 +178,7 @@ pos_out(void) > n = write(out.fd, out.db, out.dbsz); > if (n == -1) > err(1, "%s", out.name); > - if ((size_t)n != out.dbsz) > + if (n != out.dbsz) > errx(1, "%s: write failure", out.name); > } > break; This is about the only simplification that you get from using ssize_t for dbsz, etc. The values were restricted to SSIZE_MAX instead of to SIZE_MAX mainly so that the error checking can be simpler here (it is too simple to be correct -- dd is clueless about short i/o's). But the compiler doesn't know about the restriction, so might warn. The cast prevents that and expresses correctness of the conversion. This isn't really a simplification. Another compiler might warn about the implicit conversion of out.dbsz when passed to write() (since the compiler doesn't know that the value is >= 0). Then you would need a cast to prevent that and express the correctness of the conversion. Bruce