From owner-svn-src-all@freebsd.org Tue Dec 20 02:44:26 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EAE98C890D3; Tue, 20 Dec 2016 02:44:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 971E21B76; Tue, 20 Dec 2016 02:44:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7F8EC1044B1A; Tue, 20 Dec 2016 13:44:16 +1100 (AEDT) Date: Tue, 20 Dec 2016 13:44:15 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ravi Pokala cc: Ian Lepore , Warner Losh , Sepherosa Ziehau , Dimitry Andric , src-committers , "svn-src-all@freebsd.org" , "svn-src-head@freebsd.org" Subject: Re: svn commit: r310171 - head/sys/sys In-Reply-To: Message-ID: <20161220115922.H1147@besplex.bde.org> References: <201612161949.uBGJnMol059217@repo.freebsd.org> <9BD5034F-55A6-48F6-A391-A0877FF49702@panasas.com> <1482175209.48539.9.camel@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=cZeiljLM c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=7Qk2ozbKAAAA:8 a=n-kJSqksAAAA:8 a=pGLkceISAAAA:8 a=bdFN6mIY_9BeUA2hOmwA:9 a=sO5Achs8AXtz-njr:21 a=C33AFFPnMQOz1LBH:21 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 a=1lyxoWkJIXJV6VJUPhuM:22 a=fMAGXkzVR7cRuVcbbVfL:22 a=6kGIvZw6iX1k4Y-7sg4_:22 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2016 02:44:27 -0000 On Mon, 19 Dec 2016, Ravi Pokala wrote: > -----Original Message----- >> From: on behalf of Ian Lepore >> Date: 2016-12-19, Monday at 11:20 >> To: Warner Losh , Ravi Pokala >> Cc: Sepherosa Ziehau , Dimitry Andric , src-committers , "svn-src-all@freebsd.org" , "svn-src-head@freebsd.org" >> Subject: Re: svn commit: r310171 - head/sys/sys >> >> On Mon, 2016-12-19 at 11:58 -0700, Warner Losh wrote: >>> >>> ... >>> >>> Are there other precedence for avoiding the SCN macros in the tree as >>> well, or is this new art? >> >> There was another commit recently the fixed the same kind of scanf >> error by making the variable fit the scanf type (changing uint64_t to >> an explicit long long unsigned, iirc). I don't know if that alone >> counts as a precedent, but IMO it's a more palatible fix than the >> SCN/PRI ugliness. That was apparently a bug, not a fix. Someone said that the variable needs to have type precisely uint64_t. (Extensive use of fixed-width types in FreeBSD is another bug. It asks for fixed width at all cost. Using the "fast" or "least" types would as for time or space efficiency instead. Or just use basic types whenever possible for simplicity. Fixed-width types should only be used on ABI boundaries.) I think there is an ABI boundary near the other commit. Some hardware wants uint64_t, and we should use uint_fast64_t or uint_least64_t, but we used uint64_t. These uses require translation at the ABI boundary, since the fast and least types may be larger than the fixed-width types. (In more complicated cases, the relevant fixed-width type does't exist. IIRC, POSIX requires only fixed width types of width 8, 16 and 32 to exist, since old badly designed ABIs like inet_addr() require a fixed width, but newer ABIs don't repeat the mistake for width 64. The C standard requires the "fast" and "least" types for widths 8, 16, 32 and 64 to exist. This is possible since they can all be [unsigned] long long or [u]intmax_t although these might be very far from being either fast or least. All fixed-width types are optional in C. So to support bad software ABIs and some hardware ABIs using portable code, you have to read write the data using some basic type (perhaps unsigned char) and convert it to a "fast" or "least" type (or maybe [u]intmax_t). We changed to using the unsigned long long abomination. This is basically a bad spelling of uintmax_t. All uint_fast64_t, uint_least64_t, unsigned long long and uintmax_t are at least as large asthe ABI type uint64_t, so they can be used to represent the ABI values, but all need translation at ABI booundaries since they may be strictly larger. The translation may be as simple as assignment, depending on how the code is written. Since the code wasn't written with this in mind, it might have type puns instead. Or the compiler might just warn about all implicit conversions that might lose bits. > With all apologies to Churchill, SCN/PRI are the worst way to address this in a machine-independent way, except for all the other ways that have been tried from time to time. :-P No, there are much better ways. For PRI*, just cast to [u]intmax_t. The only thing wrong with this is that it lots wastes space and time when [u]intmax_t is very wide. We don't have this problem yet since [u]intmax_t is only 64 bits. That would be very wide on 8-bit systems but is merely wide on 32-bit systems and not wide on 63-bit systems. PRI* is not even available for an average typedef like uid_t. It is available for all "fast" and "least" typedefs. For SCN*, first don't use *scanf(). Use APIs that can actually handle errors, like strtol(). If you have to maintain bad code that uses *scanf(), then convert to uintmax_t. The conversions are not as short as casting for *printf(). They are much like what is needed at ABI boundaries: uintmax_t tmp; uint64_t val; /* * Good code does tmp = strtoumax() here. Then handle errors * reported by strtoumax(). Then check that the result is * representable in uint64_t. Then handle errors from that. * * Bad code did *sscanf(str, "%llx", &val) with no error checking * or handling of course. The type didn't't match the format in * general, and the behaviour was undefined on overflow. */ sscanf(str, "%jx", &tmp); /* keep null error handling */ val = tmp; /* preserve undefined on overflow */ /* * Overflow in sscanf() is less likely than before (uintmax_t * might be larger). Then overflow occurs much like before on * assignment (if the result is too large for the final type). */ Since the conversion is more elaborate for *scanf() than for *printf(), SCN* is slightly less bad than PRI*. SCN* is also not available for an average type like uid_t. It is available for the "fast" and "least" typedefs. But with error handling, it is even easier to always read into a temporary variable of type [u]intmax_t so as to do range checking: uintmax_t tmp; uint_fast64_t fast_val; uint_least64_t least_val; unsigned long long abom_val; uint64_t val; errno = 0; /* chummy with implementation */ r = sscanf(str, "%jx", &tmp); if (r < 0 || errno != 0) /* detect undef behaviour sometimes */ some_error(); if (tmp > UINT64_MAX) range_error(); fast_val = tmp; /* for local use */ least_val = tmp; /* for local use */ abom_val = tmp; /* for unspeakable use */ val = tmp; /* for ABI use */ The value could have been read directly into any of the other variables except uint64_t directly, with only portable SCN* ugliness, but it is easier to use uintmax_t. It is a small step from this to using correct code using strtoumax(). The SCN* mistake is not made for the strtol() family (by having a function for every possible type). But if you have to maintain bad code using *scanf(), it might have 20 args instead of 1, or complications like %n or string input (there is nothing like the strtol() family for convenient parsing of strings). Then converting to 20 copies of the above would be painful (or course you would use a function). The quick fix is to omit the error handling and add 20 temorary variables and 20 assignments. Grepping for examples in src/bin shows only 3 utilities using *scanf() and 1 serious type error: - ls uses 1 sscanf() with bad error handling (no range checking, and a broken fall-through for < 0 (it is treated as all args present), and no errno hack. - sleep uses 1 sscanf() rewritten be me to do all possible error handling short of using the errno hack. - stty uses 4 sscanf()s with assorted type errors. It always reads 1 variable at a time into a single temporary variable of type long. This (or rather using strtol()) was almost correct before C99 broke long being longest. It should have read into a u_long for the unsigned cases. Instead, it bogusly casts &tmp from long * to u_long * and then depends on benign conversions to use tmp as a u_long. Many of the uncast longs have sign errrors in a different way -- they are for speed_t, which happens to be u_int. stty's use shows most of the ABI complications mentioned above. The sscanf()s are 1 for variables of type tcflags_t, 2 for variables of type speed_t and 1 for variables of type cc_t. All of these types are supposed to be opaque, except they are unsigned and integer. So using long to read them was wrong on all cases. The correct type was u_long in C90 and uintmax_t after C99 broke long being longest. u_long still works in FreeBSD because ABIs don't allow easy expansion and there is no need for expanding these types. These types are actually u_char and u_int, so they fit easily in u_long and more care should be taken to disallow reading values that don't fit in the final type. All other errors are ignored anyway. Bruce