From owner-svn-src-head@FreeBSD.ORG Tue Nov 5 00:23:13 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id EADB7F0C; Tue, 5 Nov 2013 00:23:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id EAFB52E24; Tue, 5 Nov 2013 00:23:12 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 2C4C278124D; Tue, 5 Nov 2013 11:23:03 +1100 (EST) Date: Tue, 5 Nov 2013 11:23:02 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Eitan Adler Subject: Re: svn commit: r257646 - head/lib/libc/string In-Reply-To: <201311041905.rA4J5WT0097968@svn.freebsd.org> Message-ID: <20131105082043.G954@besplex.bde.org> References: <201311041905.rA4J5WT0097968@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=bpB1Wiqi c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=Jjr3N5Xq17sA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=PxnKILAbCc4A:10 a=0bboea_cAAAA:8 a=nPgqESEMlYIW6C_Ud6sA:9 a=z6bOAOlBPfTk0ypl:21 a=w5yiX3cPpRu9v20-:21 a=CjuIK1q_8ugA:10 a=lZcSDGFEfbgA:10 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Nov 2013 00:23:14 -0000 On Mon, 4 Nov 2013, Eitan Adler wrote: > Log: > Use OpenBSD's revamped description of strlcpy and strlcat. > > This explanation is supposed to be simpler and better. In particular > "comparing it to the snprintf API provides lots of value, since it raises the > bar on understanding, so that programmers/auditors will a better job calling > all 3 of these functions." LOL. It indeed makes the design errors in all 3 functions more obvious. > Modified: head/lib/libc/string/strlcpy.3 > ============================================================================== > --- head/lib/libc/string/strlcpy.3 Mon Nov 4 18:15:45 2013 (r257645) > +++ head/lib/libc/string/strlcpy.3 Mon Nov 4 19:05:31 2013 (r257646) > @@ -1,4 +1,4 @@ > -.\" $OpenBSD: strlcpy.3,v 1.19 2007/05/31 19:19:32 jmc Exp $ > +.\" $OpenBSD: strlcpy.3,v 1.26 2013/09/30 12:02:35 millert Exp $ > .\" > .\" Copyright (c) 1998, 2000 Todd C. Miller > .\" > @@ -27,7 +27,7 @@ > .\" > .\" $FreeBSD$ > .\" > -.Dd June 22, 1998 > +.Dd November 10, 2013 Glenn mentioned a time warp. There is another one in the Copyright line (it is still for the old version). > .Dt STRLCPY 3 > .Os > .Sh NAME > @@ -47,69 +47,81 @@ The > .Fn strlcpy > and > .Fn strlcat > -functions copy and concatenate strings respectively. > -They are designed > -to be safer, more consistent, and less error prone replacements for > +functions copy and concatenate strings with the > +same input parameters and output result as > +.Xr snprintf 3 . It is false that these functions have the _same_ input parameters and output result as snprintf(). They are simpler and not quite as badly designed, so they don't have so many overflow problems or type errors, and this is reflected in their input parameters and output type being quite different: (1) they aren't variadic, but snprintf() is. (2) point (1) should give only the difference that the final "arg" for snprintf() is `...', but there is the additional difference that these functions have their parameters in strncpy() order: dst, src, size while snprintf() has parameters in (mostly) printf() order: dst, size, fmt, ... (3) the `size' parameter has type size_t for both, but POSIX gratuitously breaks this for snprintf() if its value exceeds SIZE_MAX (POSIX requires returning a negative value with errno EOVERFLOW in this case even if the correct result would be representable in the return type) (4) the return type is size_t for these functions, but is int for snprintf(). This allows these functions to represent all results, so EOVERFLOW can't happen. There is no type that can represent all results for snprintf(), and it returns int for historical reasons. In C99, the behaviour if the correct result is implicitly specified as undefined (explicitly only in a non-normative appendix). In POSIX, the behaviour is specified as returning a negative value with errno EOVERFLOW. This makes handling of errors for snprintf() much more complicated than for these functions, if error handling is actually done. (Actually, strlcat() has undefined behaviour too. Details below.) In most cases you can show at compile time that overflow can't happen, but this is almost as hard as showing that the buffer is large enough so that you don't even need to use snprintf(). > +They are designed to be safer, more consistent, and less error > +prone replacements for the easily misused functions > .Xr strncpy 3 > and > .Xr strncat 3 . "error prone" should probably be hyphenated. (/usr/share/man hyphenates it in a slightly larger percentage of cases than not.). Indeed, they are just as hard to use and as easy to misuse as snprintf(). They are just less error-prone than strncpy() and strncat() when misused. The usual misuse is to never check for errors. WHen errors are checked for, they are much harder to use than sprintf() and strn*(). > -Unlike those functions, > -.Fn strlcpy > -and > -.Fn strlcat > -take the full size of the buffer (not just the length) and guarantee to Actually, all the functions take just a length (a size which can be either the full size of the buffer or the size of an initial part of the buffer, where the buffer pointer may be offset in the "real" buffer and the initial part may be null). > -NUL-terminate the result (as long as > -.Fa size > -is larger than 0 or, in the case of > -.Fn strlcat , > -as long as there is at least one byte free in > -.Fa dst ) . > -Note that a byte for the NUL should be included in > -.Fa size . > -Also note that > +.Pp > .Fn strlcpy > and > .Fn strlcat > -only operate on true > -.Dq C > -strings. > -This means that for > -.Fn strlcpy > -.Fa src > -must be NUL-terminated and for > -.Fn strlcat > -both > -.Fa src > -and > -.Fa dst > -must be NUL-terminated. > +take the full size of the destination buffer and guarantee > +NUL-termination if there is room. > +Note that room for the NUL should be included in > +.Fa dstsize . > .Pp > -The > .Fn strlcpy > -function copies up to > -.Fa size > -- 1 characters from the NUL-terminated string > +copies up to > +.Fa dstsize > +\- 1 characters from the string > .Fa src > to > .Fa dst , > -NUL-terminating the result. > +NUL-terminating the result if > +.Fa dstsize > +is not 0. These changes are hard to compare, but I noticed the breakage of the arg name (it seems to have been renamed from `size' to 'dstsize' everywhere in this man page except in the prototypes where the arg names are defined) and the regression to using the non-technical term "room". > .Pp > -The > .Fn strlcat > -function appends the NUL-terminated string > +appends string Regression away from manpage-ese: "The foo function" -> "foo". I don't like the verboseness of the former, but it is normal. Removing "the NUL-terminated" is good. strings are NUL-terminated by definition, and this shouldn't be repeated ad nauseum. > .Fa src > to the end of > .Fa dst . Old verboseness: "the end of". Appends always occur at ends. > It will append at most > -.Fa size > -- strlen(dst) - 1 bytes, NUL-terminating the result. > +.Fa dstsize > +\- strlen(dst) \- 1 characters. This change mainly gives the usual mismatch with the arg `size'. Old bugs in it: - no markup for strlen(dst) combined with markup for size/dstsize on the same output line - overflow bugs. size - strlen(dst) can easily overflow. E.g., size might be 0, and then the expression overflows for all non-null dst strings. On overflow, the claimed limit is usually vacuously correct, but is not the limit. It is usually much larger than the limit (something like 0-1UL-1 = 0xfffffffffffffffe. Then it is vacuously true, but not useful to know, that at most this many bytes will be appended. > +It will then NUL-terminate, unless > +.Fa dstsize > +is 0 or the original > +.Fa dst > +string was longer than > +.Fa dstsize > +(in practice this should not happen > +as it means that either > +.Fa dstsize > +is incorrect or that > +.Fa dst > +is not a proper string). One of the design errors in snprintf() is that it is hard to use when the result is truncated. But it is clearly not incorrect to use a size that is too small for snprintf() or strlcpy(), provided there is error handling. Correctness of this use is not so clear for strlcat(). The above says that it is unconditionally incorrect. Is it really? > +.Pp > +If the > +.Fa src > +and > +.Fa dst > +strings overlap, the behavior is undefined. strlcat() is quite different from snprintf() and strlcpy(), so the claim that _these_ functions [both of them] have the _same_ input parameters and output result as snprintf() is much more false for strlcat() than for strlcpy(). Change it to only claim similarity for strlcpy(). > .Sh RETURN VALUES > -The > +Besides quibbles over the return type > +.Pf ( Va size_t > +versus > +.Va int ) > +and signal handler safety > +.Pf ( Xr snprintf 3 > +is not entirely safe on some systems), the > +following two are equivalent: > +.Bd -literal -offset indent > +n = strlcpy(dst, src, len); > +n = snprintf(dst, len, "%s", src); The return type difference is not a quibble (except the EOVERFLOW case shouldn't be a problem in practice). It means that first you must declare n to have the correct type (same as the return type). Then you have to take different types of care with sign extension and overflow bugs depending on the type, if you actually use n to check for errors. Well, perhaps only handling the EOVERFLOW case requires the full complications. snprintf() returns a negative value on error. I think that can only happen for the EOVERFLOW and EILSEQ cases, and EILSEQ can't happen for format "%s", but this is not clear. So even among rare uses that actually attempt to handle errors, it is norma to have the following sign extension/overflow bug for use of snprintf(): size_t len; int n; n = snprintf(dst, len, "%s", src); if (n >= len) goto toolong; The comparison in this has various sign extension/overflow problems involving implementation-defined behaviour depending on the relative sizes of int and size_t and of course on the implementation. It is necessary to write it as: if (n < 0) goto toolong_or_error; if (n >= len) goto toolong; Actually, in C99 the behaviour is undefined in toolong case (whether or not it is checked for), since undefined behaviour happens before this case is reached. It is defined in POSIX and doesn't depend on implementation-defined behaviour with the correct check. These complication doesn't apply to strlcpy() or strlcat(). One of the best things about strlcpy() or strlcat() is that their man page gives detailed examples of the smaller complications for the error handling for them than printf()'s man page does for snprintf(). This is in unchanged parts of the man page. The less detailed example in the above doesn't mesh so well with the more detailed examples later. It is interesting that it spells the `size' arg in another different way -- as `len' -- after previous descriptions have misemphasized this arg is not just a length but is "the" "full" buffer size. > +.Ed > +.Pp > +Like > +.Xr snprintf 3 , > +the > .Fn strlcpy > and > .Fn strlcat > -functions return the total length of the string they tried to > -create. > +functions return the total length of the string they tried to create. > For > .Fn strlcpy > that means the length of > @@ -121,29 +133,12 @@ that means the initial length of > plus > the length of Modulo overflow in "plus". The man page can reasonably be fuzzy here, but the implementation (of strlcpy() is not careful about this either. In all cases where it returns early because the result wouldn't fit in the specified size, it returns strlen(dst) + strlen(src) (spelled differently). This can overflow. One string might have length SIZE_MAX and the other length 1, or both slightly longer than SIZE_MAX/2. It takes an exotic arch for this too happen. But not much more exotic than large model in MSODS will do it -- you can have strings in different segments and the segment size is only 64K. I forget if SIZE_MAX is 64K-1 in MSDOS large model. If it is, then it is easy to have strings that each fill out the entire address space accessible through a single pointer. Oops, the representabilty problem applies to strlcat() too. strlcat() thus has the same bugs as C99 snprintf() -- overflow can occur but there is no requirement to detect it or report it, using much the same example: n = snprintf(buf, 0, "%s%s", dst, src); /* (1) */ n = snprintf(buf, 0, "%*s%*s", SIZE_MAX, dst, SIZE_MAX, src); /* (2) */ n = strlcat(dst, 0, src); /* (3) */ (1) Even if snprintf() returned size_t, it couldn't return the sum of the lengths if they are long. (2) Can have large output with small input strings with snprintf(). (3) has to be like (1) -- need large input strings. Note that this bug doesn't affect strcat() or strncat(), since then it is the caller's responsibility to provide enough space at the end of dst to append src, and for strcat() this is just impossible if the sum of the string lengths overflows, while for strncat() overflow can't occur because the API doesn't attempt to return anything useful. Callers can easily avoid this problem, but strlcat() is supposed to always return a useful value so that they don't need to. Only callers that actually attempt to handle errors are affected. The size arg prevents buffer overrun, but since an overflowed result can be returned, callers can't trust any result. I described this bug as giving "undefined" behaviour. We can look at the implementation and see that the result is the defined result of adding 2 size_t's. Overflow doesn't occur in the technical sense since size_t is unsigned. But since this bug wasn't noticed before, any behaviour that it gives is accidental and undocumented, so it is undefined in practice. Callers also can't recover from it since the overflowing (but defined) result is indistinguishable from a valid result. For a quick fix, return SIZE_MAX on overflow. Real strings can only have length SIZE_MAX if the array that holds them has size 1 larger than SIZE_MAX, but since that size is not representable such strings are unusable and much more difficult to create than ones of length merely SIZE_MAX-1. For the same reason, the size arg cannot exceed SIZE_MAX. Thus in a bad case where it equals SIZE_MAX and the sum of the lengths would exceed SIZE_MAX, we return SIZE_MAX and the correct error check detects this as too long: n = strlcat(dst, src, SIZE_MAX); /* result SIZE_MAX */ if (n >= SIZE_MAX) goto toolong; /* equality, so go */ > .Fa src . > -While this may seem somewhat confusing, it was done to make > -truncation detection simple. > .Pp > -Note however, that if > -.Fn strlcat > -traverses > -.Fa size > -characters without finding a NUL, the length of the string is considered > -to be > -.Fa size > -and the destination string will not be NUL-terminated (since there was > -no space for the NUL). > -This keeps > -.Fn strlcat > -from running off the end of a string. > -In practice this should not happen (as it means that either > -.Fa size > -is incorrect or that > -.Fa dst > -is not a proper > -.Dq C > -string). > -The check exists to prevent potential security problems in incorrect code. This section got moved. I'm not sure what changed in it. > +If the return value is > +.Cm >= > +.Va dstsize , > +the output string has been truncated. > +It is the caller's responsibility to handle this. As explained above, ">=' is correct for strl*() but not for snprintf(). > .Sh EXAMPLES > The following code fragment illustrates the simple case: > .Bd -literal -offset indent > @@ -169,7 +164,7 @@ if (strlcat(pname, file, sizeof(pname)) > .Ed > .Pp > Since it is known how many characters were copied the first time, things > -can be sped up a bit by using a copy instead of an append. > +can be sped up a bit by using a copy instead of an append: > .Bd -literal -offset indent > char *dir, *file, pname[MAXPATHLEN]; > size_t n; > @@ -201,5 +196,5 @@ and > .Fn strlcat > functions first appeared in > .Ox 2.4 , > -and made their appearance in > +and > .Fx 3.3 . > The examples remain good. Bruce