Date: Tue, 14 Nov 2017 23:41:38 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Warner Losh <imp@freebsd.org> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r325765 - head/lib/libc/string Message-ID: <20171114215948.L913@besplex.bde.org> In-Reply-To: <201711131704.vADH4iZ9035812@repo.freebsd.org> References: <201711131704.vADH4iZ9035812@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 13 Nov 2017, Warner Losh wrote: > Add notes about overlapping copies. > > Add notes to each of these that specifically state that results are > undefined if the strings overlap. In the case of memcpy, we document > the overlapping behavior on FreeBSD (pre-existing). For str*, it is > left unspecified, however, since the default (and x86) implementations > do not handle overlapping strings properly. > > PR: 223653 > Sponsored by: Netflix > > Modified: > head/lib/libc/string/memcpy.3 > head/lib/libc/string/strcat.3 > head/lib/libc/string/strcpy.3 C standards and POSIX have much better wording. > Modified: head/lib/libc/string/memcpy.3 > ============================================================================== > --- head/lib/libc/string/memcpy.3 Mon Nov 13 16:53:36 2017 (r325764) > +++ head/lib/libc/string/memcpy.3 Mon Nov 13 17:04:44 2017 (r325765) > @@ -54,6 +54,11 @@ bytes from string > .Fa src > to string > .Fa dst . > +If > +.Fa src > +and > +.Fa dst > +overlap, the results are not defined. > .Sh RETURN VALUES > The > .Fn memcpy The bugs here are: - src and dst are pointers. Overlap for pointers makes no sense except for pointers to single objects but the pointers here are not for that. Standards use the correct description "objects that overlap". - style bug: verboseness in the nonsense, by naming the non-objects and then having to mark up their names. A full description would be much longer, especially for non-counted copies. For strcpy(), the relevant obects are the array of char beginning at .Fa src and ending at its terminating NUL, and the array of char beginning at .Fa dst with the same number of elements as the source array. Standards omit all details and say simple "objects". memcpy() only has a count so it is simpler. strncpy() has a count in addition to the NUL terminators, so it is more complicated. - use of the wrong term "results". The correct term is "behavior". Results are are worst implementation-defined in Standards, since it is almost impossible for a result to be undefined without the whole behaviour also being undefined. These functions have only 1 result. In Standard wording, function results are return values, not the whole behavior of the function. - use of the non-technical term "not defined". Standards use the technical term "undefined". The Standards wording is "If copying takes place between objects that overlap, the behaviour is undefined". > Modified: head/lib/libc/string/strcat.3 > Modified: head/lib/libc/string/strcpy.3 - as above, except "undefined" is spelled correctly Grepping for "overlap" in draft C99 shows the following other functions with overlap giving undefined behavior: snprintf, sprintf, sscanf, vsnprintf, vsprintf, mbstowcs, wcstombs, strxfrm, strftime. all functions in <wchar.h> where overlap might occur, except when explicitly specified (seems to be only explicitly specified for wmemmove) The term "overlap" occurs in 193 section 3 man pages in -current (counting links as 1 each). It isi still never used for any of the above functions. It is used in the following related man pages: - bcmp.3. This use is nonsense. Overlap doesn't cause any problems for bcmp(). - bcopy.3 and memmove.3. This use is not nonsense, but its wording is much worse than in Standards. It says "The .Fn function copies .Fa len bytes from string .Fa src to string .Fa dst. The two strings may overlap.". But these functions don't even copy strings, and "may overlap" gives no details. Standards say "The .Fn function copies .Fn n characters from the object pointed to be .Fa s2 into the object pointed to by .Fa s1. [Complicated 5-line sentence saying that the copying acts as if it is done through a temporary object that doesn't overlap.]". I think standards say somewhere, but man pages say nowhere, that copying between non-overlapping objects has the defined behaviour of copying the bytes (u_char's) it is done by assignment of bytes or a function like memmove() (copying of general objects is more complicated since it may skip padding or otherwise fail to preserve the representation). This is needed so that the 5-line sentence doesn't need to be much longer. - memcpy.3. It was already documented in the BUGS section that memcpy() is implemented as bcopy() and therefore the "strings" "may overlap" [then portability considerations due to this]. This is buggier than before: - old bug: only the MI implementation of memcpy() clearly implements memcpy() as bcopy(). The amd64, i386 and arm MD implementations do the same. mips is backwards for bcopy() and implements it as memmove(). mips has 2 separate memcpy*.S files, one for plain arm and one for xscale. The plain arm one is smaller than memmove.S, so presumably doesn't support overlapped copies. The xscale one is much larger, so has space for overlapped copies and many optimizations that no longer work. I don't know what it does. (The i386 and amd64 ones have a few small optimizations that stopped working 25 years ago. At least they are short.) - new bug: the addition is inconsistent with the BUGS section. - strlcpy.c. This uses a combination of the C99 wording for strcpy() and buggy wording involving the pointer names (for both lcat and lcpy): If the .Fa src and .Fa dst strings overlap, the behavior is undefined. Now it is even better to not give the details about the objects, since at least for dst, the object isn't the string pointed to by dst -- it is the char array of length MIN(dstsize, strlen(dst) + 1). Similarly for old strncat() and strncpy() and not so old snprintf(). For the standard 'n' functions, the object is clearly limited by the dstsize arg and there is no problem in theory or practice if the dst string overlaps beyond the part that is written to, so it is just a bug in the specification = man page of the 'l' functions to have a stricter limit on overlap. Grepping for "overlap" in draft POSIX.1-2001 shows the following additional functions with overlap giving undefined behavior: bcopy: POSIX duplicates the C99 wording for memcpy, but for bcopy it has much the same bugs as this change: it uses "the area pointed to be .Fa s1", etc. memccpy, swab independent details for functions in <wchar.h> (but POSIX combines the details for at least sprintf() and snprintf()). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20171114215948.L913>