Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 14 Nov 2017 23:41:38 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Warner Losh <imp@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r325765 - head/lib/libc/string
Message-ID:  <20171114215948.L913@besplex.bde.org>
In-Reply-To: <201711131704.vADH4iZ9035812@repo.freebsd.org>
References:  <201711131704.vADH4iZ9035812@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 13 Nov 2017, Warner Losh wrote:

>  Add notes about overlapping copies.
>
>  Add notes to each of these that specifically state that results are
>  undefined if the strings overlap. In the case of memcpy, we document
>  the overlapping behavior on FreeBSD (pre-existing). For str*, it is
>  left unspecified, however, since the default (and x86) implementations
>  do not handle overlapping strings properly.
>
>  PR: 223653
>  Sponsored by: Netflix
> 
> Modified:
>  head/lib/libc/string/memcpy.3
>  head/lib/libc/string/strcat.3
>  head/lib/libc/string/strcpy.3

C standards and POSIX have much better wording.

> Modified: head/lib/libc/string/memcpy.3
> ==============================================================================
> --- head/lib/libc/string/memcpy.3	Mon Nov 13 16:53:36 2017	(r325764)
> +++ head/lib/libc/string/memcpy.3	Mon Nov 13 17:04:44 2017	(r325765)
> @@ -54,6 +54,11 @@ bytes from string
> .Fa src
> to string
> .Fa dst .
> +If
> +.Fa src
> +and
> +.Fa dst
> +overlap, the results are not defined.
> .Sh RETURN VALUES
> The
> .Fn memcpy

The bugs here are:
- src and dst are pointers.  Overlap for pointers makes no sense except
   for pointers to single objects but the pointers here are not for that.
   Standards use the correct description "objects that overlap".
- style bug: verboseness in the nonsense, by naming the non-objects and
   then having to mark up their names.  A full description would be much
   longer, especially for non-counted copies.  For strcpy(), the relevant
   obects are the array of char beginning at .Fa src and ending at its
   terminating NUL, and the array of char beginning at .Fa dst with the
   same number of elements as the source array.  Standards omit all details
   and say simple "objects".  memcpy() only has a count so it is simpler.
   strncpy() has a count in addition to the NUL terminators, so it is more
   complicated.
- use of the wrong term "results".  The correct term is "behavior".  Results
   are are worst implementation-defined in Standards, since it is almost
   impossible for a result to be undefined without the whole behaviour also
   being undefined.  These functions have only 1 result.  In Standard wording,
   function results are return values, not the whole behavior of the function.
- use of the non-technical term "not defined".  Standards use the technical
   term "undefined".

The Standards wording is "If copying takes place between objects that
overlap, the behaviour is undefined".

> Modified: head/lib/libc/string/strcat.3
> Modified: head/lib/libc/string/strcpy.3
- as above, except "undefined" is spelled correctly

Grepping for "overlap" in draft C99 shows the following other functions
with overlap giving undefined behavior:
    snprintf, sprintf, sscanf, vsnprintf, vsprintf,
    mbstowcs, wcstombs,
    strxfrm, strftime.
    all functions in <wchar.h> where overlap might occur, except when
    explicitly specified (seems to be only explicitly specified for
    wmemmove)

The term "overlap" occurs in 193 section 3 man pages in -current (counting
links as 1 each).  It isi still never used for any of the above functions.
It is used in the following related man pages:
- bcmp.3.  This use is nonsense.  Overlap doesn't cause any problems for
   bcmp().
- bcopy.3 and memmove.3.  This use is not nonsense, but its wording is much
   worse than in Standards.  It says "The .Fn function copies .Fa len bytes
   from string .Fa src to string .Fa dst.  The two strings may overlap.".
   But these functions don't even copy strings, and "may overlap" gives no
   details.  Standards say "The .Fn function copies .Fn n characters from
   the object pointed to be .Fa s2 into the object pointed to by .Fa s1.
   [Complicated 5-line sentence saying that the copying acts as if it is
   done through a temporary object that doesn't overlap.]".  I think standards
   say somewhere, but man pages say nowhere, that copying between
   non-overlapping objects has the defined behaviour of copying the bytes
   (u_char's) it is done by assignment of bytes or a function like memmove()
   (copying of general objects is more complicated since it may skip padding
   or otherwise fail to preserve the representation).  This is needed so that
   the 5-line sentence doesn't need to be much longer.
- memcpy.3.  It was already documented in the BUGS section that memcpy() is
   implemented as bcopy() and therefore the "strings" "may overlap" [then
   portability considerations due to this].  This is buggier than before:
   - old bug: only the MI implementation of memcpy() clearly implements
     memcpy() as bcopy().  The amd64, i386 and arm MD implementations
     do the same.  mips is backwards for bcopy() and implements it as
     memmove().  mips has 2 separate memcpy*.S files, one for plain arm
     and one for xscale.  The plain arm one is smaller than memmove.S,
     so presumably doesn't support overlapped copies.  The xscale one
     is much larger, so has space for overlapped copies and many optimizations
     that no longer work.  I don't know what it does.  (The i386 and amd64
     ones have a few small optimizations that stopped working 25 years ago.
     At least they are short.)
   - new bug: the addition is inconsistent with the BUGS section.
- strlcpy.c.  This uses a combination of the C99 wording for strcpy() and
   buggy wording involving the pointer names (for both lcat and lcpy):
   If the .Fa src and .Fa dst strings overlap, the behavior is undefined.
   Now it is even better to not give the details about the objects, since
   at least for dst, the object isn't the string pointed to by dst --
   it is the char array of length MIN(dstsize, strlen(dst) + 1).  Similarly
   for old strncat() and strncpy() and not so old snprintf().  For the
   standard 'n' functions, the object is clearly limited by the dstsize arg
   and there is no problem in theory or practice if the dst string overlaps
   beyond the part that is written to, so it is just a bug in the
   specification = man page of the 'l' functions to have a stricter limit
   on overlap.

Grepping for "overlap" in draft POSIX.1-2001 shows the following additional
functions with overlap giving undefined behavior:
     bcopy: POSIX duplicates the C99 wording for memcpy, but for bcopy it
       has much the same bugs as this change: it uses "the area pointed to
       be .Fa s1", etc.
     memccpy, swab
     independent details for functions in <wchar.h> (but POSIX combines the
     details for at least sprintf() and snprintf()).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20171114215948.L913>