From owner-svn-src-head@freebsd.org Thu Nov 16 07:15:48 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 44960DC0CDE for ; Thu, 16 Nov 2017 07:15:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id CA3F070495 for ; Thu, 16 Nov 2017 07:15:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.103] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id BA9FA1A1BE3; Thu, 16 Nov 2017 18:15:39 +1100 (AEDT) Date: Thu, 16 Nov 2017 18:15:36 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Millard cc: svn-src-head@freebsd.org Subject: Re: svn commit: r325765 - head/lib/libc/string In-Reply-To: <37F2AD18-1B69-4559-8BD3-B92E01A067BF@dsl-only.net> Message-ID: <20171116171111.D1034@besplex.bde.org> References: <37F2AD18-1B69-4559-8BD3-B92E01A067BF@dsl-only.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=8ezO6BSHHXrXaAvICcMA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Nov 2017 07:15:48 -0000 On Wed, 15 Nov 2017, Mark Millard wrote: > Bruce Evans brde at optusnet.com.au wrote on > Tue Nov 14 12:41:50 UTC 2017 : > >> - memcpy.3. It was already documented in the BUGS section that memcpy() is >> implemented as bcopy() and therefore the "strings" "may overlap" [then >> portability considerations due to this]. This is buggier than before: >> - old bug: only the MI implementation of memcpy() clearly implements >> memcpy() as bcopy(). The amd64, i386 and arm MD implementations >> do the same. I was confused between arm and mips and only fixed this in some places. It is mips that does the same as amd64 and i386. arm has different MD code, and the others have no md code. > head/sys/arm/arm/support.S ( -r283366 ) has > > 394 ENTRY(bcopy) > 395 /* switch the source and destination registers */ > 396 eor r0, r1, r0 > 397 eor r1, r0, r1 > 398 eor r0, r1, r0 > 399 EENTRY(memmove) > 400 /* Do the buffers overlap? */ > 401 cmp r0, r1 > 402 RETeq /* Bail now if src/dst are the same */ > 403 subcc r3, r0, r1 /* if (dst > src) r3 = dst - src */ > 404 subcs r3, r1, r0 /* if (src > dsr) r3 = src - dst */ > 405 cmp r3, r2 /* if (r3 < len) we have an overlap */ > 406 bcc PIC_SYM(_C_LABEL(memcpy), PLT) > . . . I didn't look closely at the arm code, and just noticed more bugs in it. Falling through like that breaks at least profiling. Only ENTRY() does _PROF_PROLOGUE, so profiling of memmove() seems to be broken. A less worse way to use EENTRY() is ENTRY() then 1 more more EENTRY()'s with no code in between. Then the names are just aliases. Profiling can't distinguish the aliases, but at least all entries get counted. amd64 and i386 use ALTENTRY() which handles this as well as possible (not very well -- like a tail call -- too much gets charged to the tail function. sparc64 also has ALTENTRY() but not the complications needed for it to work only as badly as a tail call. It is a style bug for memmove() to exist in the kernel. Especially an MD version of it. It is bad enough that libkern has an MI memmove(). This is not the same as the libc/string one (another style bug), but just implements memmove() by calling bcopy(). Profiling works right for this unless it the function is over-optimized to a tail call when it works like ALTENTRY(). > head/lib/libc/arm/string/memmove.S ( -r288373 ) has: > > 37 #ifndef _BCOPY > 38 /* LINTSTUB: Func: void *memmove(void *, const void *, size_t) */ > 39 ENTRY(memmove) > 40 #else > 41 /* bcopy = memcpy/memmove with arguments reversed. */ > 42 /* LINTSTUB: Func: void bcopy(void *, void *, size_t) */ > 43 ENTRY(bcopy) > 44 /* switch the source and destination registers */ > 45 eor r0, r1, r0 > 46 eor r1, r0, r1 > 47 eor r0, r1, r0 > 48 #endif This is the correct way to do it -- separate object files with duplication in source files avoided using ifdefs. > 49 /* Do the buffers overlap? */ > 50 cmp r0, r1 > 51 it eq > 52 RETeq /* Bail now if src/dst are the same */ > 53 ite cc > 54 subcc r3, r0, r1 /* if (dst > src) r3 = dst - src */ > 55 subcs r3, r1, r0 /* if (src > dsr) r3 = src - dst */ > 56 cmp r3, r2 /* if (r3 < len) we have an overlap */ > 57 bcc PIC_SYM(_C_LABEL(memcpy), PLT) > . . . > > It looks to me like bcopy and memmove call memcpy under > some conditions, not the other way around. I did not > notice any code in memcpy going back to bcopy (or > memmove) in either area. (That might have lead to > recursion as things are.) The above only shows memmove() calling bcopy(), only in the kernel. This would be invalid in userland since memmove() is Standard but bcopy() is supposed to be usable by applications (unless __BSD_VISIBLE). The separate object files in userland make this not a problem. Otherwise, bcopy and memmove in the same object file would cause namespace problems even if bcopy is implemented as a [tail] call or fall-through to memmove. > ... > Another issue is if compilers (clang, gcc) are well > controlled for code substitutions to preserve > supposed FreeBSD rules about where overlaps > are well defined. The library code might not be all > there is to it as far as behavior goes for > compiled C/C++ source code. Do you mean the namespace stuff? > head/sys/arm64/arm64/ is similar for bcopy/memmove > calling memcpy (but head/lib/libc/amd64/string/ is > not): It is reasonable, and possibly best, to implement the usual non-overlapping case by a call to memcpy() and not try hard to optimize the overlapping case (or jump far away to it to keep caches cleaner for the usual case). > head/sys/arm64/arm64/memmove.S ( -r307909 ) has: > ... > head/lib/libc/amd64/string/bcopy.S has: > ... Same organisation as for plain arm, so good for libc and not so good fot the kernel. >> mips is backwards for bcopy() and implements it as >> memmove(). mips has 2 separate memcpy*.S files, one for plain arm >> and one for xscale. The plain arm one is smaller than memmove.S, >> so presumably doesn't support overlapped copies. The xscale one >> is much larger, so has space for overlapped copies and many optimizations >> that no longer work. I don't know what it does. > > The mix of "mips" and "arm"/"xscale" above confused me. Looking > around this seems to be referencing head/lib/libc/arm/string/ > materials instead of head/lib/libc/mips/string/ or > head/sys/mips/mips/ materials. I meant arm here. mips does have some MI string instructions in libc which I missed or got confused before. Also, aarch64 has some which I missed because the they are obfuscated by putting them in contrib instead of under libc. mips implements all of bcopy, memcpy and memmove in bcopy.S, and seems to be indeed like amd64 and i386 -- none of these arches bothers to optimize memcpy by not supporting overlapping copies in it, exactly as documented. However, the documentation is still incorrect. Without -ffreestanding, compilers can and do often implement memcpy inline. Then overlapping copies are not supported. The BUGS section in memcpy.3 should just be deleted. Bruce