From owner-freebsd-arm@freebsd.org Sat Feb 11 02:11:33 2017 Return-Path: Delivered-To: freebsd-arm@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B31CCDA4AC for ; Sat, 11 Feb 2017 02:11:33 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-76.reflexion.net [208.70.210.76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C087D163D for ; Sat, 11 Feb 2017 02:11:32 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 20037 invoked from network); 11 Feb 2017 02:11:24 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 11 Feb 2017 02:11:24 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.30.0) with SMTP; Fri, 10 Feb 2017 21:11:24 -0500 (EST) Received: (qmail 13294 invoked from network); 11 Feb 2017 02:11:24 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 11 Feb 2017 02:11:24 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 09FC5EC7AC6; Fri, 10 Feb 2017 18:11:24 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: bcopy/memmove optimization broken ? [looks like you are correct to me, I give supporting detail] From: Mark Millard In-Reply-To: <5335118.oK1KXXDaG5@pc-alex> Date: Fri, 10 Feb 2017 18:11:23 -0800 Cc: freebsd-arm , Ian Lepore Content-Transfer-Encoding: quoted-printable Message-Id: <25360EAB-3079-4037-9FB5-B7781ED40FA6@dsl-only.net> References: <5335118.oK1KXXDaG5@pc-alex> To: Alexandre Martins X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Feb 2017 02:11:33 -0000 On 2017-Feb-10, at 7:51 AM, Alexandre Martins wrote: > Hi all >=20 > I see in the kernel code of bcopy/memmove an optimization if the = copied region=20 > don't overlap: >=20 > = https://svnweb.freebsd.org/base/head/sys/arm/arm/support.S?view=3Dannotate= #l403 >=20 > I'm a newbie in ARM assembly, so, sorry if i'm wrong, but > - R0 is the dst (1st parameter) > - R1 is the src (2nd parameter) >=20 > So : > subcc r3, r0, r1 /* if (dst > src) r3 =3D dst - src */ > subcs r3, r1, r0 /* if (src > dsr) r3 =3D src - dst */ > cmp r3, r2 /* if (r3 < len) we have an overlap */ > bcc PIC_SYM(_C_LABEL(memcpy), PLT) >=20 > Seems to be inverted. Should it be that ? : > subcs r3, r0, r1 /* if (dst > src) r3 =3D dst - src */ > subcc r3, r1, r0 /* if (src > dsr) r3 =3D src - dst */ > cmp r3, r2 /* if (r3 < len) we have an overlap */ > bcs PIC_SYM(_C_LABEL(memcpy), PLT) >=20 >=20 > Best regards >=20 > --=20 > Alexandre Martins > STORMSHIELD I did not expect something so central that has been around so long to look wrong but. . . Going through the supporting details of the original code based on my looking around here is what I found: #include void *memmove(void *dest, const void *src, size_t n); So I'd expect (c'ish notation): r0=3D=3Ddest r1=3D=3Dsrc r2=3D=3Dn Then for (the comments vs. code is being challenged): (The comments do seem to give the intent correctly.) cmp r0, r1 RETeq /* Bail now if src/dst are the same */ subcc r3, r0, r1 /* if (dst > src) r3 =3D dst - src */ subcs r3, r1, r0 /* if (src > dsr) r3 =3D src - dst */ cmp r3, r2 /* if (r3 < len) we have an overlap */ bcc PIC_SYM(_C_LABEL(memcpy), PLT) . . . cmp r0,r1 should result in condition code (c'ish notation): eq=3D(r0=3D=3Dr1) cc=3D(r0=3Dr1) (Only the r0 position has no immediate-value alternative.) subcc r3,r0,r1 is: if (cc) r3=3Dr0-r1 // no condition code change In other words: if (dst=3Dsrc) r3=3Dsrc-dst So it does not match the test listed in the comment as far as I can see. And in (mathematical) integer arithmetic the r3 result would be nonpositive for src-dst. But the earlier RETeq prevents the zero case, so: negative. For non-negative arithmetic (natural or whole): undefined. If it was only a normal mathemetical context r3=3D-abs(dst-src) would be a summary of the two sub instruction sequence as it is from what I can tell. For the purpose the summary should be: r3=3Dabs(dst-src), given dst!=3Dsrc . There is no need to wonder outside normal mathematical non-negative arithmetic in the process either. Your code change would have the right summary and use only normal mathematical rules from what I can tell: cmp r0, r1 RETeq /* Bail now if src/dst are the same */ subcs r3, r0, r1 /* if (dst > src) r3 =3D dst - src */ subcc r3, r1, r0 /* if (src > dsr) r3 =3D src - dst */ cmp r3, r2 /* if (r3 < len) we have an overlap */ bcs PIC_SYM(_C_LABEL(memcpy), PLT) . . . subcs r3,r0,r1 is: if (cs) r3=3Dr0-r1 // no condition code change In other words: if (dst>=3Dsrc) r3=3Ddst-src. Given the prior RETeq, that is effectively: if (dst>src) r3=3Ddst-src. And that matches the comments and would produce a positive result for r3, matching the normal mathematical result. subcc r3,r1,r0 is: if (cc) r3=3Dr1-r0 // no condition code change In other words: if (dst