Date: Thu, 2 May 2019 00:43:57 -0700
From: Mark Millard <marklmi@yahoo.com>
To: svn-src-head@freebsd.org, Justin Hibbits <chmeeedalf@gmail.com>
Subject: Re: svn commit: r346588 - head/lib/libc/powerpc64/string
Message-ID: <9C27DA97-6C2F-42B0-8309-8C8FBDECB8F4@yahoo.com>
In-Reply-To: <BA85BD70-D514-4B78-968F-06EC1ABD0756@yahoo.com>
References: <BA85BD70-D514-4B78-968F-06EC1ABD0756@yahoo.com>

[I did not deal with translating register usage correctly.]

> On 2019-Apr-27, at 01:50, Mark Millard <marklmi@yahoo.com> wrote:
>
> Justin Hibbits jhibbits at FreeBSD.org wrote on
> Fri Apr 26 16:21:47 UTC 2019 :
>
>> This actually uses 'cmpb' which is only available on PowerISA 2.05+, so
>> I'll need to pull it out for now, and re-enable it once we have
>> ifuncs. As it stands, this commit broke the G5 and POWER4/POWER5.
>
> As I understand the code like:
>
>         xor     %r8,%r8,%r8     /* %r8 <- Zero. */
>         xor     %r0,%r5,%r6     /* Check if double words are different. */
>         cmpb    %r7,%r5,%r8     /* Check if double words contain zero. */
>
>         /*
>          * If double words are different or contain zero,
>          * find what byte is different or contains zero,
>          * else load next double words.
>          */
>         or.     %r9,%r7,%r0
>         bne     .Lstrcmp_check_zeros_differences
>
> (and similarly for the loop. . .):
>
> A) Each byte of %r5 that is non-zero needs that byte of %r7 to be zero.
> B) Each byte of %r5 that is zero needs that byte of %r7 to be non-zero.
>
> (cmpb assigns 0xff for non-zero as I understand, but even one non-zero
> bit is sufficient for the overall code structure.)
>
> If I've got that much correct, then the following might be an
> alternative to cmpb for now. I'll explain the code via commented
> c/c++-ish code and then show the assembler notation:
>
> unsigned long ul_has_zero_byte(unsigned long b)
> {
>     unsigned long constexpr low_7bits_of_bytes{0x7f7f7f7f'7f7f7f7ful};
>
>                                                          // Illustrating byte transformations:
>     unsigned long const x= b & low_7bits_of_bytes;       // 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0
>     unsigned long const y= x + low_7bits_of_bytes;       //     ->0x7f,     ->0x7f,      ->ms-bit-in-byte==1
>     unsigned long const z= b | y | low_7bits_of_bytes;   //     ->0x7f,     ->0xff,      ->0xff
>     return ~z;                                           //     ->0x80,     ->0x00,      ->0x00
> }
>
> (used in a powerpc64 context, so unsigned long being 64 bits).
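
For anyone wanting to check the byte transformations above without a
powerpc64 machine, here is a rough plain-C sketch of the same idea. The
names has_zero_byte_mask, has_zero_byte_mask_ref and
check_has_zero_byte_mask are mine (hypothetical, not from the commit),
and uint64_t stands in for the 64-bit unsigned long. It compares the
branch-free form against a byte-at-a-time reference for every byte
value in every byte position:

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free form: 0x80 in each byte lane where b has a zero byte,
 * 0x00 in every other lane. */
static uint64_t has_zero_byte_mask(uint64_t b)
{
    const uint64_t low7 = UINT64_C(0x7f7f7f7f7f7f7f7f);
    uint64_t x = b & low7;      /* clear the top bit of every byte */
    uint64_t y = x + low7;      /* top bit set iff the low 7 bits were non-zero;
                                 * each lane is at most 0x7f+0x7f, so no carry
                                 * crosses into the next byte */
    uint64_t z = b | y | low7;  /* 0x7f for a zero byte, 0xff otherwise */
    return ~z;                  /* 0x80 for a zero byte, 0x00 otherwise */
}

/* Byte-at-a-time reference, used only for comparison. */
static uint64_t has_zero_byte_mask_ref(uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (((b >> (8 * i)) & 0xff) == 0)
            r |= UINT64_C(0x80) << (8 * i);
    }
    return r;
}

/* Put every byte value in every lane, once with the other lanes all
 * 0xff and once with the other lanes all 0x00, and compare. */
static void check_has_zero_byte_mask(void)
{
    for (int pos = 0; pos < 8; pos++) {
        uint64_t lane = UINT64_C(0xff) << (8 * pos);
        for (uint64_t v = 0; v < 256; v++) {
            uint64_t byte = v << (8 * pos);
            uint64_t in_ones = (~UINT64_C(0) & ~lane) | byte;
            uint64_t in_zeros = byte;
            assert(has_zero_byte_mask(in_ones) == has_zero_byte_mask_ref(in_ones));
            assert(has_zero_byte_mask(in_zeros) == has_zero_byte_mask_ref(in_zeros));
        }
    }
}
```

The result is 0x80 per zero byte rather than cmpb's 0xff, which is
enough for criteria A and B above since only zero versus non-zero
matters to the or./bne that follows.
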
>
> So, not using %r8 as zero but for a different value,
> each cmpb can be replaced by:
>
> # Only once to set up the value in %r8 (Note: 32639=0x7f7f):
>         lis     r8,32639
>         ori     r8,r8,32639
>         rldimi  r8,r8,32,0
>
> # each "cmpb %r7,%r5,%r8" replaced by:
>         and     r7,r5,r8
>         add     r7,r7,r8
>         nor     r5,r7,r5
>         andc    r5,r5,r8

The above 4 lines are an incorrect match to the context's
register usage: only r7 of the 3 registers r5, r7, r8 should
have been changed. It looks like another temporary register
(for the stage) is required to make a match:

        and     %r9,%r5,%r8
        add     %r9,%r9,%r8
        nor     %r7,%r9,%r5
        andc    %r7,%r7,%r8

(%r9 later being replaced via: or. %r9,%r7,%r0)

> (The code is from compiler output, but with registers adjusted
> to match the context.)
>
>
> The c/c++-ish code came from thinking about material from Hacker's
> Delight Second Edition and the specific criteria needed here: it
> uses part of Figure 6-2 "Find First 0-Byte, branch-free code",
> adjusted for width and for returning something sufficient here.
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
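
P.S. A rough way to see the register problem off-target is to model
the instructions in plain C. This is only a sketch: struct regs and
the function names are hypothetical, each statement is my reading of
the instruction named in its comment, and condition-register effects
of "or." are reduced to a zero/non-zero return value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for just the registers named in the message. */
struct regs { uint64_t r0, r5, r6, r7, r8, r9; };

/* One-time setup of %r8 = 0x7f7f7f7f7f7f7f7f. */
static void setup_r8(struct regs *g)
{
    /* lis r8,32639: immediate shifted left 16, sign-extended */
    g->r8 = (uint64_t)(int64_t)(int32_t)(UINT32_C(0x7f7f) << 16);
    /* ori r8,r8,32639 */
    g->r8 |= UINT64_C(0x7f7f);
    /* rldimi r8,r8,32,0: rotate left 32, insert into the high 32 bits */
    g->r8 = (g->r8 << 32) | (g->r8 & UINT64_C(0xffffffff));
}

/* The first attempt: gives the right mask but leaves it in r5,
 * destroying the loaded double word. */
static void zero_check_first_attempt(struct regs *g)
{
    g->r7 = g->r5 & g->r8;     /* and  r7,r5,r8 */
    g->r7 = g->r7 + g->r8;     /* add  r7,r7,r8 */
    g->r5 = ~(g->r7 | g->r5);  /* nor  r5,r7,r5 */
    g->r5 = g->r5 & ~g->r8;    /* andc r5,r5,r8 */
}

/* The corrected sequence: %r9 is the temporary, only %r7 and %r9 change. */
static void zero_check_corrected(struct regs *g)
{
    g->r9 = g->r5 & g->r8;     /* and  %r9,%r5,%r8 */
    g->r9 = g->r9 + g->r8;     /* add  %r9,%r9,%r8 */
    g->r7 = ~(g->r9 | g->r5);  /* nor  %r7,%r9,%r5 */
    g->r7 = g->r7 & ~g->r8;    /* andc %r7,%r7,%r8 */
}

/* xor, zero check, then or.: non-zero means the bne would be taken. */
static int differs_or_has_zero(struct regs *g)
{
    g->r0 = g->r5 ^ g->r6;     /* xor  %r0,%r5,%r6 */
    zero_check_corrected(g);
    g->r9 = g->r7 | g->r0;     /* or.  %r9,%r7,%r0 */
    return g->r9 != 0;
}
```

With %r5 holding "abc" followed by zero bytes, the corrected sequence
leaves %r5 and %r8 alone and puts 0x80 in %r7 for each zero byte; the
first attempt leaves that same mask in %r5 instead.
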