Date:      Thu, 2 May 2019 00:43:57 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        svn-src-head@freebsd.org, Justin Hibbits <chmeeedalf@gmail.com>
Subject:   Re: svn commit: r346588 - head/lib/libc/powerpc64/string
Message-ID:  <9C27DA97-6C2F-42B0-8309-8C8FBDECB8F4@yahoo.com>
In-Reply-To: <BA85BD70-D514-4B78-968F-06EC1ABD0756@yahoo.com>
References:  <BA85BD70-D514-4B78-968F-06EC1ABD0756@yahoo.com>

[I did not deal with translating register usage correctly.]

> On 2019-Apr-27, at 01:50, Mark Millard <marklmi@yahoo.com> wrote:
>
> Justin Hibbits jhibbits at FreeBSD.org wrote on
> Fri Apr 26 16:21:47 UTC 2019 :
>
>> This actually uses 'cmpb' which is only available on PowerISA 2.05+, so
>> I'll need to pull it out for now, and re-enable it once we have
>> ifuncs.  As it stands, this commit broke the G5 and POWER4/POWER5.
>
> As I understand the code like:
>
> 	xor	%r8,%r8,%r8	/* %r8 <- Zero. */
> 	xor	%r0,%r5,%r6	/* Check if double words are different. */
> 	cmpb	%r7,%r5,%r8	/* Check if double words contain zero. */
>
> 	/*
> 	 * If double words are different or contain zero,
> 	 * find what byte is different or contains zero,
> 	 * else load next double words.
> 	 */
> 	or.	%r9,%r7,%r0
> 	bne	.Lstrcmp_check_zeros_differences
>
> (and similarly for the loop. . .):
>
> A) Each byte of %r5 that is non-zero needs that byte of %r7 to be zero.
> B) Each byte of %r5 that is zero needs that byte of %r7 to be non-zero.
>
> (cmpb sets a result byte to 0xff where the compared bytes are equal, as
> I understand, but even one non-zero bit in such a byte is sufficient
> for the overall code structure.)
>
> If I've got that much correct, then the following might be an
> alternative to cmpb for now. I'll explain the code via commented
> c/c++-ish code and then show the assembler notation:
>
> unsigned long ul_has_zero_byte(unsigned long b)
> {
>    unsigned long constexpr low_7bits_of_bytes{0x7f7f7f7f'7f7f7f7ful};
>
>                                                       // Illustrating byte transformations:
>    unsigned long const x= b & low_7bits_of_bytes;     // 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0
>    unsigned long const y= x + low_7bits_of_bytes;     //     ->0x7f,     ->0x7f,      ->ms-bit-in-byte==1
>    unsigned long const z= b | y | low_7bits_of_bytes; //     ->0x7f,     ->0xff,      ->0xff
>    return ~z;                                         //     ->0x80,     ->0x00,      ->0x00
> }
>
> (used in a powerpc64 context, so unsigned long being 64 bits).
>
> So, not using %r8 as zero but for a different value,
> each cmpb can be replaced by:
>
> # Only once to set up the value in %r8 (Note: 32639=0x7f7f):
> lis     r8,32639
> ori     r8,r8,32639
> rldimi  r8,r8,32,0
>
> # each "cmpb %r7,%r5,%r8" replaced by:
> and     r7,r5,r8
> add     r7,r7,r8
> nor     r5,r7,r5
> andc    r5,r5,r8

The 4 lines above do not match the context's register usage:
of the 3 registers r5, r7, and r8, only r7 should have been
changed. It looks like another temporary register (for this
stage) is required to make a match:

and      %r9,%r5,%r8
add      %r9,%r9,%r8
nor      %r7,%r9,%r5
andc     %r7,%r7,%r8

(%r9 later being replaced via: or. %r9,%r7,%r0)
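
For anyone wanting to check the corrected sequence off-target, here is a
minimal C sketch of it. The names zero_byte_mask, cmpb_with_zero,
same_zero_bytes, check_all_lanes and the LOW7 macro are made up for
illustration and are not from the commit. The cmpb model assumes the
other operand is all zero, as in the quoted code. It only checks the
property that matters here: a result byte is non-zero exactly when the
matching byte of %r5 is zero (0x80 here, rather than cmpb's 0xff).

```c
#include <assert.h>
#include <stdint.h>

/* 0x7f in every byte: the value the lis/ori/rldimi sequence leaves in %r8. */
#define LOW7 UINT64_C(0x7f7f7f7f7f7f7f7f)

/* C model of the four-instruction replacement; variables mirror the registers. */
static uint64_t zero_byte_mask(uint64_t r5)
{
    uint64_t r9 = r5 & LOW7;      /* and  %r9,%r5,%r8 */
    r9 = r9 + LOW7;               /* add  %r9,%r9,%r8 (no carry between bytes) */
    uint64_t r7 = ~(r9 | r5);     /* nor  %r7,%r9,%r5 */
    return r7 & ~LOW7;            /* andc %r7,%r7,%r8 */
}

/* Reference model of "cmpb %r7,%r5,%r8" with %r8 == 0 (an assumption):
 * 0xff in each byte position where r5 holds a zero byte, 0x00 elsewhere. */
static uint64_t cmpb_with_zero(uint64_t r5)
{
    uint64_t r7 = 0;
    for (int i = 0; i < 8; i++)
        if (((r5 >> (8 * i)) & 0xff) == 0)
            r7 |= UINT64_C(0xff) << (8 * i);
    return r7;
}

/* Returns 1 if both models agree, byte by byte, on zero versus non-zero. */
static int same_zero_bytes(uint64_t r5)
{
    uint64_t a = zero_byte_mask(r5);
    uint64_t b = cmpb_with_zero(r5);
    for (int i = 0; i < 8; i++) {
        int a_set = ((a >> (8 * i)) & 0xff) != 0;
        int b_set = ((b >> (8 * i)) & 0xff) != 0;
        if (a_set != b_set)
            return 0;
    }
    return 1;
}

/* Puts every byte value into every byte lane, with 0x01 filler elsewhere. */
static void check_all_lanes(void)
{
    for (int lane = 0; lane < 8; lane++)
        for (uint64_t v = 0; v < 256; v++) {
            uint64_t filler = UINT64_C(0x0101010101010101)
                              & ~(UINT64_C(0xff) << (8 * lane));
            assert(same_zero_bytes(filler | (v << (8 * lane))));
        }
}
```

Calling check_all_lanes() from a small main is enough to exercise it. It
is a spot check under the stated assumptions, not a proof over all
64-bit inputs.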

> (The code is from compiler output, but with registers adjusted
> to match the context.)
>
>
> The c/c++-ish code came from thinking about material from Hacker's
> Delight Second Edition and the specific criteria needed here: it
> uses part of Figure 6-2 "Find First 0-Byte, branch-free code",
> adjusted for width and for returning something sufficient here.
>
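
The whole double-word test from the quoted code (xor for differences,
the zero-byte mask, then or.) can also be sketched in C. This is only an
illustration under assumptions: load_dword and needs_byte_check are
hypothetical names, memcpy stands in for the ld instructions, and
nothing here is the committed strcmp code.

```c
#include <stdint.h>
#include <string.h>

/* 0x7f in every byte, standing in for the constant held in %r8. */
#define LOW7 UINT64_C(0x7f7f7f7f7f7f7f7f)

/* Hypothetical helper: fetch 8 bytes as one double word. */
static uint64_t load_dword(const char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Non-zero when the two double words differ or the first contains a
 * zero byte, i.e. when the quoted code would take the bne branch. */
static uint64_t needs_byte_check(const char *s1, const char *s2)
{
    uint64_t r5 = load_dword(s1);
    uint64_t r6 = load_dword(s2);
    uint64_t r0 = r5 ^ r6;               /* xor  %r0,%r5,%r6 */
    uint64_t r9 = (r5 & LOW7) + LOW7;    /* and, add */
    uint64_t r7 = ~(r9 | r5) & ~LOW7;    /* nor, andc */
    return r7 | r0;                      /* or.  %r9,%r7,%r0 */
}
```

Because the result is only tested against zero, the sketch does not
depend on byte order; locating which byte differs or is zero is a
separate step that this model leaves out.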



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



