From: Mark Millard
To: svn-src-head@freebsd.org, Justin Hibbits
Subject: Re: svn commit: r346588 - head/lib/libc/powerpc64/string
Date: Thu, 2 May 2019 00:43:57 -0700
Message-Id: <9C27DA97-6C2F-42B0-8309-8C8FBDECB8F4@yahoo.com>

[I did not deal with translating register usage correctly.]

> On 2019-Apr-27, at 01:50, Mark Millard wrote:
>
> Justin Hibbits jhibbits at FreeBSD.org wrote on
> Fri Apr 26 16:21:47 UTC 2019 :
>
>> This actually uses 'cmpb' which is only available on PowerISA 2.05+, so
>> I'll need to pull it out for now, and re-enable it once we have
>> ifuncs. As it stands, this commit broke the G5 and POWER4/POWER5.
>
> As I understand code like:
>
>         xor     %r8,%r8,%r8     /* %r8 <- Zero. */
>         xor     %r0,%r5,%r6     /* Check if double words are different. */
>         cmpb    %r7,%r5,%r8     /* Check if double words contain zero. */
>
>         /*
>          * If double words are different or contain zero,
>          * find what byte is different or contains zero,
>          * else load next double words.
>          */
>         or.     %r9,%r7,%r0
>         bne     .Lstrcmp_check_zeros_differences
>
> (and similarly for the loop. . .):
>
> A) Each byte of %r5 that is non-zero needs that byte of %r7 to be zero.
> B) Each byte of %r5 that is zero needs that byte of %r7 to be non-zero.
>
> (cmpb assigns 0xff as the non-zero value, as I understand, but even one
> non-zero bit is sufficient for the overall code structure.)
>
> If I've got that much correct, then the following might be an
> alternative to cmpb for now. I'll explain the code via commented
> c/c++-ish code and then show the assembler notation:
>
> unsigned long ul_has_zero_byte(unsigned long b)
> {
>     unsigned long constexpr low_7bits_of_bytes{0x7f7f7f7f'7f7f7f7ful};
>
>     //                                                    Illustrating byte transformations:
>     unsigned long const x= b & low_7bits_of_bytes;      // 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0
>     unsigned long const y= x + low_7bits_of_bytes;      //     ->0x7f,     ->0x7f,      ->ms-bit-in-byte==1
>     unsigned long const z= b | y | low_7bits_of_bytes;  //     ->0x7f,     ->0xff,      ->0xff
>     return ~z;                                          //     ->0x80,     ->0x00,      ->0x00
> }
>
> (used in a powerpc64 context, so unsigned long being 64 bits).
>
> So, not using %r8 as zero but for a different value,
> each cmpb can be replaced by:
>
> # Only once to set up the value in %r8 (Note: 32639=0x7f7f):
>         lis     r8,32639
>         ori     r8,r8,32639
>         rldimi  r8,r8,32,0
>
> # each "cmpb %r7,%r5,%r8" replaced by:
>         and     r7,r5,r8
>         add     r7,r7,r8
>         nor     r5,r7,r5
>         andc    r5,r5,r8

The above 4 lines do not match the context's register usage:
of the 3 registers r5, r7, r8, only r7 should have been changed.
It looks like another temporary register (for the intermediate
result) is required to make a match:

        and     %r9,%r5,%r8
        add     %r9,%r9,%r8
        nor     %r7,%r9,%r5
        andc    %r7,%r7,%r8

(%r9 is later overwritten anyway via: or. %r9,%r7,%r0)

> (The code is from compiler output, but with registers adjusted
> to match the context.)
>
>
> The c/c++-ish code came from thinking about material from Hacker's
> Delight Second Edition and the specific criteria needed here: it
> uses part of Figure 6-2 "Find First 0-Byte, branch-free code",
> adjusted for width and for returning something sufficient here.
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)