From owner-freebsd-numerics@freebsd.org Sun Sep 8 05:55:51 2019
Message-ID: <174BDDD122964DA9AD32D77663AB863D@H270>
From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
Subject: Shorter releng/12.0/lib/msun/i387/s_remquo.S, releng/12.0/lib/msun/amd64/s_remquo.S, ...
Date: Sun, 8 Sep 2019 07:52:46 +0200
Organization: Me, myself & IT
List-Id: "Discussions of high quality implementation of libm functions."
Hi,

here's a patch to shave 4 instructions (and about 25% code size) from

http://sources.freebsd.org/releng/12.0/lib/msun/i387/s_remquo.S
http://sources.freebsd.org/releng/12.0/lib/msun/i387/s_remquof.S
http://sources.freebsd.org/releng/12.0/lib/msun/i387/s_remquol.S
http://sources.freebsd.org/releng/12.0/lib/msun/amd64/s_remquo.S
http://sources.freebsd.org/releng/12.0/lib/msun/amd64/s_remquof.S
http://sources.freebsd.org/releng/12.0/lib/msun/amd64/s_remquol.S

Especially the negation is rather clumsy:

1. the 2 shifts by 16 to propagate the sign to all bits can be replaced
   with a single shift by 31, or with a CLTD alias CDQ (which is 2 bytes
   shorter);

2. the conversion of -1 to +1 via AND and its addition can be replaced
   by subtraction of -1.

The minor differences between the code for the float, double and long
double as well as the i387 and amd64 implementations are intended; pick
the variant you like best. I prefer and recommend the variant with
3 ADC and 2 SHL instructions used for the i387 double-precision function
http://sources.freebsd.org/releng/12.0/lib/msun/i387/s_remquo.S, which
comes first.

stay tuned
Stefan Kanthak

PS: if you ever need to run these functions on a CPU without barrel
shifter, replace the first SHL or ROR with BT $14,%eax and the second
SHL or ROL with BT $9,%eax ... and hope that BT doesn't use a slow
shift under the hood.

--- -/releng/12.0/lib/msun/i387/s_remquo.S
+++ +/releng/12.0/lib/msun/i387/s_remquo.S
@@ -34,1 +34,2 @@
 ENTRY(remquo)
+	xorl	%ecx,%ecx
@@ -42,22 +43,17 @@
 /* Extract the three low-order bits of the quotient from C0,C3,C1. */
-	shrl	$6,%eax
-	movl	%eax,%ecx
-	andl	$0x108,%eax
-	rorl	$7,%eax
-	orl	%eax,%ecx
-	roll	$4,%eax
-	orl	%ecx,%eax
-	andl	$7,%eax
+	adcl	%ecx,%ecx
+	shll	$18,%eax
+	adcl	%ecx,%ecx
+	shll	$5,%eax
+	adcl	%ecx,%ecx
 /* Negate the quotient bits if x*y<0.
    Avoid using an unpredictable branch. */
-	movl	16(%esp),%ecx
-	xorl	8(%esp),%ecx
-	sarl	$16,%ecx
-	sarl	$16,%ecx
-	xorl	%ecx,%eax
-	andl	$1,%ecx
-	addl	%ecx,%eax
+	movl	16(%esp),%eax
+	xorl	8(%esp),%eax
+	cltd
+	xorl	%edx,%ecx
+	subl	%edx,%ecx
 /* Store the quotient and return. */
-	movl	20(%esp),%ecx
-	movl	%eax,(%ecx)
+	movl	20(%esp),%eax
+	movl	%ecx,(%eax)
 	ret
 END(remquo)

--- -/releng/12.0/lib/msun/i387/s_remquof.S
+++ +/releng/12.0/lib/msun/i387/s_remquof.S
@@ -42,22 +42,18 @@
 /* Extract the three low-order bits of the quotient from C0,C3,C1. */
-	shrl	$6,%eax
-	movl	%eax,%ecx
-	andl	$0x108,%eax
-	rorl	$7,%eax
-	orl	%eax,%ecx
-	roll	$4,%eax
-	orl	%ecx,%eax
-	andl	$7,%eax
+	sbbl	%ecx,%ecx
+	negl	%ecx
+	shll	$18,%eax
+	adcl	%ecx,%ecx
+	shll	$5,%eax
+	adcl	%ecx,%ecx
 /* Negate the quotient bits if x*y<0. Avoid using an unpredictable branch. */
-	movl	8(%esp),%ecx
-	xorl	4(%esp),%ecx
-	sarl	$16,%ecx
-	sarl	$16,%ecx
-	xorl	%ecx,%eax
-	andl	$1,%ecx
-	addl	%ecx,%eax
+	movl	8(%esp),%eax
+	xorl	4(%esp),%eax
+	cltd
+	xorl	%edx,%ecx
+	subl	%edx,%ecx
 /* Store the quotient and return. */
-	movl	12(%esp),%ecx
-	movl	%eax,(%ecx)
+	movl	12(%esp),%eax
+	movl	%ecx,(%eax)
 	ret
 END(remquof)

--- -/releng/12.0/lib/msun/i387/s_remquol.S
+++ +/releng/12.0/lib/msun/i387/s_remquol.S
@@ -42,22 +42,19 @@
 /* Extract the three low-order bits of the quotient from C0,C3,C1. */
-	shrl	$6,%eax
-	movl	%eax,%ecx
-	andl	$0x108,%eax
-	rorl	$7,%eax
-	orl	%eax,%ecx
-	roll	$4,%eax
-	orl	%ecx,%eax
-	andl	$7,%eax
+	setc	%cl
+	movzbl	%cl,%ecx
+	shll	$18,%eax
+	adcl	%ecx,%ecx
+	shll	$5,%eax
+	adcl	%ecx,%ecx
 /* Negate the quotient bits if x*y<0. Avoid using an unpredictable branch. */
-	movl	24(%esp),%ecx
-	xorl	12(%esp),%ecx
-	movsx	%cx,%ecx
-	sarl	$16,%ecx
-	sarl	$16,%ecx
-	xorl	%ecx,%eax
-	andl	$1,%ecx
-	addl	%ecx,%eax
+	movl	24(%esp),%eax
+	xorl	12(%esp),%eax
+	cwtl
+	cltd
+	xorl	%edx,%ecx
+	subl	%edx,%ecx
 /* Store the quotient and return.
 */
-	movl	28(%esp),%ecx
-	movl	%eax,(%ecx)
+	movl	28(%esp),%eax
+	movl	%ecx,(%eax)
 	ret
+END(remquol)

--- -/releng/12.0/lib/msun/amd64/s_remquo.S
+++ +/releng/12.0/lib/msun/amd64/s_remquo.S
@@ -34,1 +35,2 @@
 ENTRY(remquo)
+	xorl	%ecx,%ecx
@@ -44,19 +45,14 @@
 /* Extract the three low-order bits of the quotient from C0,C3,C1. */
-	shrl	$6,%eax
-	movl	%eax,%ecx
-	andl	$0x108,%eax
-	rorl	$7,%eax
-	orl	%eax,%ecx
-	roll	$4,%eax
-	orl	%ecx,%eax
-	andl	$7,%eax
+	adcl	%ecx,%ecx
+	rorl	$15,%eax
+	adcl	%ecx,%ecx
+	roll	$6,%eax
+	adcl	%ecx,%ecx
 /* Negate the quotient bits if x*y<0. Avoid using an unpredictable branch. */
-	movl	-12(%rsp),%ecx
-	xorl	-4(%rsp),%ecx
-	sarl	$16,%ecx
-	sarl	$16,%ecx
-	xorl	%ecx,%eax
-	andl	$1,%ecx
-	addl	%ecx,%eax
+	movl	-12(%rsp),%eax
+	xorl	-4(%rsp),%eax
+	cltd
+	xorl	%edx,%ecx
+	subl	%edx,%ecx
 /* Store the quotient and return. */
-	movl	%eax,(%rdi)
+	movl	%ecx,(%rdi)

--- -/releng/12.0/lib/msun/amd64/s_remquof.S
+++ +/releng/12.0/lib/msun/amd64/s_remquof.S
@@ -44,19 +44,15 @@
 /* Extract the three low-order bits of the quotient from C0,C3,C1. */
-	shrl	$6,%eax
-	movl	%eax,%ecx
-	andl	$0x108,%eax
-	rorl	$7,%eax
-	orl	%eax,%ecx
-	roll	$4,%eax
-	orl	%ecx,%eax
-	andl	$7,%eax
+	sbbl	%ecx,%ecx
+	negl	%ecx
+	rorl	$15,%eax
+	adcl	%ecx,%ecx
+	roll	$6,%eax
+	adcl	%ecx,%ecx
 /* Negate the quotient bits if x*y<0. Avoid using an unpredictable branch. */
-	movl	-8(%rsp),%ecx
-	xorl	-4(%rsp),%ecx
-	sarl	$16,%ecx
-	sarl	$16,%ecx
-	xorl	%ecx,%eax
-	andl	$1,%ecx
-	addl	%ecx,%eax
+	movl	-8(%rsp),%eax
+	xorl	-4(%rsp),%eax
+	cltd
+	xorl	%edx,%ecx
+	subl	%edx,%ecx
 /* Store the quotient and return. */
-	movl	%eax,(%rdi)
+	movl	%ecx,(%rdi)

--- -/releng/12.0/lib/msun/amd64/s_remquol.S
+++ +/releng/12.0/lib/msun/amd64/s_remquol.S
@@ -42,21 +42,18 @@
 /* Extract the three low-order bits of the quotient from C0,C3,C1.
 */
-	shrl	$6,%eax
-	movl	%eax,%ecx
-	andl	$0x108,%eax
-	rorl	$7,%eax
-	orl	%eax,%ecx
-	roll	$4,%eax
-	orl	%ecx,%eax
-	andl	$7,%eax
+	setc	%cl
+	movzbl	%cl,%ecx
+	rorl	$15,%eax
+	adcl	%ecx,%ecx
+	roll	$6,%eax
+	adcl	%ecx,%ecx
 /* Negate the quotient bits if x*y<0. Avoid using an unpredictable branch. */
-	movl	32(%rsp),%ecx
-	xorl	16(%rsp),%ecx
-	movsx	%cx,%ecx
-	sarl	$16,%ecx
-	sarl	$16,%ecx
-	xorl	%ecx,%eax
-	andl	$1,%ecx
-	addl	%ecx,%eax
+	movl	32(%rsp),%eax
+	xorl	16(%rsp),%eax
+	cwtl
+	cltd
+	xorl	%edx,%ecx
+	subl	%edx,%ecx
 /* Store the quotient and return. */
-	movl	%eax,(%rdi)
+	movl	%ecx,(%rdi)
 	ret
+END(remquol)

From owner-freebsd-numerics@freebsd.org Sun Sep 8 14:45:30 2019
Message-ID: <769CF9CBA0A34DFA92C739C970FA2AAF@H270>
From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
Subject: Shorter releng/12.0/lib/msun/i387/e_exp.S and releng/12.0/lib/msun/i387/s_finite.S
Date: Sun, 8 Sep 2019 16:37:03 +0200
Organization: Me, myself & IT
Hi,

here's a patch to remove a conditional branch (and more) from
http://sources.freebsd.org/releng/12.0/lib/msun/i387/e_exp.S
plus a patch to shave some bytes (immediate operands) from
http://sources.freebsd.org/releng/12.0/lib/msun/i387/s_finite.S

stay tuned
Stefan Kanthak

--- -/releng/12.0/lib/msun/i387/e_exp.S
+++ +/releng/12.0/lib/msun/i387/e_exp.S
@@ -45,7 +45,25 @@
 	movl	8(%esp),%eax
-	andl	$0x7fffffff,%eax
-	cmpl	$0x7ff00000,%eax
-	jae	x_Inf_or_NaN
+	leal	(%eax+%eax),%edx
+	cmpl	$0xffe00000,%edx
+	jb	finite
+	/*
+	 * Return 0 if x is -Inf. Otherwise just return x; when x is Inf
+	 * this gives Inf, and when x is a NaN this gives the same result
+	 * as (x + x) (x quieted).
+	 */
+	cmpl	4(%esp),$0
+	sbbl	$0xfff00000,%eax
+	je	minus_inf
+
+nan:	fldl	4(%esp)
+	ret
+
+minus_inf:
+	fldz
+	ret
+
+finite:
+	fldl	4(%esp)
+
@@ -80,19 +98,3 @@
 	ret
-
-x_Inf_or_NaN:
-	/*
-	 * Return 0 if x is -Inf. Otherwise just return x; when x is Inf
-	 * this gives Inf, and when x is a NaN this gives the same result
-	 * as (x + x) (x quieted).
-	 */
-	cmpl	$0xfff00000,8(%esp)
-	jne	x_not_minus_Inf
-	cmpl	$0,4(%esp)
-	jne	x_not_minus_Inf
-	fldz
-	ret
-
-x_not_minus_Inf:
-	fldl	4(%esp)
-	ret
 END(exp)

--- -/releng/12.0/lib/msun/i387/s_finite.S
+++ +/releng/12.0/lib/msun/i387/s_finite.S
@@ -39,8 +39,8 @@
 ENTRY(finite)
 	movl	8(%esp),%eax
-	andl	$0x7ff00000, %eax
-	cmpl	$0x7ff00000, %eax
+	addl	%eax, %eax
+	cmpl	$0xffe00000, %eax
 	setneb	%al
-	andl	$0x000000ff, %eax
+	movzbl	%al, %eax
 	ret
 END(finite)

From owner-freebsd-numerics@freebsd.org Tue Sep 10 15:19:37 2019
Date: Wed, 11 Sep 2019 01:19:29 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Stefan Kanthak
cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Shorter releng/12.0/lib/msun/i387/e_exp.S and releng/12.0/lib/msun/i387/s_finite.S
In-Reply-To: <769CF9CBA0A34DFA92C739C970FA2AAF@H270>
Message-ID: <20190910230930.Q1373@besplex.bde.org>
References: <769CF9CBA0A34DFA92C739C970FA2AAF@H270>
On Sun, 8 Sep 2019, Stefan Kanthak wrote:

I recently got diagnosed as having serious medical problems and am not
sure if I care about this...

> here's a patch to remove a conditional branch (and more) from
> http://sources.freebsd.org/releng/12.0/lib/msun/i387/e_exp.S
> plus a patch to shave some bytes (immediate operands) from
> http://sources.freebsd.org/releng/12.0/lib/msun/i387/s_finite.S

Anyway, don't bother with these functions. They should never have been
written in asm and should go away.
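The branch-free negation in the remquo patches above boils down to a standard identity: smear the sign bit of x^y into a mask of 0 or -1 (what CLTD leaves in %edx), then (q ^ mask) - mask yields q or -q. A minimal C sketch of the idea (the helper name is mine, and it assumes arithmetic right shift on signed integers, as on x86):

```c
#include <stdint.h>

/* Negate the quotient bits q exactly when x and y have opposite
 * signs.  sign_xor is the XOR of the high words of x and y, so its
 * sign bit is set iff x*y < 0. */
static int32_t negate_if_opposite(int32_t q, int32_t sign_xor)
{
    int32_t mask = sign_xor >> 31;  /* 0 or -1; assumes arithmetic shift */
    return (q ^ mask) - mask;       /* q when mask==0, -q when mask==-1 */
}
```

This mirrors the xorl/subl pair in the patched code; the replaced sequence computed the same thing with two 16-bit shifts, an AND and an ADD.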
Improving the mod and remainder functions is more useful and difficult,
since they are in asm on amd64 too and there seems to be no better way
to implement them on all x86 than to use the i387, but they are still
slow.

> --- -/releng/12.0/lib/msun/i387/e_exp.S
> +++ +/releng/12.0/lib/msun/i387/e_exp.S

This went away in my version in 2012 or 2013, together with implementing
the long double hyperbolic functions. My version uses the same algorithm
in all precisions for the hyperbolic functions, but only the long double
version was committed (in 2013). The uncommitted parts are faster and
more accurate. The same methods work relatively trivially for exp() and
expf(), except they are insignificantly faster than the better C version
after improving the accuracy of that to be slightly worse than the asm
version. I gave up on plans to use the same algorithm in all precisions
for exp*(). The long double version is too sophisticated to be fast,
after developments in x86 CPUs and compilers made the old Sun C versions
fast.

Summary of implementations of exp*() on x86:

- expf(): use the same C version on amd64 and i386 (Cygnus translation
  of the Sun version with some FreeBSD optimizations). This is fast and
  is currently a little less accurate than it should be.

- exp(): use the C version on amd64 (Sun version with some FreeBSD
  optimizations). This is fast and is currently a little less accurate
  than it should be. Use the asm version on i386. This is slow since it
  switches the rounding precision. It needs the 11 extra bits of
  precision to barely deliver a double precision result to within 1 ulp.

> @@ -45,7 +45,25 @@
>  	movl	8(%esp),%eax
> -	andl	$0x7fffffff,%eax
> -	cmpl	$0x7ff00000,%eax
> -	jae	x_Inf_or_NaN
> +	leal	(%eax+%eax),%edx
> +	cmpl	$0xffe00000,%edx

This removes 1 instruction and 1 dependency, not a branch. Seems
reasonable. I would try to do it all in %eax. Check what compilers do
for the C version of finite(), where this check is clearer and easier
to optimize (see below).
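Bruce's suggestion to compare what compilers emit for the C version of the finiteness test can be tried with the two equivalent forms under discussion. A sketch assuming the usual IEEE-754 binary64 high-word layout (helper names are mine, not libm's):

```c
#include <stdint.h>
#include <string.h>

/* High 32 bits of a double, assuming IEEE-754 binary64. */
static uint32_t hi_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined type pun */
    return (uint32_t)(bits >> 32);
}

/* Mask-and-compare form, as in the original asm: finite iff the
 * biased exponent field is not all ones. */
static int finite_mask(double x)
{
    return (hi_word(x) & 0x7ff00000u) != 0x7ff00000u;
}

/* Doubled-word form, as in the patch: shifting the sign bit out
 * lets one unsigned compare test the exponent field.  Note that it
 * needs a below/above comparison, not an equality test (Bruce's
 * point about setne below). */
static int finite_dbl(double x)
{
    return hi_word(x) * 2u < 0xffe00000u;
}
```

Both forms agree on all inputs; which compiles to better code on i386 is exactly the experiment Bruce proposes.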
All of this can be written in C with about 1 line of inline asm, and
then compilers can generate better code.

> +	jb	finite

This seems to pessimize the branch logic in all cases (as would be done
in C by getting __predict_mumble() backwards). The branches were
carefully optimized (hopefully not backwards) for the i386 and i486,
and this happens to be best for later CPUs too. Taken branches are
slower on old CPUs, so the code was arranged to not branch in the usual
(finite) case. Newer CPUs only use static branch prediction for the
first branch, so the branch organization rarely matters except in large
code (not like here) where moving the unusual case far away is good for
caching. The static prediction is usually that the first forward branch
is not taken while the first backward branch is taken. So the forward
branch to the non-finite case was accidentally correct.

> +	/*
> +	 * Return 0 if x is -Inf. Otherwise just return x; when x is Inf
> +	 * this gives Inf, and when x is a NaN this gives the same result
> +	 * as (x + x) (x quieted).
> +	 */
> +	cmpl	4(%esp),$0
> +	sbbl	$0xfff00000,%eax
> +	je	minus_inf
> +
> +nan:	fldl	4(%esp)
> +	ret
> +
> +minus_inf:
> +	fldz
> +	ret
> +
> +finite:
> +	fldl	4(%esp)
> +
> @@ -80,19 +98,3 @@
>  	ret
> -
> -x_Inf_or_NaN:
> -	/*
> -	 * Return 0 if x is -Inf. Otherwise just return x; when x is Inf
> -	 * this gives Inf, and when x is a NaN this gives the same result
> -	 * as (x + x) (x quieted).
> -	 */
> -	cmpl	$0xfff00000,8(%esp)
> -	jne	x_not_minus_Inf
> -	cmpl	$0,4(%esp)
> -	jne	x_not_minus_Inf
> -	fldz
> -	ret
> -
> -x_not_minus_Inf:
> -	fldl	4(%esp)
> -	ret

Details not checked. Space/time efficiency doesn't matter in the
non-finite case. But see s_expl.c, where the magic expression (-1 / x)
is used for the return value to optimize for space (it avoids branches,
but the division is slow).
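The non-finite dispatch that both versions of this hunk implement is easy to state in C; x + x is what quiets a signaling NaN, and only -Inf maps to 0. A hedged sketch (hypothetical function name, not the libm source):

```c
#include <math.h>

/* exp() on non-finite x: exp(-Inf) = 0, exp(+Inf) = +Inf, and
 * exp(NaN) returns that NaN quieted via x + x.  Finite x would fall
 * through to the normal argument-reduction path, elided here. */
static double exp_nonfinite(double x)
{
    if (isnan(x))
        return x + x;             /* quiets a signaling NaN */
    return x < 0.0 ? 0.0 : x;     /* -Inf -> 0, +Inf -> +Inf */
}
```

Bruce mentions s_expl.c's (-1 / x) expression as a branch-avoiding way to produce a special-case return value, at the cost of a slow division.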
> END(exp)
>
> --- -/releng/12.0/lib/msun/i387/s_finite.S
> +++ +/releng/12.0/lib/msun/i387/s_finite.S

This function has several layers of reasons to not exist. It seems to
be only a Sun extension to C90. It is not declared in <math.h>, but
exists in libm as namespace pollution to support old ABIs. C99 has the
better API isfinite(), which is type-generic. I thought that this was
usually inlined. Actually, it seems to be implemented by calling
__isfinite(), and not this finite(). libm also has finite() in C. Not
inlining this and/or having no way to know if it is efficiently inlined
makes it unusable in optimized code.

> @@ -39,8 +39,8 @@
>  ENTRY(finite)
>  	movl	8(%esp),%eax
> -	andl	$0x7ff00000, %eax
> -	cmpl	$0x7ff00000, %eax
> +	addl	%eax, %eax
> +	cmpl	$0xffe00000, %eax

This doesn't reduce the number of instructions or dependencies, so it
is less worth doing than the similar changes above.

>  	setneb	%al

This is now broken, since setneb is only correct after masking out the
unimportant bits.

> -	andl	$0x000000ff, %eax
> +	movzbl	%al, %eax
>  	ret

Old bug: the extra instructions to avoid the branch might be a
pessimization on all CPUs:
- perhaps cmov is best on newer CPUs, but it is unportable
- the extra instructions, and possibly movz instead of and, are slower
  on old CPUs, while branch prediction is fast for the usual case on
  newer CPUs.

> END(finite)

Check what compilers generate for the C versions of finite() and
__isfinite() with -fomit-frame-pointer -march=mumble (especially i386)
and __predict_mumble(). The best code (better than the above) is for
finite(). Oops, it is only gcc-4.2.1 that generates very bad code for
__isfinite(). s_finite.c uses masks, and compilers don't reorganize
this much. s_isfinite.c uses hard-coded bit-fields which some compilers
don't optimize very well. Neither does the above, or the standard
access macros using bit-fields -- they tend to produce store-to-load
mismatches. Well, I finally found where this is inlined.
Use __builtin_isfinite() instead of isfinite(). Then gcc generates a
libcall to __builtin_isfinite(), while clang generates inline code
which is much larger and often slower than any of the above, but it at
least avoids store-to-load mismatches and doesn't misclassify long
doubles in unsupported formats as finite when they are actually NaNs.
It also generates exceptions for signaling NaNs in some cases, which is
arguably wrong. The fpclassify and isfinite, etc., macros in <math.h>
are already too complicated, but not nearly complicated enough to decide
if the corresponding builtins should be used.

Bruce
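For reference, the C99 interface Bruce contrasts with finite() is type-generic: the same <math.h> macros accept float, double and long double. A small sketch (the numeric codes are mine, for illustration only):

```c
#include <math.h>

/* C99 type-generic classification, unlike the legacy finite().
 * Returns 0 for finite values (including zeros and subnormals),
 * 1 for +/-Inf, 2 for NaN. */
static int classify_code(double x)
{
    if (isfinite(x))
        return 0;
    return isnan(x) ? 2 : 1;
}
```

Whether isfinite() expands to a builtin, a libcall, or inline bit tests is exactly the compiler-dependent behaviour discussed above.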