Date: Wed, 13 Mar 2019 11:08:06 -0700
From: Steve Kargl
To: John Baldwin
Cc: freebsd-toolchain@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Optimization bug with floating-point?
Message-ID: <20190313180806.GA35586@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190313024506.GA31746@troutmask.apl.washington.edu>
 <20190313151635.GA34757@troutmask.apl.washington.edu>
 <20190313164039.GA35340@troutmask.apl.washington.edu>

On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>>>
> >>>>  gcc8 --version
> >>>>  gcc8 (FreeBSD Ports Collection) 8.3.0
> >>>>
> >>>>  gcc8 -fno-builtin -o z a.c -lm && ./z
> >>>>  gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >>>>  gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >>>>  gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>>>
> >>>>  Max ULP: 2.297073
> >>>>  Count: 0 (# of ULP that exceed 21)
> >>>>
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
> >>>> int
> >>>> main(void)
> >>>> {
> >>>>    double re, im, u, ur, ui;
> >>>>    float complex f;
> >>>>    float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> >
> > /usr/local/bin/clang60 has the same problem.
> >
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> >   Maximum ULP: 23.061242
> >   # of ULP > 21: 39
> >
> > Adding volatile as in the above "fixes" the problem.
> >
> > AFAICT, this a i386/387 code generation problem.  Perhaps,
> > an alignment issue?
>
> Oh, I misread your earlier e-mail to say that clang60 worked.
>
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
>
> I wonder what your test program does on i386 Linux with GCC?

I don't have an i386 Linux environment.  I tried comparing the
assembly generated with and without volatile, but it proves
difficult because register numbers change between the 2 listings,
so almost all lines mismatch.

If I move ranged(), rangef(), dp_csinh(), and ulpfd() into b.c
so a.c only contains main(), add appropriate prototypes to a.c,
and comment out the printf() statements, I still see the problem.
Judging from the diff, there is a difference in the spills and
loads in 2 places.

% diff -uw without_volatile with_volatile
--- without_volatile    2019-03-13 10:51:33.244226000 -0700
+++ with_volatile       2019-03-13 10:51:54.088095000 -0700
@@ -35,11 +35,13 @@
        movl    %esi, 68(%esp)          # 4-byte Spill
        calll   rangef
        fadds   .LCPI0_0
-       fstpl   24(%esp)                # 8-byte Folded Spill
+       fstps   28(%esp)
        calll   rangef
        fadds   .LCPI0_1
-       fstl    100(%esp)               # 8-byte Folded Spill
-       fldl    24(%esp)                # 8-byte Folded Reload
+       fstps   24(%esp)
+       flds    28(%esp)
+       flds    24(%esp)
+       fxch    %st(1)
        fstps   48(%esp)
        fstps   52(%esp)
        movl    48(%esp), %eax
@@ -49,13 +51,13 @@
        calll   csinhf
        movl    %eax, %esi
        movl    %edx, %edi
+       flds    28(%esp)
+       flds    24(%esp)
        leal    72(%esp), %eax
        movl    %eax, 20(%esp)
        leal    80(%esp), %eax
        movl    %eax, 16(%esp)
-       fldl    100(%esp)               # 8-byte Folded Reload
        fstpl   8(%esp)
-       fldl    24(%esp)                # 8-byte Folded Reload
        fstpl   (%esp)
        calll   dp_csinh
        movl    %esi, 40(%esp)
@@ -75,7 +77,7 @@
        fnstsw  %ax
                                        # kill: def $ah killed $ah killed $ax
        sahf
-       fstl    24(%esp)                # 8-byte Folded Spill
+       fstl    100(%esp)               # 8-byte Folded Spill
        ja      .LBB0_3
 # %bb.2:                                # %for.body
                                        #   in Loop: Header=BB0_1 Depth=1
@@ -114,7 +116,7 @@
                                        #   in Loop: Header=BB0_1 Depth=1
        fstp    %st(2)
        fldl    92(%esp)                # 8-byte Folded Reload
-       fldl    24(%esp)                # 8-byte Folded Reload
+       fldl    100(%esp)               # 8-byte Folded Reload
        fucomp  %st(1)
        fnstsw  %ax
                                        # kill: def $ah killed $ah killed $ax

Adding ieeefp.h to a.c and fpsetprec(FP_PE) in main() produces
a massive diff, but still wrong results if volatile is not used.
Clang appears to be broken for FP on i386/387.

-- 
Steve