From owner-freebsd-amd64@FreeBSD.ORG  Wed Feb 29 08:08:37 2012
Return-Path: <owner-freebsd-amd64@FreeBSD.ORG>
Delivered-To: freebsd-amd64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 60E6A106564A
	for <freebsd-amd64@freebsd.org>; Wed, 29 Feb 2012 08:08:37 +0000 (UTC)
	(envelope-from tomdean@speakeasy.org)
Received: from asbnvacz-mailrelay01.megapath.net
	(asbnvacz-mailrelay01.megapath.net [207.145.128.243])
	by mx1.freebsd.org (Postfix) with ESMTP id 16B5C8FC0C
	for <freebsd-amd64@freebsd.org>; Wed, 29 Feb 2012 08:08:36 +0000 (UTC)
Received: from mail4.sea5.speakeasy.net (mail4.sea5.speakeasy.net
	[69.17.117.48])
	by asbnvacz-mailrelay01.megapath.net (Postfix) with ESMTP id
	AF7A9A70141
	for <freebsd-amd64@freebsd.org>; Wed, 29 Feb 2012 03:08:08 -0500 (EST)
Received: (qmail 3310 invoked from network); 29 Feb 2012 08:08:08 -0000
Received: by simscan 1.4.0 ppid: 128, pid: 29677, t: 0.2337s
	scanners: clamav: 0.88.2/m:52/d:10739 spam: 3.0.4
Received: from unknown (HELO P9X79.tddhome) (tomdean@[24.113.107.31])
	(envelope-sender <tomdean@speakeasy.org>)
	by mail4.sea5.speakeasy.net (qmail-ldap-1.03) with SMTP
	for <freebsd-amd64@freebsd.org>; 29 Feb 2012 08:08:07 -0000
Message-ID: <4F4DDCE7.9000008@speakeasy.org>
Date: Wed, 29 Feb 2012 00:08:07 -0800
From: "Thomas D. Dean" <tomdean@speakeasy.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:10.0.2) Gecko/20120228 Thunderbird/10.0.2
MIME-Version: 1.0
To: freebsd-amd64@freebsd.org
References: <4F3EA37F.9010207@speakeasy.org>
	<CAGE5yCpvF0-b1iKAVGbya=fUNaYbGyrpj1PHSQxw4BvycNMLDg@mail.gmail.com>
	<4F3EC0B4.6050107@speakeasy.org> <4F4DA398.6070703@speakeasy.org>
	<20120229161408.G2514@besplex.bde.org>
In-Reply-To: <20120229161408.G2514@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail4.sea5
X-Spam-Level: 
X-Spam-Status: No, score=0.9 required=8.0 tests=FORGED_RCVD_HELO,
	RATWARE_GECKO_BUILD autolearn=disabled version=3.0.4
Subject: Re: Gcc46 and 128 Bit Floating Point
X-BeenThere: freebsd-amd64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the AMD64 platform <freebsd-amd64.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-amd64>,
	<mailto:freebsd-amd64-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-amd64>
List-Post: <mailto:freebsd-amd64@freebsd.org>
List-Help: <mailto:freebsd-amd64-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-amd64>,
	<mailto:freebsd-amd64-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Feb 2012 08:08:37 -0000

On 02/28/12 22:03, Bruce Evans wrote:

>
> But why would you want it? It is essentially unusable on sparc64,
> since it is several thousand times slower than 80-bit floating point
> on i386. At equal CPU clock speeds, it is only about 1000 times slower.
> Most of the factors of 10 are due to fundamental slowness of multi-
> word arithmetic in software and the soft-float implementations not
> being very good (I only tested with the old NetBSD/4.4BSD-derived one.
> This has been replaced by the Hauser one, which has good chances for
> being worse due to its greater generality and correctness, but the old
> one has a lot of slop to improve). A modern x86 is much faster than
> an old sparc64, giving about another factor of 10. 64-bit operations
> are only about this 10 times slower (or more like 3 times slower at
> equal CPU clock speeds) on an old sparc64 as on a not-so-modern core2
> x86. The gnu libraries might be better. So you could hope for only
> a factor of 100 slowdown on scalar code. But modern x86's can also
> do vector code, and thus be up to 8 times faster for 32-bit floating
> point with AVX. Really good multi-word libraries might be able to
> exploit some vector operations, but I think multi-word operations are
> too serial in nature to get much parallelism with them.

I have an application that takes 10 days to run on a 4.16GHz Core-i7 
3930K.  No output until it finishes.

When I first started looking at this, I naively assumed the FPU's 80-bit 
floats had been widened to 128 bits.  That would be nice...

The application uses libgmp, but, about 1/2 to 2/3 of the work will fit 
in a 128-bit float.

I wanted 128-bit floating point operations so I could do 2/3 of the 
work in an FPU.  With 80 bits, I can only do roughly 1/3 of the work.
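
To make "fits in a float" concrete, here is a rough sketch, not code from 
the application. It assumes the values are integers, that the x87 80-bit 
format carries a 64-bit significand, and that IEEE 754 binary128 carries 
113 bits, and it ignores exponent range. The names significant_bits and 
fits are hypothetical helpers made up for illustration.

```python
# Significand widths, in bits (binary128 counts its implicit leading bit).
X87_SIGNIFICAND_BITS = 64         # x87 80-bit extended precision
BINARY128_SIGNIFICAND_BITS = 113  # IEEE 754 binary128


def significant_bits(n: int) -> int:
    """Count the bits from the highest to the lowest set bit of n, inclusive."""
    n = abs(n)
    if n == 0:
        return 0
    # n & -n isolates the lowest set bit; its position is the number of
    # trailing zero bits, which a binary exponent absorbs for free.
    trailing_zeros = (n & -n).bit_length() - 1
    return n.bit_length() - trailing_zeros


def fits(n: int, significand_bits: int) -> bool:
    """True if integer n is exactly representable with that many significand bits.

    Assumption: the exponent range is never the limiting factor.
    """
    return significant_bits(n) <= significand_bits


if __name__ == "__main__":
    for value in (2**64 - 1, 2**64 + 1, 2**200, 2**113 + 1):
        print(
            value.bit_length(),
            fits(value, X87_SIGNIFICAND_BITS),
            fits(value, BINARY128_SIGNIFICAND_BITS),
        )
```

A value such as 2**64 + 1 needs 65 significant bits, so it misses the 
80-bit format but fits binary128.  A power of two fits either format no 
matter how large it is, because only the exponent grows.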

Mostly, this is just "can I do it faster...".  Maybe some asm code to 
work the inner loops in FPU registers.  At some point, hand off to 
libgmp.  I now think the speed improvement would not be worth the work.
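
The hand-off idea can be sketched roughly as follows.  This is only an 
illustration: the running factorial is a stand-in workload (the real 
inner loop is not shown in this thread), handoff_point is a hypothetical 
name, and Python integers model both the fixed-width fast path and the 
libgmp slow path.  The check is conservative, since it compares magnitude 
against 2**bits rather than testing exact representability.

```python
def handoff_point(significand_bits: int) -> int:
    """Return the first n whose factorial reaches 2**significand_bits.

    Everything below that n could stay on the fixed-width fast path;
    from n onward the work would be handed to a multi-precision library.
    """
    limit = 1 << significand_bits
    product = 1  # holds n! for the current n
    n = 1
    # Stay on the fast path while the next product is still below the limit.
    while product * (n + 1) < limit:
        n += 1
        product *= n
    return n + 1


if __name__ == "__main__":
    # 64 bits models the x87 significand, 113 bits models binary128.
    for bits in (64, 113):
        print(bits, handoff_point(bits))
```

With these assumptions the 64-bit significand hands off at 21! and the 
113-bit significand at 32!, so the wider format delays the hand-off but 
does not remove it.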

Tom Dean