Date: Thu, 15 Mar 2012 20:55:37 +0900 (JST)
From: Maho NAKATA <chat95@mac.com>
To: tomdean@speakeasy.org
Cc: freebsd-amd64@freebsd.org
Subject: Re: Gcc46 and 128 Bit Floating Point
Message-ID: <20120315.205537.1682271453232733525.chat95@mac.com>
In-Reply-To: <4F4DDCE7.9000008@speakeasy.org>
References: <4F4DA398.6070703@speakeasy.org> <20120229161408.G2514@besplex.bde.org> <4F4DDCE7.9000008@speakeasy.org>
Hi Thomas D. Dean,

Why not use the double-double approach? Double-double is a poor man's quad math. Using an NVIDIA C2050, we can obtain 16 to 26 GFlops for matrix-matrix multiplication. I have been developing a linear algebra library: http://mplapack.sourceforge.net/ .

Thanks,
Nakata Maho

From: "Thomas D. Dean" <tomdean@speakeasy.org>
Subject: Re: Gcc46 and 128 Bit Floating Point
Date: Wed, 29 Feb 2012 00:08:07 -0800

> On 02/28/12 22:03, Bruce Evans wrote:
>
>> But why would you want it? It is essentially unusable on sparc64,
>> since it is several thousand times slower than 80-bit floating point
>> on i386. At equal CPU clock speeds, it is only about 1000 times
>> slower.
>> Most of the factors of 10 are due to fundamental slowness of
>> multi-word arithmetic in software and the soft-float implementations
>> not being very good (I only tested with the old NetBSD/4.4BSD-derived
>> one. This has been replaced by the Hauser one, which has good chances
>> of being worse due to its greater generality and correctness, but the
>> old one has a lot of slop to improve). A modern x86 is much faster
>> than an old sparc64, giving about another factor of 10. 64-bit
>> operations are only about 10 times slower (or more like 3 times
>> slower at equal CPU clock speeds) on an old sparc64 than on a
>> not-so-modern core2 x86. The gnu libraries might be better. So you
>> could hope for only a factor of 100 slowdown on scalar code. But
>> modern x86's can also do vector code, and thus be up to 8 times
>> faster for 32-bit floating point with AVX. Really good multi-word
>> libraries might be able to exploit some vector operations, but I
>> think multi-word operations are too serial in nature to get much
>> parallelism with them.
>
> I have an application that takes 10 days to run on a 4.16GHz Core-i7
> 3930K. No output until it finishes.
>
> When I first started looking at this, I naively thought the 80-bit FPU
> floats were scaled to 128 bits. Would be nice...
>
> The application uses libgmp, but about 1/2 to 2/3 of the work will
> fit in a 128-bit float.
>
> I wanted to get 128-bit floating point operations so I could do 2/3
> of the work in an FPU. With 80 bits, I can only do about 1/3 of the
> work.
>
> Mostly, this is just "can I do it faster...". Maybe some asm code to
> work the inner loops in FPU registers. At some point, hand off to
> libgmp. I now think the speed improvement would not be worth the
> work.
>
> Tom Dean
> _______________________________________________
> freebsd-amd64@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-amd64
> To unsubscribe, send any mail to
> "freebsd-amd64-unsubscribe@freebsd.org"