Date: Mon, 11 Mar 2013 08:37:22 +0000
From: David Chisnall <theraven@FreeBSD.org>
To: "freebsd-numerics@freebsd.org" <freebsd-numerics@FreeBSD.org>
Subject: Fwd: [cfe-dev] More on atlas and clang
Message-ID: <8652E076-8710-4766-8FD0-7774D82A1A0B@FreeBSD.org>
References: <E49A1576-970A-4613-A09E-28BD3A818225@macports.org>

Recent benchmarks of Atlas built with clang, posted to the clang list, are attached. Note that the -fvectorize and -fslp-vectorize flags enable the new autovectorisation code in clang, which will be enabled by default in 3.3.

David

Begin forwarded message:

> Hi there,
>
> I have recently undertaken another experimental build of Atlas
> (http://math-atlas.sourceforge.net; briefly speaking, Atlas provides a
> highly complete BLAS/LAPACK implementation optimized for the native
> architecture of the computer on which it is running) on an AVX machine
> (MacMini 2011) using a snapshot of clang 3.3 (r173279) provided by
> MacPorts (http://macports.org), with the -O3, -fPIC, -fvectorize and
> -fslp-vectorize flags.
>
> I am pleased to say that:
>
> 1. The generated AVX code seems fine: a full test session run under an
>    Atlas-based SciPy didn't raise any errors;
> 2. The performance now seems on par with, or even (sometimes surprisingly)
>    better than, the 'reference GCC' (whatever that means; I was unable to
>    get in touch with the Atlas developers at that time), as evidenced by
>    the table below:
>
> Reference clock rate=3292Mhz, new rate=2300Mhz
> Refrenc : % of clock rate achieved by reference install
> Present : % of clock rate achieved by present ATLAS install
>
>                    single precision                  double precision
>            ********************************  ********************************
>                 real            complex           real            complex
>            ---------------  ---------------  ---------------  ---------------
> Benchmark  Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
> =========  ======= =======  ======= =======  ======= =======  ======= =======
> kSelMM      1289.9  1407.4   1188.7  1229.8    686.7   826.8    647.4   682.1
> kGenMM       198.2   239.7    198.5   237.8    193.9   231.8    196.0   233.8
> kMM_NT       193.7   266.4    195.2   192.9    184.2   187.4    188.5   197.5
> kMM_TN       198.5   211.1    197.9   226.2    189.8   227.6    189.5   223.2
> BIG_MM      1213.8  1346.7   1241.3  1366.5    652.0   789.5    661.4   795.8
> kMV_N        224.3   308.1    438.8   617.3    115.9   152.1    205.8   283.5
> kMV_T        224.6   313.5    460.3   642.9    123.2   159.6    211.3   288.2
> kGER         148.3   192.4    290.2   381.2     73.3    95.6    144.3   184.3
>
> This is in stark contrast with the previous test, where clang was lagging
> about 20% behind the 'reference implementation' based on GCC for lines 2,
> 3 and 4 (kGenMM, kMM_NT and kMM_TN), where compiler performance matters
> most.
>
> So, to summarize in two words: kudos folks!
>
> I will build another version on a Core2Duo machine tonight and see if the
> results are consistent.
>
> Cheers!
> Vincent
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
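As a rough way to check the "on par or better" reading of the benchmark table, here is a small Python sketch that computes the percent change of each Present figure relative to its Refrenc figure. The numbers are copied by hand from the table; the names TABLE, relative_change and summarise are illustrative only. It assumes, as the legend suggests, that both columns are already normalised to each machine's own clock rate (3292 MHz vs 2300 MHz), so the comparison is per cycle rather than wall-clock.

```python
# Percent-of-clock-rate figures copied from the ATLAS table.
# Each row maps a benchmark to (Refrenc, Present) pairs, in column order:
# single real, single complex, double real, double complex.
TABLE = {
    "kSelMM": [(1289.9, 1407.4), (1188.7, 1229.8), (686.7, 826.8), (647.4, 682.1)],
    "kGenMM": [(198.2, 239.7), (198.5, 237.8), (193.9, 231.8), (196.0, 233.8)],
    "kMM_NT": [(193.7, 266.4), (195.2, 192.9), (184.2, 187.4), (188.5, 197.5)],
    "kMM_TN": [(198.5, 211.1), (197.9, 226.2), (189.8, 227.6), (189.5, 223.2)],
    "BIG_MM": [(1213.8, 1346.7), (1241.3, 1366.5), (652.0, 789.5), (661.4, 795.8)],
    "kMV_N": [(224.3, 308.1), (438.8, 617.3), (115.9, 152.1), (205.8, 283.5)],
    "kMV_T": [(224.6, 313.5), (460.3, 642.9), (123.2, 159.6), (211.3, 288.2)],
    "kGER": [(148.3, 192.4), (290.2, 381.2), (73.3, 95.6), (144.3, 184.3)],
}


def relative_change(refrenc, present):
    """Percent change of the present install relative to the reference.

    Both inputs are assumed to be normalised to each machine's clock rate,
    so this compares work done per cycle, not wall-clock speed.
    """
    return 100.0 * (present - refrenc) / refrenc


def summarise(table):
    """Map each benchmark to its four percent changes, rounded to 0.1."""
    return {
        name: [round(relative_change(r, p), 1) for r, p in cells]
        for name, cells in table.items()
    }


if __name__ == "__main__":
    for name, changes in summarise(TABLE).items():
        cells = "  ".join(f"{c:+6.1f}%" for c in changes)
        print(f"{name:<7} {cells}")
```

Run as a script, it prints one line per benchmark. Only the single-precision complex kMM_NT entry comes out negative (about -1.2%); every other cell shows the present install ahead of the reference.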