Date: Mon, 11 Mar 2013 08:37:22 +0000
From: David Chisnall <theraven@FreeBSD.org>
To: "freebsd-numerics@freebsd.org" <freebsd-numerics@FreeBSD.org>
Subject: Fwd: [cfe-dev] More on atlas and clang
Message-ID: <8652E076-8710-4766-8FD0-7774D82A1A0B@FreeBSD.org>
References: <E49A1576-970A-4613-A09E-28BD3A818225@macports.org>

Benchmarks of Atlas with clang, recently posted to the clang list, are attached. Note that the -fvectorize and -fslp-vectorize flags enable the new autovectorisation code in clang, which will be enabled by default in 3.3.

David

Begin forwarded message:

> Hi there,
>
> I have recently undertaken another experimental build of Atlas (http://math-atlas.sourceforge.net – briefly speaking, Atlas provides a highly complete BLAS/LAPACK implementation optimized for the native architecture of the computer on which it is running) on an AVX machine (MacMini 2011) using a snapshot of clang 3.3 (r173279) provided by MacPorts (http://macports.org), with the -O3, -fPIC, -fvectorize and -fslp-vectorize flags.
>
> I am pleased to say that:
>
> 1. The generated AVX code seems fine: a full test session run under an Atlas-based SciPy didn’t raise any errors;
> 2. The performance now seems on par with, or even (sometimes surprisingly) better than, the ‘reference GCC’ – whatever that means (I was unable to get in touch with the Atlas developer at that time) – as evidenced by the table below:
>
> Reference clock rate=3292Mhz, new rate=2300Mhz
>    Refrenc : % of clock rate achieved by reference install
>    Present : % of clock rate achieved by present ATLAS install
>
>                    single precision                  double precision
>            ********************************  *******************************
>                 real             complex           real             complex
>            ---------------  ---------------  ---------------  ---------------
> Benchmark  Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
> =========  ======= =======  ======= =======  ======= =======  ======= =======
> kSelMM      1289.9  1407.4   1188.7  1229.8    686.7   826.8    647.4   682.1
> kGenMM       198.2   239.7    198.5   237.8    193.9   231.8    196.0   233.8
> kMM_NT       193.7   266.4    195.2   192.9    184.2   187.4    188.5   197.5
> kMM_TN       198.5   211.1    197.9   226.2    189.8   227.6    189.5   223.2
> BIG_MM      1213.8  1346.7   1241.3  1366.5    652.0   789.5    661.4   795.8
> kMV_N        224.3   308.1    438.8   617.3    115.9   152.1    205.8   283.5
> kMV_T        224.6   313.5    460.3   642.9    123.2   159.6    211.3   288.2
> kGER         148.3   192.4    290.2   381.2     73.3    95.6    144.3   184.3
>
> This is in stark contrast with the previous test, where clang was lagging about 20% behind the ‘reference implementation’ based on GCC for lines 2, 3 and 4, where compiler performance matters most.
>
> So – to summarize in two words: kudos folks!
>
> I will build another version on a Core2Duo machine tonight and see if the results are consistent.
>
> Cheers!
> Vincent
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
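
To put a number on the comparison for table lines 2, 3 and 4 (kGenMM, kMM_NT, kMM_TN), the relative gain of Present over Refrenc can be worked out as below. This is only a rough sketch: the values are the single-precision real columns copied from the message, the helper name relative_gain is made up for illustration, and because both columns are percentages of each install's own clock rate (3292 vs 2300 MHz) the result is a per-clock comparison, not an absolute throughput one.

```python
# Single-precision real columns for table lines 2-4, copied from the message.
# Each value is "% of clock rate achieved": (Refrenc, Present).
rows = {
    "kGenMM": (198.2, 239.7),
    "kMM_NT": (193.7, 266.4),
    "kMM_TN": (198.5, 211.1),
}


def relative_gain(reference, present):
    """Per-clock gain of the present install over the reference, in percent.

    Hypothetical helper for illustration; not part of ATLAS.
    """
    return (present / reference - 1.0) * 100.0


gains = {name: relative_gain(ref, cur) for name, (ref, cur) in rows.items()}

for name, gain in gains.items():
    # A positive value means Present beats Refrenc per clock cycle.
    print(f"{name:8s} {gain:+6.1f}%")
```

Swapping in the other column pairs from the table gives the same comparison for the complex and double-precision cases.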
