Date: Thu, 30 Mar 2017 17:07:42 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 218203] Implement AVX2 accelerated Fletcher algorithms
Message-ID: <bug-218203-8-3otwsGHPqA@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-218203-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-218203-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218203

--- Comment #1 from kungfujesus06@gmail.com ---
If desired, I can post my benchmark code. It uses more instructions than the
zfsonlinux variant because I used SIMD intrinsics instead of inline assembly.
The extra instructions mostly just shuffle values between registers. After the
intermediate sum loop completes, I alias into the __m256i values directly
instead of doing a vmovdqu into memory for the constant multiplications. I
suspect the compiler was able to shuffle registers around enough to avoid some
trips to memory.

The Intel whitepaper isn't quite fair to itself, though: I think they are
comparing their SIMD variant against the best possible performance without
SIMD, which is not the original loop but the loop unrolled 4 times.
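
For readers who have not seen the two non-SIMD baselines being contrasted, here
is a minimal sketch in C. It is not the benchmark code offered above and not
the code from the whitepaper. It assumes the ZFS-style Fletcher-4 form (32-bit
input words, four 64-bit running sums), and every name in it (fletcher4_t,
fletcher4_scalar, fletcher4_unrolled4, fletcher4_selfcheck) is made up for
illustration. The first function is the plain one-word-per-iteration loop. The
second folds four words per iteration, which is where small constant
multipliers appear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four running sums (names are illustrative). */
typedef struct { uint64_t a, b, c, d; } fletcher4_t;

/* One Fletcher-4 step: each sum accumulates the one before it. */
static void fletcher4_step(fletcher4_t *s, uint32_t f)
{
    s->a += f;
    s->b += s->a;
    s->c += s->b;
    s->d += s->c;
}

/* "Original loop": one 32-bit word per iteration. */
fletcher4_t fletcher4_scalar(const uint32_t *words, size_t nwords)
{
    fletcher4_t s = {0, 0, 0, 0};
    for (size_t i = 0; i < nwords; i++)
        fletcher4_step(&s, words[i]);
    return s;
}

/*
 * Loop unrolled 4 times, with the four steps folded algebraically.
 * Expanding four consecutive steps gives the constants below. The sums
 * are updated from d down to a so each line still sees the old values.
 * Arithmetic is modulo 2^64, the same as in the scalar loop.
 */
fletcher4_t fletcher4_unrolled4(const uint32_t *words, size_t nwords)
{
    fletcher4_t s = {0, 0, 0, 0};
    size_t i = 0;

    for (; i + 4 <= nwords; i += 4) {
        uint64_t f0 = words[i],     f1 = words[i + 1];
        uint64_t f2 = words[i + 2], f3 = words[i + 3];

        s.d += 4 * s.c + 10 * s.b + 20 * s.a
             + 20 * f0 + 10 * f1 + 4 * f2 + f3;
        s.c += 4 * s.b + 10 * s.a
             + 10 * f0 + 6 * f1 + 3 * f2 + f3;
        s.b += 4 * s.a + 4 * f0 + 3 * f1 + 2 * f2 + f3;
        s.a += f0 + f1 + f2 + f3;
    }

    /* Leftover words (fewer than 4) go through the plain step. */
    for (; i < nwords; i++)
        fletcher4_step(&s, words[i]);
    return s;
}

/* Usage example: both variants must agree on any input. */
void fletcher4_selfcheck(void)
{
    const uint32_t w[6] = {1, 2, 3, 4, 5, 6};
    fletcher4_t x = fletcher4_scalar(w, 6);
    fletcher4_t y = fletcher4_unrolled4(w, 6);
    assert(x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d);
}
```

The unrolled form breaks the word-to-word dependency chain inside each group of
four, which is presumably why it makes a stronger non-SIMD baseline than the
plain loop. How much faster it is depends on the compiler and CPU, so treat
this as a starting point for measurement rather than a result.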
