Date:      Thu, 30 Mar 2017 17:07:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 218203] Implement AVX2 accelerated Fletcher algorithms
Message-ID:  <bug-218203-8-3otwsGHPqA@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-218203-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-218203-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218203

--- Comment #1 from kungfujesus06@gmail.com ---
If desired, I can post my benchmark code.  It uses more instructions than the
zfsonlinux variant (I used SIMD intrinsics instead of inline assembly); the
extra instructions are mostly just shuffling values between registers.  Once
the intermediate sum loop was done, I aliased the __m256i accumulators directly
for the constant multiplications instead of storing them to memory with
vmovdqu.  I suspect the compiler was able to keep enough values in registers to
avoid some trips to memory.  Separately, the Intel whitepaper isn't quite fair
to itself: I think it compares its SIMD variant against the best possible
non-SIMD performance, which is not the original loop but the loop unrolled 4
times, so the reported speedup understates the gain over the original loop.
