Date: Fri, 28 Feb 2014 10:18:05 -0800
From: John-Mark Gurney <jmg@funkthat.com>
To: Peter Jeremy <peter@rulingia.com>
Cc: arch@freebsd.org
Subject: Re: small kernel kernel option...
Message-ID: <20140228181804.GQ47921@funkthat.com>
In-Reply-To: <20140228114224.GE2705@server.rulingia.com>
References: <20140226214816.GB92037@funkthat.com> <20140228114224.GE2705@server.rulingia.com>

Peter Jeremy wrote this message on Fri, Feb 28, 2014 at 22:42 +1100:
> On 2014-Feb-26 13:48:16 -0800, John-Mark Gurney <jmg@funkthat.com> wrote:
> >I'm about to commit a change to sha256 to speed it up, but the cost
> >of that speed up is an increase in code/data size from just under 1k
> >to almost 9k (as measured on amd64)... this increase is from unrolling
> >a loop..
>
> Out of interest, how much of a speedup and what CPU/compiler
> combinations did you test your change on? I ask because several years
> ago, I tried about 7 different SHA-256 implementations (basically, all
> the C ones I could easily find in FreeBSD and ports I had installed,
> as well as one I tweaked myself) across a range of CPUs and compilers.
> I found that not only was there a very wide variation in speed between
> implementations but that the best on one CPU often ran quite poorly on
> another and unrolling loops didn't necessarily help.

I did not do an exhaustive search.. I only benchmarked the two easy
ones, the one from libmd and the kernel one...

I ran my tests on an A10-5700@3.4GHz, a Core i7@2GHz (though under
MacOSX) and an Opteron-4228 HE@2.8GHz... All tests were on amd64...
A few other people also ran the tests for me, but I don't remember
what processors they ran on.. In all cases we saw an improvement,
mostly around 20%, from using cperciva's libmd version rather than
the kernel version... These results were confirmed w/ ministat...

Part of the reason I didn't do an exhaustive search is that many
implementations (OpenSSL, NSS) are very difficult to extract w/o major
work..

If you'd like to run my test suite, you can download it from:
https://www.funkthat.com/~jmg/sha256.test.unr.tgz

Just run the script gennumbers and wait for a while... it'll compile,
validate and run the tests..

It also includes tests that try various amounts of loop unrolling,
which exposed some weird timing behavior...
Enough that the only options are fully unrolled or completely rolled,
with no partial unrolling..

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
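
For readers unfamiliar with what "rolled" versus "unrolled" means for the SHA-256 round loop, here is a minimal sketch in Python. It is not the libmd or kernel code and says nothing about their speed. All function names (compress_rolled, compress_rotated, sha256) are hypothetical, and hashlib is used only to validate the result. The rolled form shuffles all eight working variables every round. The second form keeps the values in place and rotates which slot plays which role, which is the pattern an 8x-unrolled C loop typically hard-codes as eight macro invocations (here the slot indices are computed at run time to keep the sketch short).

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, cand = [], 2
    while len(primes) < count:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes

def _icbrt(n):
    """Floor of the cube root of n, by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Constants derived as in FIPS 180-4: fractional bits of cube/square roots.
_PRIMES = _first_primes(64)
K = [_icbrt(p << 96) & MASK for p in _PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def _schedule(block):
    """Expand a 64-byte block into the 64-word message schedule."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def _t1_t2(a, b, c, e, f, g, h, k, w):
    """The two temporaries computed in every SHA-256 round."""
    big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
    big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
    t1 = (h + big_s1 + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = (big_s0 + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return t1, t2

def compress_rolled(state, block):
    """Rolled form: one loop body, all eight variables shuffled each round."""
    w = _schedule(block)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1, t2 = _t1_t2(a, b, c, e, f, g, h, K[i], w[i])
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def compress_rotated(state, block):
    """Unrolled-style form: values stay put, only two slots change per round."""
    w = _schedule(block)
    s = list(state)
    for i in range(64):
        # During round i, role r (0 = a ... 7 = h) lives in slot (r - i) % 8.
        a, b, c, d, e, f, g, h = [(r - i) % 8 for r in range(8)]
        t1, t2 = _t1_t2(s[a], s[b], s[c], s[e], s[f], s[g], s[h], K[i], w[i])
        s[d] = (s[d] + t1) & MASK   # becomes "e" in the next round
        s[h] = (t1 + t2) & MASK     # becomes "a" in the next round
    return [(x + y) & MASK for x, y in zip(state, s)]

def sha256(data, compress):
    """Pad `data`, run `compress` over each block, return the 32-byte digest."""
    padded = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
    padded += struct.pack(">Q", 8 * len(data))
    state = H0
    for off in range(0, len(padded), 64):
        state = compress(state, padded[off:off + 64])
    return struct.pack(">8I", *state)

# Validate both variants against the standard library before any timing.
for msg in (b"", b"abc", b"x" * 200):
    expected = hashlib.sha256(msg).digest()
    assert sha256(msg, compress_rolled) == expected
    assert sha256(msg, compress_rotated) == expected
```

Because 64 is a multiple of 8, every role is back in its starting slot after the last round, so the final addition into the state needs no extra shuffle. Timing in Python would not reflect the C results discussed above; the sketch only shows the structural difference between the two forms.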