Date: Wed, 13 May 2015 09:27:05 +0100
From: David Chisnall <theraven@FreeBSD.org>
To: John-Mark Gurney <jmg@funkthat.com>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Baptiste Daroussin <bapt@freebsd.org>, current@freebsd.org
Subject: Re: Increase BUFSIZ to 8192
Message-ID: <A1224018-7540-4C76-91EF-AEA2655E49A8@FreeBSD.org>
In-Reply-To: <20150513080342.GE37063@funkthat.com>
References: <20150511230635.GA46991@ivaldir.etoilebsd.net> <20150512032307.GP37063@funkthat.com> <14994.1431412293@critter.freebsd.dk> <20150513080342.GE37063@funkthat.com>

On 13 May 2015, at 09:03, John-Mark Gurney <jmg@funkthat.com> wrote:
>
> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +0000:
>> --------
>> In message <20150512032307.GP37063@funkthat.com>, John-Mark Gurney writes:
>>
>>> Also, you'd probably see even better performance by increasing the
>>> size to 64k, [...]
>>
>> easy:
>>     8K on 32bit
>>     64k on 64bit
>
> Sounds good to me...
>
> Just for people who care... I did a quick set of
> benchmarks on sha256.. This is using my preliminary patch to use sse4
> optimized sha256... But this should be the same for others...
>
> The numbers in ministat output are the time in seconds it takes my
> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> numbers are better.. I've processed them into easier to read format:
> BUFSIZ: 145MB/sec
> 8k:     193MB/sec
> 16k:    198MB/sec
> 64k:    202MB/sec
> 128k:   202MB/sec
> -t:     211MB/sec

It looks like most of the benefit is gained at 16KB. Did you try
running the benchmark with something else running at the same time to
see if there is any advantage in trashing the caches a bit less (simple
case, what happens if you run two instances of the same benchmark at
once)?

I suspect that you’re about right anyway - I recently did some tests
while playing with JavaScript FFI generation with a multithreaded
process JavaScript environment calling out to OpenSSL to do SHA
calculations and having each of 8 threads reading in 128KB chunks gave
the fastest performance (Core i7, 4 cores + hyperthreading), with only a
negligible gain over 64KB. In all cases, the JavaScript implementation
was significantly faster than the openssl tool, which used 8KB buffers.

David
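
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a read-size benchmark. It is not the sha256 patch or the ministat setup discussed above: it uses Python's hashlib rather than the FreeBSD tools, and the helper names (sha256_file, throughput_mb_per_sec, make_test_file), the 64MB file size, and the list of chunk sizes are all assumptions chosen for illustration. Absolute numbers will differ from those quoted, and a file this small will probably be served from the page cache.

```python
import hashlib
import os
import tempfile
import time


def sha256_file(path, chunk_size):
    """Hash a file, reading it in pieces of at most chunk_size bytes."""
    digest = hashlib.sha256()
    # buffering=0 opens an unbuffered raw file, so each read() asks the
    # OS for up to chunk_size bytes instead of going through Python's
    # own buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def throughput_mb_per_sec(path, chunk_size):
    """Return hashing throughput in MB/sec for one pass over the file."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    sha256_file(path, chunk_size)
    elapsed = time.perf_counter() - start
    return size / (1024 * 1024) / elapsed


def make_test_file(size_bytes):
    """Create a temporary file of random bytes (hypothetical test input)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    # Assumed sizes: 1k stands in for a small default buffer; the rest
    # mirror the 8k/16k/64k/128k points mentioned in the thread.
    path = make_test_file(64 * 1024 * 1024)
    try:
        for chunk_size in (1024, 8192, 16384, 65536, 131072):
            rate = throughput_mb_per_sec(path, chunk_size)
            print(f"{chunk_size // 1024:>4}k: {rate:.0f} MB/sec")
    finally:
        os.remove(path)
```

To approximate the cache-contention case raised above, one option is to start two copies of the script at the same time and compare the per-process rates against a single run.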