Date: Fri, 06 Sep 2013 08:49:02 -0600
From: Ian Lepore <ian@FreeBSD.org>
To: Jia-Shiun Li <jiashiun@gmail.com>
Cc: "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>
Subject: Re: stream benchmarking on RPi
Message-ID: <1378478942.1111.448.camel@revolution.hippie.lan>
In-Reply-To: <CAHNYxxM74n1XaQ5Hf4oi9z9QA3bWC-ivmU8v0Jv-yD%2BgS2dkYQ@mail.gmail.com>
References: <CAHNYxxNtBcjD_Khq1_pYCMdPwZJmQ0M_GTmcaGWtoLOJkz_86g@mail.gmail.com>
 <CAG7dG%2Bxn9yCCPn30SXWnC6ppYkoWCjTKhBtgwcH-s46wHAdCJA@mail.gmail.com>
 <CAHNYxxM74n1XaQ5Hf4oi9z9QA3bWC-ivmU8v0Jv-yD%2BgS2dkYQ@mail.gmail.com>
On Fri, 2013-09-06 at 22:29 +0800, Jia-Shiun Li wrote:
> On Fri, Sep 6, 2013 at 6:37 AM, Zbigniew Bodek <zbb@semihalf.com> wrote:
> > Hello Jia-Shiun.
> >
> > Thanks for your effort in testing.
> > I am actually in the middle of superpages tests and another benchmark and
> > set of results will be very helpful especially for comparison.
> >
> > Just for the record: did you enable superpages for your kernel?
> > SP are not yet enabled by default, therefore one needs to set
> > vm.pmap.sp_enabled to non-zero value in loader.conf (if you are using
> > loader) or set this value in src by editing sys/arm/arm/pmap-v6.c -> sp_enabled.
> >
> > Nevertheless I've made short tests on Armada XP (clang).
> > I used two array sizes (default and 2 x default). I also made few runs to
> > ensure that the results are steady.
> > Please check below (improvement in copy can be seen but from what one can
> > observe via sysctl vm.pmap.section not so many superpages are "requested"
> > during the test):
>
> Yes I confirmed that superpages was not enabled yet. I thought it was on
> by default. Should have paid more attention. Then the improvement I've
> seen can also attribute to someone else. Any nominees? ;)
>
> after enabling it in loader.rc ("set vm.pmap.sp_enabled=1"), the
> benchmark did not see big difference. Like your results,
> differences are visible, but not big.
> -------------------------------------------------------------
> Function    Best Rate MB/s     Avg time     Min time     Max time
> Copy:                372.6     0.043278     0.042943     0.043590
> Scale:                31.1     0.529411     0.514686     0.545614
> Add:                  69.2     0.363791     0.346574     0.381367
> Triad:                27.4     0.909578     0.875739     0.995989
> -------------------------------------------------------------
>
> sp did only have a few activities. I suppose it to be more obvious for
> usages heavily sporting and fragmenting memory, rather than
> sequential large block accesses like stream did? After several
> stream runs:
> # sysctl vm.pmap.section
> vm.pmap.section.demotions: 0
> vm.pmap.section.mappings: 0
> vm.pmap.section.p_failures: 120
> vm.pmap.section.promotions: 277
>
> BTW I modified the array size from 10m to 1m, otherwise it will allocate
> more than 200MB/s and run for several minutes. It should not affect
> result much on processors having speed like this .
>

I think we might see better performance gains if we supported 64k
superpages rather than 1m sections.  The odds of an application
allocating a whole megabyte at once and getting all contiguous physical
pages for it seem fairly small.

> I was checking if there is anything can be done to improve performance
> of RPi. Building world takes days and nights. (But works! Ya!)
> For stream it looks more like being bound to some OS/compiler/etc.
> usage rather than hard limit of hardware. Let's see what else can be found.

We are still using software floating point.  The hardfloat support is
being worked on, but not enabled yet.  I have no idea what that
benchmark is testing, but Scale, Add, and Triad sound like things that
would involve floating point.

-- Ian