Date: Fri, 6 Sep 2013 22:29:45 +0800
From: Jia-Shiun Li <jiashiun@gmail.com>
To: Zbigniew Bodek <zbb@semihalf.com>
Cc: "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: stream benchmarking on RPi
Message-ID: <CAHNYxxM74n1XaQ5Hf4oi9z9QA3bWC-ivmU8v0Jv-yD%2BgS2dkYQ@mail.gmail.com>
In-Reply-To: <CAG7dG%2Bxn9yCCPn30SXWnC6ppYkoWCjTKhBtgwcH-s46wHAdCJA@mail.gmail.com>
References: <CAHNYxxNtBcjD_Khq1_pYCMdPwZJmQ0M_GTmcaGWtoLOJkz_86g@mail.gmail.com> <CAG7dG%2Bxn9yCCPn30SXWnC6ppYkoWCjTKhBtgwcH-s46wHAdCJA@mail.gmail.com>
On Fri, Sep 6, 2013 at 6:37 AM, Zbigniew Bodek <zbb@semihalf.com> wrote:
> Hello Jia-Shiun.
>
> Thanks for your effort in testing.
> I am actually in the middle of superpages tests and another benchmark and
> set of results will be very helpful especially for comparison.
>
> Just for the record: did you enable superpages for your kernel?
> SP are not yet enabled by default, therefore one needs to set
> vm.pmap.sp_enabled to non-zero value in loader.conf (if you are using
> loader) or set this value in src by editing sys/arm/arm/pmap-v6.c ->
> sp_enabled.
>
> Nevertheless I've made short tests on Armada XP (clang).
> I used two array sizes (default and 2 x default). I also made few runs to
> ensure that the results are steady.
> Please check below (improvement in copy can be seen but from what one can
> observe via sysctl vm.pmap.section not so many superpages are "requested"
> during the test):

Yes, I confirmed that superpages were not enabled yet. I thought they were
on by default; I should have paid more attention. So the improvement I saw
earlier must be attributable to something else. Any nominees? ;)

After enabling it in loader.rc ("set vm.pmap.sp_enabled=1"), the benchmark
did not show a big difference. As in your results, differences are visible,
but not big.

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             372.6     0.043278     0.042943     0.043590
Scale:             31.1     0.529411     0.514686     0.545614
Add:               69.2     0.363791     0.346574     0.381367
Triad:             27.4     0.909578     0.875739     0.995989
-------------------------------------------------------------

Superpages saw only a little activity. I suppose the effect would be more
obvious for usages heavily sporting and fragmenting memory, rather than the
sequential large-block accesses that stream does?
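As a rough sanity check on the table above, the "Best Rate" column can be
recomputed from the "Min time" column. This is only a sketch under stated
assumptions: that stream counts two arrays of traffic per element for Copy
and Scale and three for Add and Triad, that the elements are 8-byte doubles,
and that the array size is the 1m elements mentioned in this message. The
names used here (N, ARRAYS_TOUCHED, best_rate_mb_s) are illustrative only.

```python
# Recompute stream's "Best Rate MB/s" from the reported minimum times.
# Assumptions (not printed in the benchmark output itself):
#   - N = 1,000,000 elements per array (the "1m" array size)
#   - each element is an 8-byte double
#   - Copy/Scale move 2 arrays per pass, Add/Triad move 3
N = 1_000_000
BYTES_PER_ELEMENT = 8

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the table, in seconds.
MIN_TIME = {
    "Copy": 0.042943,
    "Scale": 0.514686,
    "Add": 0.346574,
    "Triad": 0.875739,
}


def best_rate_mb_s(kernel):
    """Bytes moved in one pass divided by the fastest pass, in MB/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * N
    return 1.0e-6 * bytes_moved / MIN_TIME[kernel]


rates = {kernel: best_rate_mb_s(kernel) for kernel in MIN_TIME}

for kernel, rate in rates.items():
    print(f"{kernel + ':':8s}{rate:8.1f} MB/s")
```

Under those assumptions the recomputed rates agree with the reported ones
to one decimal place, which suggests the table is internally consistent
with a 1m-element run.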
After several stream runs:

# sysctl vm.pmap.section
vm.pmap.section.demotions: 0
vm.pmap.section.mappings: 0
vm.pmap.section.p_failures: 120
vm.pmap.section.promotions: 277

BTW, I reduced the array size from 10m to 1m; otherwise it would allocate
more than 200 MB and run for several minutes. This should not affect the
results much on a processor of this speed.

I was checking whether anything can be done to improve the performance of
the RPi. Building world takes days and nights. (But it works! Ya!) For
stream, the result looks more like it is bound by some OS/compiler/etc.
usage than by a hard limit of the hardware. Let's see what else can be
found.

Thanks,
Jia-Shiun.
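For reference, the memory figure quoted for the 10m array size can be
estimated with a short calculation. This is a sketch only, assuming stream
allocates three arrays of 8-byte doubles and nothing else of significant
size; footprint_mb is a hypothetical helper name.

```python
# Estimate stream's static array footprint for a given array size.
# Assumptions: three arrays (a, b, c) of 8-byte double elements.
NUM_ARRAYS = 3
BYTES_PER_ELEMENT = 8


def footprint_mb(n_elements):
    """Total size of the three arrays in MB (10^6 bytes)."""
    return NUM_ARRAYS * BYTES_PER_ELEMENT * n_elements / 1.0e6


footprint_10m = footprint_mb(10_000_000)  # the larger array size
footprint_1m = footprint_mb(1_000_000)    # the reduced array size

print(f"10m elements: {footprint_10m:.0f} MB")
print(f" 1m elements: {footprint_1m:.0f} MB")
```

With these assumptions the 10m case needs about 240 MB for the arrays
alone and the 1m case about 24 MB.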