Date:      Sat, 7 Sep 2013 22:11:36 +0200
From:      Zbigniew Bodek <zbb@semihalf.com>
To:        Jia-Shiun Li <jiashiun@gmail.com>
Cc:        "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: stream benchmarking on RPi
Message-ID:  <CAG7dG%2BwBN6qEsYgkVD54-Q3hDy8T%2B6q1p1TvQegj=--F-U-utQ@mail.gmail.com>
In-Reply-To: <CAHNYxxM74n1XaQ5Hf4oi9z9QA3bWC-ivmU8v0Jv-yD%2BgS2dkYQ@mail.gmail.com>
References:  <CAHNYxxNtBcjD_Khq1_pYCMdPwZJmQ0M_GTmcaGWtoLOJkz_86g@mail.gmail.com> <CAG7dG%2Bxn9yCCPn30SXWnC6ppYkoWCjTKhBtgwcH-s46wHAdCJA@mail.gmail.com> <CAHNYxxM74n1XaQ5Hf4oi9z9QA3bWC-ivmU8v0Jv-yD%2BgS2dkYQ@mail.gmail.com>

2013/9/6 Jia-Shiun Li <jiashiun@gmail.com>

> On Fri, Sep 6, 2013 at 6:37 AM, Zbigniew Bodek <zbb@semihalf.com> wrote:
> > Hello  Jia-Shiun.
> >
> > Thanks for your effort in testing.
> > I am actually in the middle of superpages tests and another benchmark
> > and set of results will be very helpful, especially for comparison.
> >
> > Just for the record: did you enable superpages for your kernel?
> > SP are not yet enabled by default, so one needs to set
> > vm.pmap.sp_enabled to a non-zero value in loader.conf (if you are
> > using loader) or set this value in src by editing
> > sys/arm/arm/pmap-v6.c -> sp_enabled.
> >
> > Nevertheless I've made short tests on Armada XP (clang).
> > I used two array sizes (default and 2 x default). I also made a few
> > runs to ensure that the results are steady.
> > Please check below (an improvement in copy can be seen, but from what
> > one can observe via sysctl vm.pmap.section, not so many superpages are
> > "requested" during the test):
>
> Yes, I confirmed that superpages were not enabled yet. I thought they were
> on by default. Should have paid more attention. Then the improvement I've
> seen can also be attributed to someone else. Any nominees? ;)
>
> After enabling it in loader.rc ("set vm.pmap.sp_enabled=1"), the
> benchmark did not show a big difference. Like your results,
> differences are visible, but not big.
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             372.6     0.043278     0.042943     0.043590
> Scale:             31.1     0.529411     0.514686     0.545614
> Add:               69.2     0.363791     0.346574     0.381367
> Triad:             27.4     0.909578     0.875739     0.995989
> -------------------------------------------------------------
>
> sp showed only a little activity. I suppose it would be more obvious for
> usages heavily sporting and fragmenting memory, rather than
> sequential large block accesses like stream did? After several
> stream runs:
> # sysctl vm.pmap.section
> vm.pmap.section.demotions: 0
> vm.pmap.section.mappings: 0
> vm.pmap.section.p_failures: 120
> vm.pmap.section.promotions: 277
>
> BTW I modified the array size from 10M to 1M; otherwise it would allocate
> more than 200MB and run for several minutes. It should not affect the
> results much on processors of this speed.
>
> I was checking if there is anything that can be done to improve the
> performance of the RPi. Building world takes days and nights. (But works! Ya!)
> For stream, it looks more like it is bound by some OS/compiler/etc.
> usage rather than a hard limit of the hardware. Let's see what else can be found.
>
Hello Jia-Shiun.

I was looking for a similar benchmark that does not depend on floating point,
and I found this:
http://alasir.com/software/ramspeed/

The idea seems to be the same as the stream benchmark, but you have the option
not to use FP.
I've actually run some tests on my ARM, and here are the results (default
benchmark settings):
./ramsmp -b3
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 2 processes

INTEGER   Copy:      1769.56 MB/s
INTEGER   Scale:     1211.91 MB/s
INTEGER   Add:       1735.90 MB/s
INTEGER   Triad:     1468.31 MB/s
---
INTEGER   AVERAGE:   1546.42 MB/s

./ramsmp -b6
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 2 processes

FL-POINT  Copy:      2003.85 MB/s
FL-POINT  Scale:     104.86 MB/s
FL-POINT  Add:       123.27 MB/s
FL-POINT  Triad:     68.53 MB/s
---
FL-POINT  AVERAGE:   575.13 MB/s
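
For anyone unfamiliar with the four tests, here is a minimal C sketch of
what the kernels compute, once over 32-bit integers and once over doubles.
This is not RAMspeed's (or stream's) source: the function names, the element
types and the small self-check are assumptions made for illustration only.
Note that Copy moves data without any arithmetic in both variants, which fits
the FL-POINT Copy rate above being in line with the INTEGER one while Scale,
Add and Triad fall far behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer variants. The uint32_t element type is an assumption. */
static void int_copy(uint32_t *c, const uint32_t *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];                    /* load + store, no arithmetic */
}

static void int_scale(uint32_t *b, const uint32_t *c, uint32_t k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = k * c[i];                /* one multiply per element */
}

static void int_add(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];             /* one add per element */
}

static void int_triad(uint32_t *a, const uint32_t *b, const uint32_t *c,
                      uint32_t k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + k * c[i];         /* multiply + add per element */
}

/* Floating-point variants: same loops, double elements. */
static void fp_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

static void fp_scale(double *b, const double *c, double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = k * c[i];
}

static void fp_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

static void fp_triad(double *a, const double *b, const double *c,
                     double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + k * c[i];
}

/* Runs both variants on the same small input in the usual order
 * (copy, scale, add, triad). Returns 1 if the final integer and
 * floating-point arrays agree element by element, 0 otherwise. */
int kernels_match(void)
{
    enum { N = 4 };
    const uint32_t k = 3;
    uint32_t ia[N] = {1, 2, 3, 4}, ib[N], ic[N];
    double   fa[N] = {1.0, 2.0, 3.0, 4.0}, fb[N], fc[N];

    int_copy(ic, ia, N);          fp_copy(fc, fa, N);
    int_scale(ib, ic, k, N);      fp_scale(fb, fc, (double)k, N);
    int_add(ic, ia, ib, N);       fp_add(fc, fa, fb, N);
    int_triad(ia, ib, ic, k, N);  fp_triad(fa, fb, fc, (double)k, N);

    assert(ia[0] == 15);          /* b = 3, c = 1 + 3 = 4, a = 3 + 3 * 4 */
    for (size_t i = 0; i < N; i++)
        if ((double)ia[i] != fa[i])
            return 0;
    return 1;
}
```

A real measurement would run each kernel over arrays much larger than the
caches and time it; that part is deliberately left out of the sketch.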

Best regards
Zbigniew Bodek


