From: Jia-Shiun Li
Date: Fri, 6 Sep 2013 22:29:45 +0800
Subject: Re: stream benchmarking on RPi
To: Zbigniew Bodek
Cc: "freebsd-arm@freebsd.org"
List-Id: Porting FreeBSD to the StrongARM Processor

On Fri, Sep 6, 2013 at 6:37 AM, Zbigniew Bodek wrote:
> Hello Jia-Shiun.
>
> Thanks for your effort in testing.
> I am actually in the middle of superpages tests and another benchmark and
> set of results will be very helpful especially for comparison.
>
> Just for the record: did you enable superpages for your kernel?
> SP are not yet enabled by default, therefore one needs to set
> vm.pmap.sp_enabled to non-zero value in loader.conf (if you are using
> loader) or set this value in src by editing sys/arm/arm/pmap-v6.c ->
> sp_enabled.
>
> Nevertheless I've made short tests on Armada XP (clang).
> I used two array sizes (default and 2 x default). I also made few runs to
> ensure that the results are steady.
> Please check below (improvement in copy can be seen but from what one can
> observe via sysctl vm.pmap.section not so many superpages are "requested"
> during the test):

Yes, I confirmed that superpages were not enabled yet. I thought they
were on by default; I should have paid more attention. So the
improvement I saw earlier must be attributable to something else. Any
nominees? ;)

After enabling superpages in loader.rc ("set vm.pmap.sp_enabled=1"), the
benchmark did not show a big difference. As in your results, differences
are visible, but not big.

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             372.6     0.043278     0.042943     0.043590
Scale:             31.1     0.529411     0.514686     0.545614
Add:               69.2     0.363791     0.346574     0.381367
Triad:             27.4     0.909578     0.875739     0.995989
-------------------------------------------------------------

Superpages saw only a little activity. I suppose the effect would be
more obvious for workloads that heavily use and fragment memory, rather
than for large sequential block accesses like the ones stream does?

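
As a sanity check on the stream table in this message, the "Best Rate" column can be recomputed from the "Min time" column. This is only a rough sketch: it assumes 8-byte doubles, an array size of 1,000,000 elements, and MB meaning 10^6 bytes. None of these is stated explicitly in the output itself.

```python
# Sketch: recompute STREAM "Best Rate" from the reported minimum times.
# Assumptions (not confirmed by the benchmark output): 8-byte doubles,
# 1,000,000 elements per array, and 1 MB = 10^6 bytes.

N = 1_000_000        # elements per array (assumed)
BYTES_PER_ELEM = 8   # sizeof(double), assumed

# Number of arrays read or written per iteration by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds, copied from the stream table in this message.
MIN_TIME = {
    "Copy": 0.042943,
    "Scale": 0.514686,
    "Add": 0.346574,
    "Triad": 0.875739,
}


def best_rate_mb_s(kernel):
    """Bytes moved per iteration divided by the fastest iteration time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEM * N
    return bytes_moved / MIN_TIME[kernel] / 1e6


for kernel in MIN_TIME:
    print(f"{kernel:6s} {best_rate_mb_s(kernel):7.1f} MB/s")
```

Under those assumptions the computed rates round to the values in the table (372.6, 31.1, 69.2, 27.4), which suggests the reduced array size really was about one million elements.
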
After several stream runs:

# sysctl vm.pmap.section
vm.pmap.section.demotions: 0
vm.pmap.section.mappings: 0
vm.pmap.section.p_failures: 120
vm.pmap.section.promotions: 277

BTW, I reduced the array size from 10M to 1M; otherwise it would
allocate more than 200MB and run for several minutes. That should not
affect the results much on a processor of this speed.

I was checking whether anything can be done to improve the performance
of the RPi. Building world takes days and nights. (But it works! Ya!)
For stream, the results look bound by the OS, the compiler, or something
similar rather than by a hard limit of the hardware. Let's see what else
can be found.

Thanks,
Jia-Shiun.
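
P.S. The array-size remark above can be checked with a quick calculation. This is a rough sketch that assumes stream's three arrays hold 8-byte doubles and that "10M" and "1M" mean 10,000,000 and 1,000,000 elements; the helper name footprint_mb is just for illustration.

```python
# Sketch: static memory footprint of the three STREAM arrays (a, b, c).
# Assumptions: 8-byte doubles, sizes given as element counts,
# and 1 MB = 10^6 bytes.

BYTES_PER_ELEM = 8  # sizeof(double), assumed
NUM_ARRAYS = 3      # a, b and c


def footprint_mb(n_elements):
    """Total size of the three arrays in MB for a given element count."""
    return NUM_ARRAYS * BYTES_PER_ELEM * n_elements / 1e6


print(footprint_mb(10_000_000))  # 240.0, i.e. "more than 200MB"
print(footprint_mb(1_000_000))   # 24.0
```
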