From owner-freebsd-arm@FreeBSD.ORG  Mon Dec 29 06:41:27 2014
Return-Path: <owner-freebsd-arm@FreeBSD.ORG>
Delivered-To: freebsd-arm@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3494A82B
 for <freebsd-arm@FreeBSD.org>; Mon, 29 Dec 2014 06:41:27 +0000 (UTC)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1B55C1E5E
 for <freebsd-arm@FreeBSD.org>; Mon, 29 Dec 2014 06:41:27 +0000 (UTC)
Received: from bugs.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sBT6fQ51063746
 for <freebsd-arm@FreeBSD.org>; Mon, 29 Dec 2014 06:41:26 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-arm@FreeBSD.org
Subject: [Bug 194635] Speed optimisation for framebuffer console driver on
 Raspberry Pi
Date: Mon, 29 Dec 2014 06:41:25 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: arm
X-Bugzilla-Version: 10.0-RELEASE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: adrian@freebsd.org
X-Bugzilla-Status: Open
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: freebsd-arm@FreeBSD.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-194635-7-Di4K2gzOsI@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-194635-7@https.bugs.freebsd.org/bugzilla/>
References: <bug-194635-7@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Dec 2014 06:41:27 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194635

--- Comment #8 from Adrian Chadd <adrian@freebsd.org> ---
Ok, so I finally got around to this!

FreeBSD-HEAD now uses vt, not syscons. I'll still merge your changes at some
point, but your code targets the syscons console. Under vt, the driver exposes
a plain mapped framebuffer, and vt draws into it with the generic code in
sys/dev/vt/hw/fb/.

That code does mostly what yours does, storing 8, 16 or 32 bits at a time
depending on the framebuffer depth (bpp).
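A rough sketch of what "a store width that follows the depth" means for a plain mapped framebuffer: one store per pixel, 8, 16 or 32 bits wide. The helper name fb_put_pixel and its stride-in-bytes parameter are hypothetical; this is not the code in sys/dev/vt/hw/fb/.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store one pixel into a linear framebuffer.
 * stride is the number of bytes per line; the store width follows bpp.
 * Assumes fb and stride are aligned to the pixel size.
 */
static void
fb_put_pixel(uint8_t *fb, size_t stride, int bpp, unsigned x, unsigned y,
    uint32_t color)
{
        uint8_t *p = fb + (size_t)y * stride + (size_t)x * (size_t)(bpp / 8);

        switch (bpp) {
        case 8:
                *p = (uint8_t)color;                      /* one 8-bit store */
                break;
        case 16:
                *(uint16_t *)(void *)p = (uint16_t)color; /* one 16-bit store */
                break;
        case 32:
                *(uint32_t *)(void *)p = color;           /* one 32-bit store */
                break;
        default:
                break;          /* other depths are not handled in this sketch */
        }
}
```

Drawing a character cell this way costs one store per pixel, so at 8 or 16 bpp the same number of pixels means more, narrower stores than at 32 bpp.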

So I wrote a small test that mmap()s /dev/fb0 into userland and times 8, 16
and 32 bit stores over the same region to see which is fastest:

#include <sys/types.h>
#include <sys/mman.h>

#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// fb0: 1184x624(0x0@0,0) 16bpp

#define WIDTH   1184
#define HEIGHT  624
#define BPP     16

// Not exact: the real size is stride (bytes per line) * HEIGHT, and the
// stride may be larger than WIDTH * BPP / 8. Assume no line padding here.
#define FB_SIZE (WIDTH * HEIGHT * (BPP / 8))

struct timespec
ts_diff(struct timespec start, struct timespec end)
{
        struct timespec temp;

        if ((end.tv_nsec-start.tv_nsec)<0) {
                temp.tv_sec = end.tv_sec-start.tv_sec-1;
                temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
        } else {
                temp.tv_sec = end.tv_sec-start.tv_sec;
                temp.tv_nsec = end.tv_nsec-start.tv_nsec;
        }
        return temp;
}

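/*
 * The stores go through volatile pointers so that the compiler emits
 * exactly one store of the requested width per element; otherwise it may
 * turn the byte loop into a memset() call that uses wider stores.
 */
void
fill_1byte(void *fb, uint8_t val)
{
        volatile uint8_t *f = fb;
        size_t i;

        for (i = 0; i < FB_SIZE; i++)
                f[i] = val;
}

void
fill_2byte(void *fb, uint16_t val)
{
        volatile uint16_t *f = fb;
        size_t i;

        for (i = 0; i < FB_SIZE / 2; i++)
                f[i] = val;
}

void
fill_4byte(void *fb, uint32_t val)
{
        volatile uint32_t *f = fb;
        size_t i;

        for (i = 0; i < FB_SIZE / 4; i++)
                f[i] = val;
}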

int
main(void)
{
        char *fb = NULL;
        int fd;
        int i;
        struct timespec tv_start, tv_end, tv_diff;

        fd = open("/dev/fb0", O_RDWR);
        if (fd < 0) {
                err(1, "%s: open", __func__);
        }

        fb = mmap(NULL, FB_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (fb == MAP_FAILED) {
                err(1, "%s: mmap", __func__);
        }

        /* CLOCK_MONOTONIC_PRECISE is FreeBSD-specific. */
        clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv_start);
        for (i = 0; i < 100; i++)
                fill_1byte(fb, i);
        clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv_end);
        tv_diff = ts_diff(tv_start, tv_end);
        /* tv_nsec needs 9 digits of zero padding, not 6. */
        printf("8 bit: 100 runs: %lld.%09ld sec\n",
            (long long) tv_diff.tv_sec, (long) tv_diff.tv_nsec);

        clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv_start);
        for (i = 0; i < 100; i++)
                fill_2byte(fb, i);
        clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv_end);
        tv_diff = ts_diff(tv_start, tv_end);
        printf("16 bit: 100 runs: %lld.%09ld sec\n",
            (long long) tv_diff.tv_sec, (long) tv_diff.tv_nsec);

        clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv_start);
        for (i = 0; i < 100; i++)
                fill_4byte(fb, i);
        clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv_end);
        tv_diff = ts_diff(tv_start, tv_end);
        printf("32 bit: 100 runs: %lld.%09ld sec\n",
            (long long) tv_diff.tv_sec, (long) tv_diff.tv_nsec);

        exit(0);
}

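.. and the output (this run printed tv_nsec with %06lld rather than %09ld,
which drops leading zeros, so the 8 bit and 32 bit times are really 4.015364
and 1.012614 seconds):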

root@raspberry-pi:~ # ./test 
8 bit: 100 runs: 4.15364000 sec
16 bit: 100 runs: 2.107316000 sec
32 bit: 100 runs: 1.12614000 sec
root@raspberry-pi:~ # 
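When printing a struct timespec as decimal seconds, tv_nsec has to be zero-padded to nine digits; with a six-digit pad, 15364000 ns prints as ".15364000" instead of ".015364000". A small sketch of a formatting helper (the name ts_format is hypothetical):

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: format a timespec as "<sec>.<nsec>" seconds.
 * tv_nsec is always in [0, 999999999], so "%09ld" keeps its leading zeros.
 * Returns the snprintf() result (characters that would be written).
 */
static int
ts_format(char *buf, size_t len, struct timespec ts)
{
        return snprintf(buf, len, "%lld.%09ld", (long long)ts.tv_sec,
            (long)ts.tv_nsec);
}
```

With tv_sec = 4 and tv_nsec = 15364000 this produces "4.015364000".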

.. so:

* Your work is good and still useful for people using syscons, but you should
check sys/dev/vt/hw/fb/ to see whether the same optimisation can be applied
there;
* For the best speed we should be doing 32 bit stores, not lots of 8 or 16 bit
stores. The test above filled the same region of memory with 8, 16 and 32 bit
stores, and each doubling of the store width roughly halved the time (about
4 s, 2.1 s and 1 s for 100 fills).
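A minimal sketch of what that could look like for a 16bpp framebuffer: pack two copies of the colour into one 32-bit word and fill each line with 32-bit stores, honouring the line stride. It assumes the framebuffer base and stride are 4-byte aligned and finishes an odd-width line with a single 16-bit store; fill16_wide is a hypothetical name, not an existing FreeBSD function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill a 16bpp framebuffer with one colour using
 * 32-bit stores. Assumes fb and stride (bytes per line) are 4-byte
 * aligned. volatile keeps the compiler from changing the store width.
 */
static void
fill16_wide(void *fb, size_t stride, unsigned width, unsigned height,
    uint16_t color)
{
        uint32_t pair = ((uint32_t)color << 16) | color;  /* two pixels */
        unsigned x, y;

        for (y = 0; y < height; y++) {
                uint8_t *line = (uint8_t *)fb + (size_t)y * stride;
                volatile uint32_t *w = (volatile uint32_t *)(void *)line;

                for (x = 0; x + 1 < width; x += 2)
                        w[x / 2] = pair;        /* one store, two pixels */
                if (width & 1)                  /* odd width: last pixel */
                        ((volatile uint16_t *)(void *)line)[width - 1] = color;
        }
}
```

Because both halves of the word hold the same colour, the packing does not depend on byte order; filling with a pattern of different pixels would need care with endianness.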
