Date:      Tue, 29 Sep 2015 09:19:42 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Shared page and related goodies for ARMv7
Message-ID:  <1443539982.1224.433.camel@freebsd.org>
In-Reply-To: <20150929132332.GH11284@kib.kiev.ua>
References:  <20150929132332.GH11284@kib.kiev.ua>

On Tue, 2015-09-29 at 16:23 +0300, Konstantin Belousov wrote:
> As an exercise to get myself more familiar with the ARM architecture,
> I added the shared page for FreeBSD/ARMv7.  This provides the standard
> features tied to the shared page, in particular, a non-executable stack
> for the compatible binaries, and fast gettimeofday() and clock_gettime()
> functions.  For reference, the measurements on my RPI2 done by
> tools/tools/syscall_timing, show
> for userspace gettimeofday:
> % ./syscall_timing gettimeofday
> Clock resolution: 0.000000053
> test    loop    time    iterations      periteration
> gettimeofday    0       1.009965385     2743838 0.000000368
> gettimeofday    1       1.009899240     2743629 0.000000368
> gettimeofday    2       1.009952833     2538253 0.000000397
> gettimeofday    3       1.009918198     2404272 0.000000420
> gettimeofday    4       1.009875126     2404567 0.000000419
> gettimeofday    5       1.009950700     2405196 0.000000419
> gettimeofday    6       1.009859555     2623534 0.000000384
> gettimeofday    7       1.009911534     2743249 0.000000368
> gettimeofday    8       1.009928618     2743240 0.000000368
> gettimeofday    9       1.009920910     2743227 0.000000368
> for syscall:
> gettimeofday    0       1.009994949     659319  0.000001531
> gettimeofday    1       1.009869846     583343  0.000001731
> gettimeofday    2       1.009899950     583384  0.000001731
> gettimeofday    3       1.009873232     636420  0.000001586
> gettimeofday    4       1.009909639     669715  0.000001507
> gettimeofday    5       1.009941201     669640  0.000001508
> gettimeofday    6       1.009930733     669051  0.000001509
> gettimeofday    7       1.009890005     669064  0.000001509
> gettimeofday    8       1.009915474     669168  0.000001509
> gettimeofday    9       1.009918860     668739  0.000001510
> 
> The patch is pretty much straightforward; the interesting details are
> listed below.
> 
> - The shared page is only enabled for ARMv7 kernels.  From my reading of
>   the VMSA chapters for ARMv6 and ARMv7, only v7 ensures that there is no
>   cache aliasing for a page mapped multiple times, while v6 requires
>   coloring.  The shared page is mapped both at the top of UVA and
>   somewhere in KVA.
> - There seems to be a bug in the generic timer setup.  The CNTKCTL CP15
>   register is core-private, which means that the in-tree code only sets
>   access permissions on the BSP.  The APs' CNTKCTL registers are left in
>   an undefined state, possibly set to some value by the loader.  This
>   might allow userspace to reprogram timers on APs.  I fixed this by
>   using a rendezvous after SMP is started.
> - I had to add explicit directives to create .note.GNU-stack sections in
>   some __eabi files from libcompiler-rt which are linked into libc.
>   Upstream refused to make a global change adding the stack note to all
>   asm files, recommending that we live with the --noexecstack assembler
>   option.  But we cannot do this for files linked into libc.
> - arm64 would require some additions; I have not tested the build.
> 
> It would be useful to test the patch on ARMv6 to ensure that signals and
> gettimeofday() work.

Some things, in no particular order...

I can't do anything with an inline email patch (my mail client destroys
whitespace).  Can you send it as an attachment, or put it somewhere on
freefall or something please?

There is no difference between armv6 and armv7 in our world.  The only
armv6 chip we support is the one used in the original rpi and it has a
16K 4-way L1 cache which means the page coloring issue disappears and we
can treat it the same as an armv7 chip (different cache ops, but the
caches behave the same).
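
The arithmetic behind that can be sketched roughly as below.  This is
only an illustration, assuming a virtually indexed, physically tagged L1
and 4K pages; the function names are made up for the example and are not
taken from the tree.

```c
#include <stddef.h>

/* Bytes covered by one way of a set-associative cache. */
size_t
cache_way_size(size_t cache_size, unsigned ways)
{
	return (cache_size / ways);
}

/*
 * Number of page colors for a virtually indexed cache: how many
 * different cache positions two virtual mappings of the same physical
 * page could occupy.  When one way is no larger than a page, all index
 * bits come from the page offset, so there is a single color and no
 * aliasing between mappings.
 */
size_t
cache_page_colors(size_t cache_size, unsigned ways, size_t page_size)
{
	size_t way = cache_way_size(cache_size, ways);

	return (way <= page_size ? 1 : way / page_size);
}
```

With the numbers above, cache_page_colors(16 * 1024, 4, 4096) is 1: each
way is exactly one 4K page, so there is nothing to color.  A
hypothetical 32K 4-way cache would give 2 and would need coloring.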

I just skimmed through the patch quickly and the main thing that jumps
out at me is that what you've done works only on rpi2 and aarch64,
because those are the only platforms that support that timer hardware.
(That means I can't test it, but once I get your patch in a usable form
I can have a shot at implementations for other timers).

It's not clear to me that this scheme can even work on most armv7
hardware because of the timer hardware involved.  I think it would mean
giving userland read access to a whole page worth of IO space and in
some cases there are registers in that range where reads have side
effects whose consequences could be dire (such as pending-interrupt
registers).
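
To make the granularity problem concrete, here is a minimal sketch,
assuming 4K pages.  The helper names and the register addresses in the
note after it are hypothetical, picked only for illustration; they do
not describe any particular SoC.

```c
#include <stdint.h>

#define	IO_PAGE_SIZE	4096u	/* assumed MMU page size */

/* Physical address of the first byte of the page containing addr. */
uint32_t
io_page_base(uint32_t addr)
{
	return (addr & ~(uint32_t)(IO_PAGE_SIZE - 1));
}

/*
 * Nonzero if mapping the page that holds reg_a into userland would
 * also make reg_b readable, i.e. both registers share one page.
 */
int
io_same_page(uint32_t reg_a, uint32_t reg_b)
{
	return (io_page_base(reg_a) == io_page_base(reg_b));
}
```

For a made-up counter register at 0x40001004 and a made-up
pending-interrupt register at 0x40001010, io_same_page() returns
nonzero: the MMU cannot expose one without exposing the other.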

-- Ian



