Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 22 Jul 2014 05:27:37 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Tim Kientzle <tim@kientzle.com>, Ian Lepore <ian@freebsd.org>, arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>, Fabien Thomas <fabient@freebsd.org>
Subject:   Re: [CFR] mge driver / elf reloc
Message-ID:  <20140722041517.M3229@besplex.bde.org>
In-Reply-To: <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com>
References:  <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org> <1405810447.85788.41.camel@revolution.hippie.lan> <20140720220514.GP45513@funkthat.com> <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com> <20140720231056.GQ45513@funkthat.com> <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com> <20140721162559.GS45513@funkthat.com> <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 21 Jul 2014, Warner Losh wrote:

> On Jul 21, 2014, at 10:25 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
>
>> Warner Losh wrote this message on Mon, Jul 21, 2014 at 08:46 -0600:
>>>
>>> On Jul 20, 2014, at 5:10 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
>>>
>>>> Tim Kientzle wrote this message on Sun, Jul 20, 2014 at 15:25 -0700:
>>>>> $ man 9 byteorder
>>>>>
>>>>> is most of what you want, lacking only some aliases to pick
>>>>> the correct macro for native byte order.
>>>>
>>>> Um, those doesn't help if you want native endian order?
>>>
>>> Ummm, yes they do. enc converts from native order. dec decodes to nativ=
e byte
>>
>> No they don't.. If you want to read a value in memory that is native
>> endian order to native endian order (no conversion), they cannot be
>> used w/o using something like below=85
>
> Missed the native to native. bcopy works, but is ugly, as you note below.

Indeed, the API is missing support for the easy case of host load/store.
But this is a feature.

He used memcpy(), not bcopy().  This is a case where using memcpy() instead
of bcopy() is not a style bug.  Using memcpy() is pretty.

>>> order. They are more general cases than the ntoh* functions that are mo=
re traditional
>>> since they also work on byte streams that may not be completely aligned=
 when
>>> sitting in memory. Which is what you are asking for.
>>
>> So, you're saying that I now need to write code like:
>> #if LITTLE_ENDIAN /* or how ever this is spelled*/
>> =09var =3D le32enc(foo);
>> #else
>> =09var =3D be32enc(foo);
>> #endif
>>
>> If I want to read a arch native endian value?  No thank you=85
>
> I=92m not saying that at all.
>
>>>> Also, only the enc/dec functions are documented to work on non-aligned
>>>> address, so that doesn't help in most cases?
>>>
>>> They work on all addresses. They are even documented to work on any add=
ress:
>>>
>>>     The be16enc(), be16dec(), be32enc(), be32dec(), be64enc(), be64dec(=
),
>>>     le16enc(), le16dec(), le32enc(), le32dec(), le64enc(), and le64dec(=
)
>>>     functions encode and decode integers to/from byte strings on any al=
ign-
>>>     ment in big/little endian format.
>>>
>>> So they are quite useful in general. Peeking under the covers at the im=
plementation
>>> also shows they will work for any alignment, so I?m having trouble unde=
rstanding
>>> where this objection is really coming from.
>>
>> There are places where you write code such as:
>> =09int i;
>> =09memcpy(&i, inp, sizeof i);
>> =09/* use i */
>>
>> In order to avoid alignment faults...  None of the functions in byteorde=
r
>> do NO conversion of endian, or you have to know which endian you are but
>> that doesn't work on MI code...

This is good code.  memcpy() is likely to be optimized better than anything
in <sys/endian.h>.  Even with the optimizations in my previous reply.

Compilers do magic things with memcpy(), like turning it into a no-op
if the memory is already in a register and the register doesn't need
to be moved.  Sometimes the register needs to be moved but the move
can be, and is, to another register (perhaps in a different register
set, like XMM instead of a general register on x86).  Doing nothing
much for memcpy()s that are just for a alignment is a special case of
this.  For the above, on all arches, the compiler should load the
int-sized memory at address inp into an int-sized register and then
rename the register to i (i should never hit memory unless it is changed
and the change needs to be stored).  The instructions that are used
for the load depend on what the compiler knows about the alignment of
inp, and the alignment requirements of the arch.  On arches without
strict alignment, like x86 without CR0_AC, there is nothing better
than doing a mislaligned load *(int *)inp the same as you could do by
lieing to the compiler.  On arches with strict alignment, there is
nothing better than loading 1 byte at a time and combining if the
alignment of inp is 1 or unknown.

This seems to make most of byteorder(9) a mistake.  Just about everything
can be done better using memcpy() and standard functions in byteorder(3),
except the (3) functions only go up to 32 bits and have 1980's spellings
[sl] for the integer sizes.  All alignment stuff can be done in host order
using memcpy().  Most byte swapping stuff can be done using ntohl().  The
direction for n/h can be confusing but might need fewer ifdefs than be/le.

le32enc done using more-standard APIs:
    first htole32() in a register.  Only in byteorder(9)
    then store to an aligned uint32_t temporary variable (optimized away)
    then memcpy() to final place

le32dec done using more-standard APIs:
    first memcpy() to aligned uint32_t temporary variable (optimized away)
    then load to a register
    then le32toh on a register.  Only in byteorder(9)

These take 1 line extra each (for the memcpy()).  You already have to do
this for h/l conversions.  E.g.:

htonl to from register to misaligned memory (like le32enc):
    first htonl() in the register.  Standard in byteorder(3) and POSIX.
    then store to an aligned uint32_t temporary variable (optimized away)
    then memcpy() to final place

-fbuiltin is essential for optimizing memcpy(), so memcpy must be
spelled __builtin_memcpy in time-critical code if -fbuiltin might be
turned off.  -fbuiltin used to be only turned off in the kernel for
LINT, but this was broken 13 years ago by using -ffreestanding.  13
years is not long enough for LINT (conf/NOTES) to have caught up with
the change.  clang breaks this even more.  With gcc, you can use
-fbuiltin after -ffreestanding to turn builtins back on, but with clang
-ffreestanding has apparently has precedence so -fbuiltin never turns
builtins back on.  Also, gcc's man page actually documents these options,
and somewhere in the gcc documentation there is a recommendation to
keep -fbuiltin off and only use selected builtins via __builtin_foo.
Turning all builtins back on originally only caused minor problems with
a few builtins like the one for printf.

Bruce
From owner-freebsd-arm@FreeBSD.ORG  Mon Jul 21 19:55:35 2014
Return-Path: <owner-freebsd-arm@FreeBSD.ORG>
Delivered-To: freebsd-arm@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 870E12E3;
 Mon, 21 Jul 2014 19:55:35 +0000 (UTC)
Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au
 [211.29.132.97])
 by mx1.freebsd.org (Postfix) with ESMTP id 184C52D2D;
 Mon, 21 Jul 2014 19:55:34 +0000 (UTC)
Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au
 (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133])
 by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 874907857FB;
 Tue, 22 Jul 2014 05:55:31 +1000 (EST)
Date: Tue, 22 Jul 2014 05:55:30 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: [CFR] mge driver / elf reloc
In-Reply-To: <20140722022100.S2586@besplex.bde.org>
Message-ID: <20140722053651.V3446@besplex.bde.org>
References: <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org>
 <1405810447.85788.41.camel@revolution.hippie.lan>
 <20140720220514.GP45513@funkthat.com>
 <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com>
 <20140720231056.GQ45513@funkthat.com>
 <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com>
 <1405955048.85788.74.camel@revolution.hippie.lan>
 <20140722022100.S2586@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=dZS5gxne c=1 sm=1 tr=0
 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=5zrNXb6StqYA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=VRqNOJ3i96_1dxBxEUkA:9
 a=CjuIK1q_8ugA:10
Cc: arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>,
 Ian Lepore <ian@freebsd.org>
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/>;
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 21 Jul 2014 19:55:35 -0000

On Tue, 22 Jul 2014, Bruce Evans wrote:

> ...
> This behaviour can probable be exploited in the enc/dec functions.
> When there is no endianness conversion, write 32-bit accesses as
> p->x where p is a pointer to a __packed __aligned(1) struct containing
> x.  The compiler will then produce the above mess on ia64 and a single
> 32-bit access if possible.  No ifdefs required.  When there is an
> endianess conversion, do it on a register with value p->x.
>
> Lightly tested implementation:
>
> % struct _le32 {
> % 	uint32_t _x;
> % } __packed;
> % % #define	le32enc(p, u)	(((struct _le32 *)(p))->_x = (u))
>
> This is simpler than the inline version, and fixes some namespace errors.

Oops, that is not so good.  I just rememembed another optimization that
at least clang does in at least some cases for the inline version: if
the the compiler can see that the pointer is to aligned memory (which it
can sometimes do since the function is inline), then the compiler generates
1 32-bit access.  The __packed attribute in the above defeats this by
forcing to 1-byte alignment.

> If you want avoid the mess on ia64, this can be done at compile time if
> the alignment is known to be more than 1 then.  I think if can often
> be known, as for struct ip (don't generate misaligned struct ip, and
> if you start with one then copy to an aligned one).  The enc/dec
> inline functions could handle this by being converted to macros that
> take the alignment parameter.  If the alignment is not known until
> runtime...don't do that.
>
> To add an alignment parameter to the above, use something like:
>
> #define	_sle32(a)	struct _le32 { uint32_t _x; } __packed __aligned(a)
> #define	le32enc(p, u, a)	(((_sle32(a) *)(p))->_x = (u))
>
> Macroizing the struct declaration to add the alignment parameter to it
> and avoiding backslashes made the code less readable but even shorter.
>
> This was tested on i386 and ia64 with alignments 1 and 4 and and gave
> the expected output in asm.

Also not so good.  If the pointer was 'a'-byte aligned, then the 'a'
parameter recovers by tells the compiler this in a worse way after
__packed clobbers the original alignment.  It is better for the caller
to do any alignment.

In other replies, we discussed using memcpy() to force alignment.  The
above macros could use it to copy to aligned memory and load as if
from that memory less hackishly.  This would work in the inline
functions, but doesn't do a lot that the compiler can't already do.
It is still better for callers to do any alignment.  They can use
memcpy(), or if they know that the pointer is n-byte aligned then they
can cast the pointer an __aligned(n) one where the inline fnction
can see the cast.  The original pointer type might be void * or char *
so memcpy() for alignment would have to copy memory, but _aligned(n)
would be virtual.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140722041517.M3229>