Date: Tue, 22 Jul 2014 05:27:37 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Warner Losh <imp@bsdimp.com> Cc: Tim Kientzle <tim@kientzle.com>, Ian Lepore <ian@freebsd.org>, arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>, Fabien Thomas <fabient@freebsd.org> Subject: Re: [CFR] mge driver / elf reloc Message-ID: <20140722041517.M3229@besplex.bde.org> In-Reply-To: <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com> References: <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org> <1405810447.85788.41.camel@revolution.hippie.lan> <20140720220514.GP45513@funkthat.com> <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com> <20140720231056.GQ45513@funkthat.com> <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com> <20140721162559.GS45513@funkthat.com> <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 21 Jul 2014, Warner Losh wrote: > On Jul 21, 2014, at 10:25 AM, John-Mark Gurney <jmg@funkthat.com> wrote: > >> Warner Losh wrote this message on Mon, Jul 21, 2014 at 08:46 -0600: >>> >>> On Jul 20, 2014, at 5:10 PM, John-Mark Gurney <jmg@funkthat.com> wrote: >>> >>>> Tim Kientzle wrote this message on Sun, Jul 20, 2014 at 15:25 -0700: >>>>> $ man 9 byteorder >>>>> >>>>> is most of what you want, lacking only some aliases to pick >>>>> the correct macro for native byte order. >>>> >>>> Um, those doesn't help if you want native endian order? >>> >>> Ummm, yes they do. enc converts from native order. dec decodes to nativ= e byte >> >> No they don't.. If you want to read a value in memory that is native >> endian order to native endian order (no conversion), they cannot be >> used w/o using something like below=85 > > Missed the native to native. bcopy works, but is ugly, as you note below. Indeed, the API is missing support for the easy case of host load/store. But this is a feature. He used memcpy(), not bcopy(). This is a case where using memcpy() instead of bcopy() is not a style bug. Using memcpy() is pretty. >>> order. They are more general cases than the ntoh* functions that are mo= re traditional >>> since they also work on byte streams that may not be completely aligned= when >>> sitting in memory. Which is what you are asking for. >> >> So, you're saying that I now need to write code like: >> #if LITTLE_ENDIAN /* or how ever this is spelled*/ >> =09var =3D le32enc(foo); >> #else >> =09var =3D be32enc(foo); >> #endif >> >> If I want to read a arch native endian value? No thank you=85 > > I=92m not saying that at all. > >>>> Also, only the enc/dec functions are documented to work on non-aligned >>>> address, so that doesn't help in most cases? >>> >>> They work on all addresses. They are even documented to work on any add= ress: >>> >>> The be16enc(), be16dec(), be32enc(), be32dec(), be64enc(), be64dec(= ), >>> le16enc(), le16dec(), le32enc(), le32dec(), le64enc(), and le64dec(= ) >>> functions encode and decode integers to/from byte strings on any al= ign- >>> ment in big/little endian format. >>> >>> So they are quite useful in general. Peeking under the covers at the im= plementation >>> also shows they will work for any alignment, so I?m having trouble unde= rstanding >>> where this objection is really coming from. >> >> There are places where you write code such as: >> =09int i; >> =09memcpy(&i, inp, sizeof i); >> =09/* use i */ >> >> In order to avoid alignment faults... None of the functions in byteorde= r >> do NO conversion of endian, or you have to know which endian you are but >> that doesn't work on MI code... This is good code. memcpy() is likely to be optimized better than anything in <sys/endian.h>. Even with the optimizations in my previous reply. Compilers do magic things with memcpy(), like turning it into a no-op if the memory is already in a register and the register doesn't need to be moved. Sometimes the register needs to be moved but the move can be, and is, to another register (perhaps in a different register set, like XMM instead of a general register on x86). Doing nothing much for memcpy()s that are just for a alignment is a special case of this. For the above, on all arches, the compiler should load the int-sized memory at address inp into an int-sized register and then rename the register to i (i should never hit memory unless it is changed and the change needs to be stored). The instructions that are used for the load depend on what the compiler knows about the alignment of inp, and the alignment requirements of the arch. On arches without strict alignment, like x86 without CR0_AC, there is nothing better than doing a mislaligned load *(int *)inp the same as you could do by lieing to the compiler. On arches with strict alignment, there is nothing better than loading 1 byte at a time and combining if the alignment of inp is 1 or unknown. This seems to make most of byteorder(9) a mistake. Just about everything can be done better using memcpy() and standard functions in byteorder(3), except the (3) functions only go up to 32 bits and have 1980's spellings [sl] for the integer sizes. All alignment stuff can be done in host order using memcpy(). Most byte swapping stuff can be done using ntohl(). The direction for n/h can be confusing but might need fewer ifdefs than be/le. le32enc done using more-standard APIs: first htole32() in a register. Only in byteorder(9) then store to an aligned uint32_t temporary variable (optimized away) then memcpy() to final place le32dec done using more-standard APIs: first memcpy() to aligned uint32_t temporary variable (optimized away) then load to a register then le32toh on a register. Only in byteorder(9) These take 1 line extra each (for the memcpy()). You already have to do this for h/l conversions. E.g.: htonl to from register to misaligned memory (like le32enc): first htonl() in the register. Standard in byteorder(3) and POSIX. then store to an aligned uint32_t temporary variable (optimized away) then memcpy() to final place -fbuiltin is essential for optimizing memcpy(), so memcpy must be spelled __builtin_memcpy in time-critical code if -fbuiltin might be turned off. -fbuiltin used to be only turned off in the kernel for LINT, but this was broken 13 years ago by using -ffreestanding. 13 years is not long enough for LINT (conf/NOTES) to have caught up with the change. clang breaks this even more. With gcc, you can use -fbuiltin after -ffreestanding to turn builtins back on, but with clang -ffreestanding has apparently has precedence so -fbuiltin never turns builtins back on. Also, gcc's man page actually documents these options, and somewhere in the gcc documentation there is a recommendation to keep -fbuiltin off and only use selected builtins via __builtin_foo. Turning all builtins back on originally only caused minor problems with a few builtins like the one for printf. Bruce From owner-freebsd-arm@FreeBSD.ORG Mon Jul 21 19:55:35 2014 Return-Path: <owner-freebsd-arm@FreeBSD.ORG> Delivered-To: freebsd-arm@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 870E12E3; Mon, 21 Jul 2014 19:55:35 +0000 (UTC) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 184C52D2D; Mon, 21 Jul 2014 19:55:34 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 874907857FB; Tue, 22 Jul 2014 05:55:31 +1000 (EST) Date: Tue, 22 Jul 2014 05:55:30 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> X-X-Sender: bde@besplex.bde.org To: Bruce Evans <brde@optusnet.com.au> Subject: Re: [CFR] mge driver / elf reloc In-Reply-To: <20140722022100.S2586@besplex.bde.org> Message-ID: <20140722053651.V3446@besplex.bde.org> References: <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org> <1405810447.85788.41.camel@revolution.hippie.lan> <20140720220514.GP45513@funkthat.com> <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com> <20140720231056.GQ45513@funkthat.com> <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com> <1405955048.85788.74.camel@revolution.hippie.lan> <20140722022100.S2586@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=dZS5gxne c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=5zrNXb6StqYA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=VRqNOJ3i96_1dxBxEUkA:9 a=CjuIK1q_8ugA:10 Cc: arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>, Ian Lepore <ian@freebsd.org> X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org> List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arm>, <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe> List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/> List-Post: <mailto:freebsd-arm@freebsd.org> List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help> List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>, <mailto:freebsd-arm-request@freebsd.org?subject=subscribe> X-List-Received-Date: Mon, 21 Jul 2014 19:55:35 -0000 On Tue, 22 Jul 2014, Bruce Evans wrote: > ... > This behaviour can probable be exploited in the enc/dec functions. > When there is no endianness conversion, write 32-bit accesses as > p->x where p is a pointer to a __packed __aligned(1) struct containing > x. The compiler will then produce the above mess on ia64 and a single > 32-bit access if possible. No ifdefs required. When there is an > endianess conversion, do it on a register with value p->x. > > Lightly tested implementation: > > % struct _le32 { > % uint32_t _x; > % } __packed; > % % #define le32enc(p, u) (((struct _le32 *)(p))->_x = (u)) > > This is simpler than the inline version, and fixes some namespace errors. Oops, that is not so good. I just rememembed another optimization that at least clang does in at least some cases for the inline version: if the the compiler can see that the pointer is to aligned memory (which it can sometimes do since the function is inline), then the compiler generates 1 32-bit access. The __packed attribute in the above defeats this by forcing to 1-byte alignment. > If you want avoid the mess on ia64, this can be done at compile time if > the alignment is known to be more than 1 then. I think if can often > be known, as for struct ip (don't generate misaligned struct ip, and > if you start with one then copy to an aligned one). The enc/dec > inline functions could handle this by being converted to macros that > take the alignment parameter. If the alignment is not known until > runtime...don't do that. > > To add an alignment parameter to the above, use something like: > > #define _sle32(a) struct _le32 { uint32_t _x; } __packed __aligned(a) > #define le32enc(p, u, a) (((_sle32(a) *)(p))->_x = (u)) > > Macroizing the struct declaration to add the alignment parameter to it > and avoiding backslashes made the code less readable but even shorter. > > This was tested on i386 and ia64 with alignments 1 and 4 and and gave > the expected output in asm. Also not so good. If the pointer was 'a'-byte aligned, then the 'a' parameter recovers by tells the compiler this in a worse way after __packed clobbers the original alignment. It is better for the caller to do any alignment. In other replies, we discussed using memcpy() to force alignment. The above macros could use it to copy to aligned memory and load as if from that memory less hackishly. This would work in the inline functions, but doesn't do a lot that the compiler can't already do. It is still better for callers to do any alignment. They can use memcpy(), or if they know that the pointer is n-byte aligned then they can cast the pointer an __aligned(n) one where the inline fnction can see the cast. The original pointer type might be void * or char * so memcpy() for alignment would have to copy memory, but _aligned(n) would be virtual. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140722041517.M3229>