Date: Tue, 22 Jul 2014 08:18:18 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Poul-Henning Kamp <phk@phk.freebsd.dk> Cc: Tim Kientzle <tim@kientzle.com>, Ian Lepore <ian@freebsd.org>, arch@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, Fabien Thomas <fabient@freebsd.org> Subject: Re: [CFR] mge driver / elf reloc Message-ID: <20140722065334.D3681@besplex.bde.org> In-Reply-To: <38240.1405973083@critter.freebsd.dk> References: <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org> <1405810447.85788.41.camel@revolution.hippie.lan> <20140720220514.GP45513@funkthat.com> <F6D53A17-FED0-4F08-BB5B-9F66C5AF5EF6@kientzle.com> <20140720231056.GQ45513@funkthat.com> <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com> <20140721162559.GS45513@funkthat.com> <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com> <20140722041517.M3229@besplex.bde.org> <38240.1405973083@critter.freebsd.dk>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 21 Jul 2014, Poul-Henning Kamp wrote: > In message <20140722041517.M3229@besplex.bde.org>, Bruce Evans writes: > >> This seems to make most of byteorder(9) a mistake. Just about everything >> can be done better using memcpy() and standard functions in byteorder(3), > > Covered by Bruce' "just about" is that code analysers like Coverity > and FlexeLint cannot see through his proposed version of the macros > on little-endian architectures because of the inline assembler used > to access the exchange instructions which the C-compiler do not expose. Not a problem: - byteorder(9) already uses the bswap macros in many cases (including for all conversions to and from big-endian on little-endian hosts) - we removed the pure exchange instructions from the x86 bswap16() macro some time ago - we still use inline asm and bswap instructions in it for the x86 bswap32() and bswap64(), but this is just to avoid pessimizing for gcc or ifdefs to do it only for gcc. clang doesn't need any asms. - maybe code analyzers don't understand the __packed that I used. That would be either a bug in the analysers or just another unportability in __packed. > Analysers have no such problems with the pure-C versions there today. Further checking on pluto shows that gcc pessimizes all cases in le32dec() and le32enc(). It forgets the alignment of variables in the caller when pointers to them are passed as void *, even though the function are inline. On little-endian arches like ia64, these functions do no byte swapping so they are equivalent to memcpy() (after loading or storing the non-memory parameter to memory) Here they are using _builtin_memcpy(), with namespace bugs fixed: static __inline uint32_t le32dec(const void *_p) { uint32_t _t; __builtin_memcpy(&_t, _p, sizeof(_t)); return (_t); } static __inline void le32enc(void *_p, uint32_t _u) { __builtin_memcpy(_p, &_u, sizeof(_u)); } gcc optimizes these as well as possible. It now sees alignment in the caller. The above assumes a little-endian host. I think adding htole32()'s makes it work for all hosts. This involves ifdefs, but the ifdefs are already there to select htole32(). Hopefully this remains optimal when htole32() is nontrivial. It is probably just a missing optimization that old gcc doesn't see the alignment in the caller for non-builtins too, but the above is also simpler. It moves most expressions to the nontrivial htole32(), etc. For other sizes, s/32/64/, etc. No need for shifts and masks. These are now in htole32(), etc., or implicit in memcpy(). For other endiannesses, s/le/be/. No need to reverse the expressions. The reversal is in htobe32(), etc. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140722065334.D3681>