From owner-freebsd-arch@FreeBSD.ORG Mon Jul 21 22:18:30 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DB697BFC; Mon, 21 Jul 2014 22:18:29 +0000 (UTC) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 6C5732AC0; Mon, 21 Jul 2014 22:18:28 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 04CF61A2E38; Tue, 22 Jul 2014 08:18:19 +1000 (EST) Date: Tue, 22 Jul 2014 08:18:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Poul-Henning Kamp Subject: Re: [CFR] mge driver / elf reloc In-Reply-To: <38240.1405973083@critter.freebsd.dk> Message-ID: <20140722065334.D3681@besplex.bde.org> References: <14D22EA6-B73C-47BA-9A86-A957D24F23B8@freebsd.org> <1405810447.85788.41.camel@revolution.hippie.lan> <20140720220514.GP45513@funkthat.com> <20140720231056.GQ45513@funkthat.com> <9464C309-B390-4A27-981A-E854921B1C98@bsdimp.com> <20140721162559.GS45513@funkthat.com> <467619B1-F530-49AF-91BF-14CA3A31908B@bsdimp.com> <20140722041517.M3229@besplex.bde.org> <38240.1405973083@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=eojmkOZX c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=5zrNXb6StqYA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=wuhp96_2kGOEJxqg69kA:9 a=CjuIK1q_8ugA:10 Cc: Tim Kientzle , Ian Lepore , John-Mark Gurney , arch@freebsd.org, freebsd-arm , Fabien Thomas X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Jul 2014 22:18:30 -0000 On Mon, 21 Jul 2014, Poul-Henning Kamp wrote: > In message <20140722041517.M3229@besplex.bde.org>, Bruce Evans writes: > >> This seems to make most of byteorder(9) a mistake. Just about everything >> can be done better using memcpy() and standard functions in byteorder(3), > > Covered by Bruce' "just about" is that code analysers like Coverity > and FlexeLint cannot see through his proposed version of the macros > on little-endian architectures because of the inline assembler used > to access the exchange instructions which the C-compiler do not expose. Not a problem: - byteorder(9) already uses the bswap macros in many cases (including for all conversions to and from big-endian on little-endian hosts) - we removed the pure exchange instructions from the x86 bswap16() macro some time ago - we still use inline asm and bswap instructions in it for the x86 bswap32() and bswap64(), but this is just to avoid pessimizing for gcc or ifdefs to do it only for gcc. clang doesn't need any asms. - maybe code analyzers don't understand the __packed that I used. That would be either a bug in the analysers or just another unportability in __packed. > Analysers have no such problems with the pure-C versions there today. Further checking on pluto shows that gcc pessimizes all cases in le32dec() and le32enc(). It forgets the alignment of variables in the caller when pointers to them are passed as void *, even though the function are inline. On little-endian arches like ia64, these functions do no byte swapping so they are equivalent to memcpy() (after loading or storing the non-memory parameter to memory) Here they are using _builtin_memcpy(), with namespace bugs fixed: static __inline uint32_t le32dec(const void *_p) { uint32_t _t; __builtin_memcpy(&_t, _p, sizeof(_t)); return (_t); } static __inline void le32enc(void *_p, uint32_t _u) { __builtin_memcpy(_p, &_u, sizeof(_u)); } gcc optimizes these as well as possible. It now sees alignment in the caller. The above assumes a little-endian host. I think adding htole32()'s makes it work for all hosts. This involves ifdefs, but the ifdefs are already there to select htole32(). Hopefully this remains optimal when htole32() is nontrivial. It is probably just a missing optimization that old gcc doesn't see the alignment in the caller for non-builtins too, but the above is also simpler. It moves most expressions to the nontrivial htole32(), etc. For other sizes, s/32/64/, etc. No need for shifts and masks. These are now in htole32(), etc., or implicit in memcpy(). For other endiannesses, s/le/be/. No need to reverse the expressions. The reversal is in htobe32(), etc. Bruce