Date: Tue, 29 Jul 2003 16:37:08 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Thomas Moestl <t.moestl@tu-bs.de> Cc: Current <freebsd-current@freebsd.org> Subject: Re: dereferencing type-punned pointer will break strict-aliasing rules Message-ID: <20030729154735.G2187@gamplex.bde.org> In-Reply-To: <20030728015900.GB5628@crow.dom2ip.de> References: <7mwue3v6gf.wl@black.imgsrc.co.jp> <20030728015900.GB5628@crow.dom2ip.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 28 Jul 2003, Thomas Moestl wrote: > On Mon, 2003/07/28 at 09:30:08 +0900, Jun Kuriyama wrote: > > > > Is this caused by -oS option? > > > > ----- in making BOOTMFS in make release > > cc -c -Os -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/dev -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 -mno-align-long-strings -mpreferred-stack-boundary=2 -ffreestanding -Werror /usr/src/sys/geom/geom_dev.c > > /usr/src/sys/geom/geom_dev.c: In function `g_dev_open': > > /usr/src/sys/geom/geom_dev.c:198: warning: dereferencing type-punned pointer will break strict-aliasing rules > > [...] > > Yes, by implying -fstrict-aliasing, so using -fno-strict-aliasing is a > workaround. gcc.info claims that -fstrict-aliasing is implied by -Wall, but this is apparently wrong -- adding -fstrict-aliasing to our standard warning flags gives the same warnings. -Os gives it since -Os is more like -O2 than -O, and gcc.info's claim that -fstrict-aliasing is implied by -O2 is apparently not wrong. Related bug: our warning flags are missing other new gcc warnings. > The problem is caused by the i386 PCPU_GET/PCPU_SET > implementation: > > #define __PCPU_GET(name) ({ \ > __pcpu_type(name) __result; \ > \ > [...] > } else if (sizeof(__result) == 4) { \ > u_int __i; \ > __asm __volatile("movl %%fs:%1,%0" \ > : "=r" (__i) \ > : "m" (*(u_int *)(__pcpu_offset(name)))); \ > __result = *(__pcpu_type(name) *)&__i; \ > [...] > > In this case, the PCPU_GET is used to retrieve curthread, causing > sizeof(__result) to be 4, so the cast at the end of the code snippet > is from a u_int * to struct thread *, and __i is accessed through the > casted pointer, which violates the C99 aliasing rules. > An alternative is to type-pun via a union, which is also a bit ugly, > but explicitly allowed by C99. Patch attached (but only superficially > tested). Uh, this macro is hideous enough without making it larger. I found that using __builtin_memcpy() avoids the warning at little or no cost in object code size (the memcpy gets optimized away in almost the same way as the assignment). Then I tried to improve the macro. I didn't quite succeed -- the smller versions gave slightly larger object code for 5-20 out of 300 object files that depend on pcpu.h. Annotated version: % Index: pcpu.h % =================================================================== % RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v % retrieving revision 1.35 % diff -u -2 -r1.35 pcpu.h % --- pcpu.h 1 Oct 2002 14:01:58 -0000 1.35 % +++ pcpu.h 28 Jul 2003 14:02:30 -0000 % @@ -78,5 +80,5 @@ % * Evaluates to the address of the per-cpu variable name. % */ % -#define __PCPU_PTR(name) ({ \ % +#define __PCPU_PTR(name) __extension__ ({ \ Unrelated change. This prevents warnings when compiled with certain strict warning flags (-pedantic?). % __pcpu_type(name) *__p; \ % \ % @@ -96,21 +98,21 @@ % \ % if (sizeof(__result) == 1) { \ % - u_char __b; \ % - __asm __volatile("movb %%fs:%1,%0" \ % - : "=r" (__b) \ % - : "m" (*(u_char *)(__pcpu_offset(name)))); \ % - __result = *(__pcpu_type(name) *)&__b; \ % + __pcpu_type(name) __res; \ The temporary variable with a basic type is to avoid gcc getting confused doing reloads in dead code. E.g., if __pcpu_type(name) is "struct timeval", then the type can't be loaded into a byte register, and gcc would die trying since it apparently tries to load things in dead code, especially in asms. However, this problem doesn't seem to affect __PCPU_GET(). There is no problem loading the source operand since only its address can really be loaded, and no problem loading the target operand since the asm does it. There might be a problem unloading the target operand, but there apparently isn't one for dead code. This should be tested with -O0. So we don't need the temporary variable hack and can use the natural type for __PCPU_GET(). This goes most of the way towards making the code the same for all type sizes 1/2/4. I still use a temporary variable and (now literally) duplicated code since it somehow gives slightly better object code. % + __asm __volatile("mov%L0 %%fs:%1,%0" \ "%L0" is to avoid spelling out the operand size. This makes this line literally the same for all cases. % + : "=r" (__res) \ The temporary variable is now spelled `__res' for all cases. % + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \ The type is now spelled `__pcpu_type(name)' for all cases. % + __result = *(__pcpu_type(name) *)&__res; \ `__result' and `__res' now have the same type (they could even be the same variable), so this could be a simple assignment now. However, type aliasing as above or using __builtin_memcpy() somehow gives better code. And (IIRC) at least the above gives binary identical code. % } else if (sizeof(__result) == 2) { \ % - u_short __w; \ % - __asm __volatile("movw %%fs:%1,%0" \ % - : "=r" (__w) \ % - : "m" (*(u_short *)(__pcpu_offset(name)))); \ % - __result = *(__pcpu_type(name) *)&__w; \ % + __pcpu_type(name) __res; \ % + __asm __volatile("mov%L0 %%fs:%1,%0" \ % + : "=r" (__res) \ % + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \ % + __result = *(__pcpu_type(name) *)&__res; \ % } else if (sizeof(__result) == 4) { \ % - u_int __i; \ % - __asm __volatile("movl %%fs:%1,%0" \ % - : "=r" (__i) \ % - : "m" (*(u_int *)(__pcpu_offset(name)))); \ % - __result = *(__pcpu_type(name) *)&__i; \ % + __pcpu_type(name) __res; \ % + __asm __volatile("mov%L0 %%fs:%1,%0" \ % + : "=r" (__res) \ % + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \ % + __result = *(__pcpu_type(name) *)&__res; \ Similar changes for sizes 2 and 4. Collapsing the now-identical code for the size 1/2/4 cases somehow gives worse object code. % } else { \ % __result = *__PCPU_PTR(name); \ % @@ -123,29 +125,29 @@ % * Sets the value of the per-cpu variable name to value val. % */ % -#define __PCPU_SET(name, val) ({ \ % +#define __PCPU_SET(name, val) { \ Warnings could be prevented by declaring the expression-statment as an __extension, but since __PCPU_SET() doesn't return anything, using an expression statement for it is just bogus; don't use one. % __pcpu_type(name) __val = (val); \ % \ % if (sizeof(__val) == 1) { \ % u_char __b; \ Basic types are still needed for __PCPU_SET(). % - __b = *(u_char *)&__val; \ % - __asm __volatile("movb %1,%%fs:%0" \ % - : "=m" (*(u_char *)(__pcpu_offset(name))) \ % + __builtin_memcpy(&__b, &__val, sizeof(__val)); \ This avoids the aliasing problem. % + __asm __volatile("mov%L1 %1,%%fs:%0" \ "%L1" is to avoid spelling out the operand size. This makes this line literally the same for all cases. % + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \ A basic type is not needed for this load, so we can use the natural type. This makes this line literally the same for all cases. % : "r" (__b)); \ % } else if (sizeof(__val) == 2) { \ % u_short __w; \ % - __w = *(u_short *)&__val; \ % - __asm __volatile("movw %1,%%fs:%0" \ % - : "=m" (*(u_short *)(__pcpu_offset(name))) \ % + __builtin_memcpy(&__w, &__val, sizeof(__val)); \ % + __asm __volatile("mov%L1 %1,%%fs:%0" \ % + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \ % : "r" (__w)); \ % } else if (sizeof(__val) == 4) { \ % u_int __i; \ % - __i = *(u_int *)&__val; \ % - __asm __volatile("movl %1,%%fs:%0" \ % - : "=m" (*(u_int *)(__pcpu_offset(name))) \ % + __builtin_memcpy(&__i, &__val, sizeof(__val)); \ % + __asm __volatile("mov%L1 %1,%%fs:%0" \ % + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \ % : "r" (__i)); \ Similarly. The code is now only mostly the same for the size 1/2/4 cases. % } else { \ % *__PCPU_PTR(name) = __val; \ __PCPU_SET() is not used very much, so maybe it should be optimized for code simplicity by always going through the pointer. BTW, how does locking for accesses through the pointer work? I think it can only work due to accidental CPU affinity if Giant is not held. % } \ % -}) % +} Part of not using an expression-statement for __PCPU_SET(). % % #define PCPU_GET(member) __PCPU_GET(pc_ ## member) Unannotated version: %%% Index: pcpu.h =================================================================== RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v retrieving revision 1.35 diff -u -2 -r1.35 pcpu.h --- pcpu.h 1 Oct 2002 14:01:58 -0000 1.35 +++ pcpu.h 28 Jul 2003 14:02:30 -0000 @@ -78,5 +80,5 @@ * Evaluates to the address of the per-cpu variable name. */ -#define __PCPU_PTR(name) ({ \ +#define __PCPU_PTR(name) __extension__ ({ \ __pcpu_type(name) *__p; \ \ @@ -96,21 +98,21 @@ \ if (sizeof(__result) == 1) { \ - u_char __b; \ - __asm __volatile("movb %%fs:%1,%0" \ - : "=r" (__b) \ - : "m" (*(u_char *)(__pcpu_offset(name)))); \ - __result = *(__pcpu_type(name) *)&__b; \ + __pcpu_type(name) __res; \ + __asm __volatile("mov%L0 %%fs:%1,%0" \ + : "=r" (__res) \ + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \ + __result = *(__pcpu_type(name) *)&__res; \ } else if (sizeof(__result) == 2) { \ - u_short __w; \ - __asm __volatile("movw %%fs:%1,%0" \ - : "=r" (__w) \ - : "m" (*(u_short *)(__pcpu_offset(name)))); \ - __result = *(__pcpu_type(name) *)&__w; \ + __pcpu_type(name) __res; \ + __asm __volatile("mov%L0 %%fs:%1,%0" \ + : "=r" (__res) \ + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \ + __result = *(__pcpu_type(name) *)&__res; \ } else if (sizeof(__result) == 4) { \ - u_int __i; \ - __asm __volatile("movl %%fs:%1,%0" \ - : "=r" (__i) \ - : "m" (*(u_int *)(__pcpu_offset(name)))); \ - __result = *(__pcpu_type(name) *)&__i; \ + __pcpu_type(name) __res; \ + __asm __volatile("mov%L0 %%fs:%1,%0" \ + : "=r" (__res) \ + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \ + __result = *(__pcpu_type(name) *)&__res; \ } else { \ __result = *__PCPU_PTR(name); \ @@ -123,29 +125,29 @@ * Sets the value of the per-cpu variable name to value val. */ -#define __PCPU_SET(name, val) ({ \ +#define __PCPU_SET(name, val) { \ __pcpu_type(name) __val = (val); \ \ if (sizeof(__val) == 1) { \ u_char __b; \ - __b = *(u_char *)&__val; \ - __asm __volatile("movb %1,%%fs:%0" \ - : "=m" (*(u_char *)(__pcpu_offset(name))) \ + __builtin_memcpy(&__b, &__val, sizeof(__val)); \ + __asm __volatile("mov%L1 %1,%%fs:%0" \ + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \ : "r" (__b)); \ } else if (sizeof(__val) == 2) { \ u_short __w; \ - __w = *(u_short *)&__val; \ - __asm __volatile("movw %1,%%fs:%0" \ - : "=m" (*(u_short *)(__pcpu_offset(name))) \ + __builtin_memcpy(&__w, &__val, sizeof(__val)); \ + __asm __volatile("mov%L1 %1,%%fs:%0" \ + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \ : "r" (__w)); \ } else if (sizeof(__val) == 4) { \ u_int __i; \ - __i = *(u_int *)&__val; \ - __asm __volatile("movl %1,%%fs:%0" \ - : "=m" (*(u_int *)(__pcpu_offset(name))) \ + __builtin_memcpy(&__i, &__val, sizeof(__val)); \ + __asm __volatile("mov%L1 %1,%%fs:%0" \ + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \ : "r" (__i)); \ } else { \ *__PCPU_PTR(name) = __val; \ } \ -}) +} #define PCPU_GET(member) __PCPU_GET(pc_ ## member) %%% Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030729154735.G2187>