From owner-freebsd-arch  Thu Oct 31 21:47:15 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 31B5137B401
	for <arch@FreeBSD.ORG>; Thu, 31 Oct 2002 21:47:12 -0800 (PST)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0A0DA43E4A
	for <arch@FreeBSD.ORG>; Thu, 31 Oct 2002 21:47:11 -0800 (PST)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id QAA20904;
	Fri, 1 Nov 2002 16:46:59 +1100
Date: Fri, 1 Nov 2002 16:58:18 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Marcel Moolenaar <marcel@xcllnt.net>
Cc: arch@FreeBSD.ORG
Subject: Re: i386: Bug in prototype for rgs()
In-Reply-To: <20021101015515.GA1707@dhcp01.pn.xcllnt.net>
Message-ID: <20021101155451.G13913-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-arch.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-arch>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-arch>
X-Loop: FreeBSD.ORG

On Thu, 31 Oct 2002, Marcel Moolenaar wrote:

> The prototype for rgs() in sys/i386/include/cpufunc.h claims that
> the result of the function is 32-bits (ie returns an u_int). As

This partly is an (out of date) efficiency hack dating from when gas
generated pessimal code for mov[w] between segment registers and r/m16.
It may still work provided the r/m16 is actually r/m32 (since it just
gives garbage in the upper bits but the upper bits are never really used.
obrien fixed this hack in *.s.

This is partially a non-out-of date efficiency hack for avoiding 16-bit
load/stores of copies of segment registers in 16-bit general registers.
It is more efficient to load/store the corresponding 32-bit registers
unless this causes partial register stalls, and this is correct provided
it doesn't leak kernel bits to userland.

> such, when inlining the function the compiler happy generates
> the following code:
>
>    11ed7:       8c 6d 80                movl   %gs,0xffffff80(%ebp)
>
> or
>
>    12175:       8c ad 14 fd ff ff       movl   %gs,0xfffffd14(%ebp)
>
> where in this case the memory operand is 32-bit. The source location
> that corresponds with this is sys/i386/linux/linux_sysvec.c:331 and
> sys/i386/linux/linux_sysvec.c:451

Seems OK, except we really want a movw.  gcc is smart to convert this
to a direct memory access.  gcc accidentally generates perfect code
although not what the assignment says: it stores to 16 bits of memory
and doesn't change the upper bits, although the assignment together with
the hacks say that it should write garbage to the upper bits.  If gcc
weren't so smart, then it would generate "movw %gs,%ax; movl %eax.mem"
and write the garbage in the upper bits of %eax.

> If you actually look at the frame being created in the debugger, you'll
> see:
>
> Breakpoint 4, linux_sendsig (catcher=0x28091468, sig=11, mask=0xc2827d78,
>     code=30) at ../../../i386/linux/linux_sysvec.c:472
> 472		if (copyout(&frame, fp, sizeof(frame)) != 0) {
> Current language:  auto; currently c
> (kgdb) p /x frame
> $21 = {sf_sig = 0xb, sf_sc = {sc_gs = 0xcdd3002f, sc_fs = 0xf, sc_es = 0x2f,
>     sc_ds = 0x2f, sc_edi = 0x2809aca8, sc_esi = 0xbfbff0e0,
> [snip]
>
> In words: the upper 32-bit of sf_sc.sc_gs are garbage. Different CPU
> implementations behave differently WRT to the upper 16-bits when the
> destination is known to be a 32-bit operand (ie register).

I think the upper bits are never accessed for moves to and from segment
registers (certainly not unless there are bogus operand size prefixes).
Here the upper bits are garbage simply because linux_sendsig() is missing
a bzero() and gcc doesn't write to them.  A bzero() is necessary in case
there are any [other] holes in the struct.

> The point: should we not do (whitespace corrupted diff):
>
> Index: cpufunc.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/i386/include/cpufunc.h,v
> retrieving revision 1.130
> diff -u -r1.130 cpufunc.h
> --- cpufunc.h	22 Sep 2002 04:45:21 -0000	1.130
> +++ cpufunc.h	1 Nov 2002 01:08:45 -0000
> @@ -449,10 +449,10 @@
>  	return (sel);
>  }
>
> -static __inline u_int
> +static __inline u_int16_t
>  rgs(void)
>  {
> -	u_int sel;
> +	u_int16_t sel;
>  	__asm __volatile("movl %%gs,%0" : "=rm" (sel));
   	                     ^ should be 'w' or nothing (I prefer 'w', but
			       obrien used nothing in *.s (due to residual
			       bugs in gas?)
>  	return (sel);
>  }
>
>
> So that the compiler generates:
>
>      5c2:       8c e8                   mov    %gs,%eax
                                             ^       ^^^^ (low qual/wrong)
>      5c4:       0f b7 c0                movzwl %ax,%eax
>      5c7:       89 45 80                mov    %eax,0xffffff80(%ebp)
                                             ^        ^^^^^^^^^^ (low qual)

This is mostly a pessimization.  Cases where a memory operand is wanted
would work better (much the same as now, but for the right reasons) if
they were 16 bits (sc_gs would hen be followed by 16 bits of explicit
padding).

Doing things correctly probably works better for load_gs(), etc.  load_gs()
currently forces an extension of the operand to 32 bits if it is initially
16 bits.  Copies of segment registers in memory are currently always kept
in 32-bit garbage-extended form so that this extension is free (pushing
segment regsisters garbage-extends them in a different way).

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message