Date: Fri, 1 Nov 2002 16:58:18 +1100 (EST) From: Bruce Evans <bde@zeta.org.au> To: Marcel Moolenaar <marcel@xcllnt.net> Cc: arch@FreeBSD.ORG Subject: Re: i386: Bug in prototype for rgs() Message-ID: <20021101155451.G13913-100000@gamplex.bde.org> In-Reply-To: <20021101015515.GA1707@dhcp01.pn.xcllnt.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 31 Oct 2002, Marcel Moolenaar wrote: > The prototype for rgs() in sys/i386/include/cpufunc.h claims that > the result of the function is 32-bits (ie returns an u_int). As This partly is an (out of date) efficiency hack dating from when gas generated pessimal code for mov[w] between segment registers and r/m16. It may still work provided the r/m16 is actually r/m32 (since it just gives garbage in the upper bits but the upper bits are never really used. obrien fixed this hack in *.s. This is partially a non-out-of date efficiency hack for avoiding 16-bit load/stores of copies of segment registers in 16-bit general registers. It is more efficient to load/store the corresponding 32-bit registers unless this causes partial register stalls, and this is correct provided it doesn't leak kernel bits to userland. > such, when inlining the function the compiler happy generates > the following code: > > 11ed7: 8c 6d 80 movl %gs,0xffffff80(%ebp) > > or > > 12175: 8c ad 14 fd ff ff movl %gs,0xfffffd14(%ebp) > > where in this case the memory operand is 32-bit. The source location > that corresponds with this is sys/i386/linux/linux_sysvec.c:331 and > sys/i386/linux/linux_sysvec.c:451 Seems OK, except we really want a movw. gcc is smart to convert this to a direct memory access. gcc accidentally generates perfect code although not what the assignment says: it stores to 16 bits of memory and doesn't change the upper bits, although the assignment together with the hacks say that it should write garbage to the upper bits. If gcc weren't so smart, then it would generate "movw %gs,%ax; movl %eax.mem" and write the garbage in the upper bits of %eax. > If you actually look at the frame being created in the debugger, you'll > see: > > Breakpoint 4, linux_sendsig (catcher=0x28091468, sig=11, mask=0xc2827d78, > code=30) at ../../../i386/linux/linux_sysvec.c:472 > 472 if (copyout(&frame, fp, sizeof(frame)) != 0) { > Current language: auto; currently c > (kgdb) p /x frame > $21 = {sf_sig = 0xb, sf_sc = {sc_gs = 0xcdd3002f, sc_fs = 0xf, sc_es = 0x2f, > sc_ds = 0x2f, sc_edi = 0x2809aca8, sc_esi = 0xbfbff0e0, > [snip] > > In words: the upper 32-bit of sf_sc.sc_gs are garbage. Different CPU > implementations behave differently WRT to the upper 16-bits when the > destination is known to be a 32-bit operand (ie register). I think the upper bits are never accessed for moves to and from segment registers (certainly not unless there are bogus operand size prefixes). Here the upper bits are garbage simply because linux_sendsig() is missing a bzero() and gcc doesn't write to them. A bzero() is necessary in case there are any [other] holes in the struct. > The point: should we not do (whitespace corrupted diff): > > Index: cpufunc.h > =================================================================== > RCS file: /home/ncvs/src/sys/i386/include/cpufunc.h,v > retrieving revision 1.130 > diff -u -r1.130 cpufunc.h > --- cpufunc.h 22 Sep 2002 04:45:21 -0000 1.130 > +++ cpufunc.h 1 Nov 2002 01:08:45 -0000 > @@ -449,10 +449,10 @@ > return (sel); > } > > -static __inline u_int > +static __inline u_int16_t > rgs(void) > { > - u_int sel; > + u_int16_t sel; > __asm __volatile("movl %%gs,%0" : "=rm" (sel)); ^ should be 'w' or nothing (I prefer 'w', but obrien used nothing in *.s (due to residual bugs in gas?) > return (sel); > } > > > So that the compiler generates: > > 5c2: 8c e8 mov %gs,%eax ^ ^^^^ (low qual/wrong) > 5c4: 0f b7 c0 movzwl %ax,%eax > 5c7: 89 45 80 mov %eax,0xffffff80(%ebp) ^ ^^^^^^^^^^ (low qual) This is mostly a pessimization. Cases where a memory operand is wanted would work better (much the same as now, but for the right reasons) if they were 16 bits (sc_gs would hen be followed by 16 bits of explicit padding). Doing things correctly probably works better for load_gs(), etc. load_gs() currently forces an extension of the operand to 32 bits if it is initially 16 bits. Copies of segment registers in memory are currently always kept in 32-bit garbage-extended form so that this extension is free (pushing segment regsisters garbage-extends them in a different way). Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20021101155451.G13913-100000>