Date: Fri, 10 May 2002 20:29:49 -0400 (EDT)
From: John Baldwin <jhb@FreeBSD.org>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: obrien@FreeBSD.ORG, alpha@FreeBSD.ORG, Jeff Roberson <jroberson@chesapeake.net>
Subject: Re: gcc3 & alpha kernels
Message-ID: <XFMail.20020510202949.jhb@FreeBSD.org>
In-Reply-To: <15580.13914.162169.930227@grasshopper.cs.duke.edu>
On 10-May-2002 Andrew Gallatin wrote:
>
> Jeff Roberson writes:
> > On Fri, 10 May 2002, Andrew Gallatin wrote:
> >
> > >
> > > Alan Cox writes:
> > > > >
> > > > > Did Jeff see a lockup at boot? Or was this on a running system?
> > > >
> > > > I believe it was at boot time. I can't swear to that, however.
> > > >
> > >
> > > Thanks.. that's the same as me. It would seem that the new compiler
> > > is botching the atomic inlines then.
> > >
> > > Hmm.. According to the disassembly, it looks like the correct
> > > sequences are there, though..
> > >
> > > Drew
> > >
> >
> > It was at boot time. I believe that this was the first time we ever did
> > negative atomic ints on alpha. This was with the old compiler as well. I
> > haven't looked at the gcc3 output.
> >
> > When I looked at the assembly it was pretty clear that the inline wasn't
> > written to support non sign extended values. If you change the prototype
> > to signed int everything works as expected though.
>
> FWIW, this (atomic) is the problem. I can boot a kernel
> where everything but vm_object.o is built with gcc 3.1 and vm_object.o
> is built by the -stable gcc 2.95 compiler.
>
> I'm not sure where I can go from here. David, is this enough
> information for you to use?
>
> I haven't used this kernel that I just built, as I'm not sure that I
> should trust it :-(

Ok, I've made a mostly clean diff between these two as follows
(I've removed diffs between label symbol names due to different offsets in the
function):
--- one.1 Fri May 10 19:45:36 2002
+++ two.1 Fri May 10 19:45:45 2002
@@ -1,6 +1,6 @@
--------- GCC 3.1--------------------------------------
+--------------- gcc 2.95 ------------------------------------
-0000000000000068 <_vm_object_allocate>:
+00000000000000a0 <_vm_object_allocate>:
: 00 00 bb 27 ldah gp,0(t12)
: 00 00 bd 23 lda gp,0(gp)
: e0 ff de 23 lda sp,-32(sp)
@@ -9,74 +9,83 @@
: 10 00 5e b5 stq s1,16(sp)
: 0a 04 f1 47 mov a1,s1
: 09 04 f2 47 mov a2,s0
- : 30 00 f2 b7 stq zero,48(a2)
- : 30 00 32 20 lda t0,48(a2)
- : 38 00 32 b4 stq t0,56(a2)
- : 10 00 f2 b7 stq zero,16(a2)
- : 10 00 32 20 lda t0,16(a2)
- : 18 00 32 b4 stq t0,24(a2)
- : 5c 00 12 3a stb a0,92(a2)
- : 48 00 29 b6 stq a1,72(s0)
+ : 30 00 e9 b7 stq zero,48(s0)
+ : 01 14 26 41 addq s0,0x30,t0
+ : 38 00 29 b4 stq t0,56(s0)
+ : 10 00 e9 b7 stq zero,16(s0)
+ : 01 14 22 41 addq s0,0x10,t0
+ : 18 00 29 b4 stq t0,24(s0)
+ : 5c 00 09 3a stb a0,92(s0)
+ : 48 00 49 b5 stq s1,72(s0)
This hunk is just using a2 instead of s0 and using lda instead of addq.
: 01 00 3f 20 lda t0,1
- : 50 00 32 b0 stl t0,80(a2)
- : 5e 00 f2 37 stw zero,94(a2)
- : 01 f0 1f 46 and a0,0xff,t0
- : a1 37 20 40 cmpule t0,0x1,t0
- : 06 00 20 e4 beq t0,d8 <_vm_object_allocate+0x70>
- : 10 04 f2 47 mov a2,a0
+ : 50 00 29 b0 stl t0,80(s0)
+ : 5e 00 e9 37 stw zero,94(s0)
+ : b0 37 00 42 cmpule a0,0x1,a0
+ : 07 00 00 e6 beq a0,110 <_vm_object_allocate+0x70>
+ : 10 04 e9 47 mov s0,a0
More a2 instead of s0. 2.95 uses a0 directly, where 3.1 first masks it down
to the low byte in t0. I don't think this is harmful.
: 00 20 3f 22 lda a1,8192
: 00 00 7d a7 ldq t12,0(gp)
: 00 40 5b 6b jsr ra,(t12),104 <_vm_object_allocate+0x64>
: 00 00 ba 27 ldah gp,0(ra)
: 00 00 bd 23 lda gp,0(gp)
+ : 00 00 e0 2f unop
2.95 has an extra nop. Woo.
: a1 77 42 41 cmpule s1,0x13,t0
- : 02 00 5f 41 addl s1,zero,t1
: 13 00 3f 22 lda a1,19
- : d1 04 22 44 cmovne t0,t1,a1
+ : 01 00 20 e4 beq t0,120 <_vm_object_allocate+0x80>
+ : 11 00 5f 41 addl s1,zero,a1
Here 3.1 uses a conditional move to avoid a branch.
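For reference, the two shapes compute the same value. The following is a minimal C sketch, assuming s1 holds an unsigned 64-bit size and that addl yields the low 32 bits sign-extended to 64; the function names are made up for illustration and do not come from the kernel source:

```c
#include <stdint.h>

/* Model of "addl x,zero,dst": low 32 bits of x, sign-extended to 64 bits. */
static int64_t addl_zero(uint64_t x)
{
    return (int64_t)(int32_t)(uint32_t)x;
}

/* gcc 2.95 shape: load the constant, then branch around the overwrite. */
static int64_t clamp_branch(uint64_t s1)
{
    int64_t a1 = 19;            /* lda a1,19 */
    if (s1 <= 19)               /* cmpule s1,0x13,t0 ; beq t0,<skip> */
        a1 = addl_zero(s1);     /* addl s1,zero,a1 */
    return a1;
}

/* gcc 3.1 shape: compute both candidates, then select without a branch. */
static int64_t clamp_cmov(uint64_t s1)
{
    int64_t t0 = (s1 <= 19);    /* cmpule s1,0x13,t0 */
    int64_t t1 = addl_zero(s1); /* addl s1,zero,t1 */
    int64_t a1 = 19;            /* lda a1,19 */
    return t0 ? t1 : a1;        /* cmovne t0,t1,a1 */
}
```

Both functions return s1 when it is at most 19 and return 19 otherwise.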
: 00 00 7d a4 ldq t2,0(gp)
+ : 00 00 e0 2f unop
+ : 1f 04 ff 47 nop
+ : 00 00 e0 2f unop
2.95 pads in some more nops.
: 00 00 23 30 ldwu t0,0(t2)
: 60 00 29 34 stw t0,96(s0)
- : 22 76 20 48 zapnot t0,0x3,t1
: 21 76 20 48 zapnot t0,0x3,t0
- : 01 04 21 42 addq a1,t0,t0
+ : 22 f6 21 48 zapnot t0,0xf,t1
+ : 01 04 31 40 addq t0,a1,t0
This one is perhaps the most questionable but probably ok. Here 3.1 copies t0
to t1 with a 0x3 mask (low two bytes) where 2.95 uses 0xf (low four bytes).
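To make the mask values concrete: zapnot keeps byte i of its source when bit i of the mask is set, so 0x3 keeps the low two bytes and 0xf keeps the low four. Below is a small C model of that, plus a model of the zero-extending 16-bit load (ldwu). The helper names are illustrative only:

```c
#include <stdint.h>

/* Model of Alpha zapnot: keep byte i of x when bit i of mask is set, zero the rest. */
static uint64_t zapnot(uint64_t x, unsigned mask)
{
    uint64_t keep = 0;
    for (int i = 0; i < 8; i++)
        if (mask & (1u << i))
            keep |= (uint64_t)0xff << (8 * i);
    return x & keep;
}

/* Model of ldwu: a 16-bit load, zero-extended to 64 bits. */
static uint64_t ldwu(uint16_t mem)
{
    return (uint64_t)mem;
}
```

Since t0 here comes straight from an ldwu, its upper six bytes are already zero, so under this model the 0x3 and 0xf copies end up with the same value in t1.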
: 01 f0 23 44 and t0,0x1f,t0
: 00 00 83 a8 ldl_l t3,0(t2)
: 24 f6 81 48 zapnot t3,0xf,t3
: a4 05 82 40 cmpeq t3,t1,t3
: 04 00 80 e4 beq t3,168 <_vm_object_allocate+0xc8>
: 04 04 e1 47 mov t0,t3
: 00 00 83 b8 stl_c t3,0(t2)
: 00 00 80 e4 beq t3,164 <_vm_object_allocate+0xc4>
: 00 40 00 60 mb
This is our atomic operation number 1 unchanged.
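For readers less used to the ldl_l/stl_c idiom, the sequence is a 32-bit compare-and-set: store the new value only if memory still holds the expected one, and retry if the store-conditional fails. Here is a rough C11 sketch of the same semantics. The names cmpset_32 and add_and_mask are hypothetical, and this is not the kernel's inline:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Rough C11 equivalent of the ldl_l / cmpeq / stl_c / mb sequence:
 * store newval only if *p still equals cmpval; return nonzero on success.
 */
static int cmpset_32(_Atomic uint32_t *p, uint32_t cmpval, uint32_t newval)
{
    return atomic_compare_exchange_strong(p, &cmpval, newval);
}

/* Retry loop in the shape of the surrounding code: reload and try again on failure. */
static uint32_t add_and_mask(_Atomic uint32_t *p, uint32_t incr, uint32_t mask)
{
    uint32_t old;
    do {
        old = atomic_load(p);
    } while (!cmpset_32(p, old, (old + incr) & mask));
    return old;
}
```

The caller's loop (the beq back to the reload after the zapnot of t3) corresponds to the do/while here. If the comparison can never succeed, that loop never exits.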
- : 21 f6 81 48 zapnot t3,0xf,t0
- : f0 ff 3f e4 beq t0,ec <_vm_object_allocate+0x84>
+ : 01 04 e4 47 mov t3,t0
+ : 21 f6 21 48 zapnot t0,0xf,t0
+ : ef ff 3f e4 beq t0,130 <_vm_object_allocate+0x90>
gcc 3.1 is simply smarter about storing the result of the zapnot directly
into t0 to avoid a mov.
: 88 00 e9 b7 stq zero,136(s0)
: 68 00 e9 b7 stq zero,104(s0)
: 70 00 e9 b7 stq zero,112(s0)
: 00 00 7d a4 ldq t2,0(gp)
+ : 00 00 e0 2f unop
+ : 1f 04 ff 47 nop
+ : 00 00 e0 2f unop
More 2.95 padding.
: 00 00 43 a0 ldl t1,0(t2)
- : 7f ff 22 20 lda t0,-129(t1)
+ : 21 35 50 40 subq t1,0x81,t0
lda preferred to subq for some reason..
: 58 00 29 b0 stl t0,88(s0)
- : 01 00 3f 40 addl t0,zero,t0
+ : 22 f6 41 48 zapnot t1,0xf,t1
+ : 21 f6 21 48 zapnot t0,0xf,t0
This I do not grok. Here 3.1 adds zero to t0 and stores the result in t0.
Since it is an addl, that sign-extends the low 32 bits of t0, whereas the
zapnot in 2.95 clears the upper 32 bits. Note that 3.1 only does this for t0
and not t1, however.
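The difference between the two forms only shows up for values with bit 31 set. On Alpha, addl (like ldl) leaves a sign-extended 32-bit result in the 64-bit register, while zapnot with mask 0xf leaves a zero-extended one, and cmpeq compares all 64 bits. The C model below uses illustrative helper names and makes no claim about what the kernel inline actually receives:

```c
#include <stdint.h>

/* "addl x,zero" (and ldl): low 32 bits, sign-extended into a 64-bit register. */
static uint64_t sext32(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

/* "zapnot x,0xf": low 32 bits kept, upper 32 bits cleared. */
static uint64_t zext32(uint64_t x)
{
    return x & 0xffffffffu;
}

/* cmpeq is a full 64-bit comparison. */
static int cmpeq(uint64_t a, uint64_t b)
{
    return a == b;
}
```

For a small positive value such as 5 the two extensions agree. For the 32-bit pattern 0xffffff7f (that is, -129), sext32 gives 0xffffffffffffff7f and zext32 gives 0x00000000ffffff7f, so a zero-extended ldl_l result would never compare equal to a sign-extended t1. That would be consistent with the remark quoted earlier about negative atomic ints, though this sketch alone does not establish it as the cause.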
: 00 00 83 a8 ldl_l t3,0(t2)
: 24 f6 81 48 zapnot t3,0xf,t3
: a4 05 82 40 cmpeq t3,t1,t3
: 04 00 80 e4 beq t3,1c4 <_vm_object_allocate+0x124>
: 04 04 e1 47 mov t0,t3
: 00 00 83 b8 stl_c t3,0(t2)
: 00 00 80 e4 beq t3,1c0 <_vm_object_allocate+0x120>
: 00 40 00 60 mb
Atomic operation number 2, also unchanged.
- : 21 f6 81 48 zapnot t3,0xf,t0
- : f2 ff 3f e4 beq t0,13c <_vm_object_allocate+0xd4>
+ : 01 04 e4 47 mov t3,t0
+ : 21 f6 21 48 zapnot t0,0xf,t0
+ : f0 ff 3f e4 beq t0,190 <_vm_object_allocate+0xf0>
3.1 again uses zapnot more efficiently to avoid a mov.
: 40 00 29 a0 ldl t0,64(s0)
- : 01 00 21 20 lda t0,1(t0)
+ : 01 34 20 40 addq t0,0x1,t0
lda instead of addq.
: 40 00 29 b0 stl t0,64(s0)
: 00 00 1d a6 ldq a0,0(gp)
: 11 04 ff 47 clr a1
: 00 00 5d a6 ldq a2,0(gp)
: e4 00 7f 22 lda a3,228
: 00 00 7d a7 ldq t12,0(gp)
: 00 40 5b 6b jsr ra,(t12),1f4 <_vm_object_allocate+0x154>
: 00 00 ba 27 ldah gp,0(ra)
: 00 00 bd 23 lda gp,0(gp)
: 00 00 e9 b7 stq zero,0(s0)
@@ -99,3 +108,8 @@
: 10 00 5e a5 ldq s1,16(sp)
: 20 00 de 23 lda sp,32(sp)
: 01 80 fa 6b ret
+ : 00 00 e0 2f unop
+ : 1f 04 ff 47 nop
+ : 00 00 e0 2f unop
+ : 1f 04 ff 47 nop
+ : 00 00 e0 2f unop
2.95 pads out with more nops.
As you can see, I don't see much of anything in this assembly which indicates
the function should execute any differently aside from the two weirdisms
involving t1. Using -fno-strict-aliasing might get rid of those btw, not sure.
--
John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
