From owner-freebsd-alpha  Fri May 10 17:30:45 2002
Message-ID: <XFMail.20020510202949.jhb@FreeBSD.org>
X-Mailer: XFMail 1.5.2 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <15580.13914.162169.930227@grasshopper.cs.duke.edu>
Date: Fri, 10 May 2002 20:29:49 -0400 (EDT)
From: John Baldwin <jhb@FreeBSD.org>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: gcc3 & alpha kernels
Cc: obrien@FreeBSD.ORG, alpha@FreeBSD.ORG,
	Jeff Roberson <jroberson@chesapeake.net>


On 10-May-2002 Andrew Gallatin wrote:
> 
> Jeff Roberson writes:
>  > On Fri, 10 May 2002, Andrew Gallatin wrote:
>  > 
>  > >
>  > > Alan Cox writes:
>  > >  > >
>  > >  > > Did Jeff see a lockup at boot?  Or was this on a running system?
>  > >  >
>  > >  > I believe it was at boot time.  I can't swear to that, however.
>  > >  >
>  > >
>  > > Thanks..  that's the same as me.  It would seem that the new compiler
>  > > is botching the atomic inlines then.
>  > >
>  > > Hmm..  According to the disassembly, it looks like the correct
>  > > sequences are there, though..
>  > >
>  > > Drew
>  > >
>  > 
>  > It was at boot time.  I believe that this was the first time we ever did
>  > negative atomic ints on alpha.  This was with the old compiler as well.  I
>  > haven't looked at the gcc3 output.
>  > 
>  > When I looked at the assembly it was pretty clear that the inline wasn't
>  > written to support non sign extended values.  If you change the prototype
>  > to signed int everything works as expected though.
> 
> FWIW, this (atomic) is the problem.  I can boot a kernel
> where everything but vm_object.o is built with gcc 3.1 and vm_object.o
> is built by the -stable gcc 2.95 compiler.
> 
> I'm not sure where I can go from here.  David, is this enough
> information for you to use?
> 
> I haven't used this kernel that I just built, as I'm not sure that I
> should trust it :-(

Ok, I've made a mostly clean diff between these two as follows:

(I've removed diffs that only change label symbol names due to different
offsets in the function):

--- one.1       Fri May 10 19:45:36 2002
+++ two.1       Fri May 10 19:45:45 2002
@@ -1,6 +1,6 @@
--------- GCC 3.1--------------------------------------
+--------------- gcc 2.95 ------------------------------------
 
-0000000000000068 <_vm_object_allocate>:
+00000000000000a0 <_vm_object_allocate>:
         :      00 00 bb 27     ldah    gp,0(t12)
         :      00 00 bd 23     lda     gp,0(gp)
         :      e0 ff de 23     lda     sp,-32(sp)
@@ -9,74 +9,83 @@
         :      10 00 5e b5     stq     s1,16(sp)
         :      0a 04 f1 47     mov     a1,s1
         :      09 04 f2 47     mov     a2,s0
-        :      30 00 f2 b7     stq     zero,48(a2)
-        :      30 00 32 20     lda     t0,48(a2)
-        :      38 00 32 b4     stq     t0,56(a2)
-        :      10 00 f2 b7     stq     zero,16(a2)
-        :      10 00 32 20     lda     t0,16(a2)
-        :      18 00 32 b4     stq     t0,24(a2)
-        :      5c 00 12 3a     stb     a0,92(a2)
-        :      48 00 29 b6     stq     a1,72(s0)
+        :      30 00 e9 b7     stq     zero,48(s0)
+        :      01 14 26 41     addq    s0,0x30,t0
+        :      38 00 29 b4     stq     t0,56(s0)
+        :      10 00 e9 b7     stq     zero,16(s0)
+        :      01 14 22 41     addq    s0,0x10,t0
+        :      18 00 29 b4     stq     t0,24(s0)
+        :      5c 00 09 3a     stb     a0,92(s0)
+        :      48 00 49 b5     stq     s1,72(s0)

This hunk is just using a2 instead of s0 and using lda instead of addq.

         :      01 00 3f 20     lda     t0,1
-        :      50 00 32 b0     stl     t0,80(a2)
-        :      5e 00 f2 37     stw     zero,94(a2)
-        :      01 f0 1f 46     and     a0,0xff,t0
-        :      a1 37 20 40     cmpule  t0,0x1,t0
-        :      06 00 20 e4     beq     t0,d8 <_vm_object_allocate+0x70>
-        :      10 04 f2 47     mov     a2,a0
+        :      50 00 29 b0     stl     t0,80(s0)
+        :      5e 00 e9 37     stw     zero,94(s0)
+        :      b0 37 00 42     cmpule  a0,0x1,a0
+        :      07 00 00 e6     beq     a0,110 <_vm_object_allocate+0x70>
+        :      10 04 e9 47     mov     s0,a0

More a2 instead of s0.  3.1 also masks a0 down to its low byte into t0
(and a0,0xff,t0) before the compare, where 2.95 compares a0 directly.  I
don't think this is harmful.

         :      00 20 3f 22     lda     a1,8192
         :      00 00 7d a7     ldq     t12,0(gp)
         :      00 40 5b 6b     jsr     ra,(t12),104 <_vm_object_allocate+0x64>
         :      00 00 ba 27     ldah    gp,0(ra)
         :      00 00 bd 23     lda     gp,0(gp)
+        :      00 00 e0 2f     unop    

2.95 has an extra nop.  Woo.

         :      a1 77 42 41     cmpule  s1,0x13,t0
-        :      02 00 5f 41     addl    s1,zero,t1
         :      13 00 3f 22     lda     a1,19
-        :      d1 04 22 44     cmovne  t0,t1,a1
+        :      01 00 20 e4     beq     t0,120 <_vm_object_allocate+0x80>
+        :      11 00 5f 41     addl    s1,zero,a1

Here 3.1 uses a conditional move to avoid a branch.
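
For reference, a rough C rendering of the two shapes, to show they compute the
same clamp.  This is only a sketch: the register roles are read off the listing
above, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 2.95 shape: load the default, then branch around the overwrite. */
static int64_t clamp_branch(uint64_t s1)
{
    int64_t a1 = 19;                          /* lda a1,19 */
    if (s1 <= 0x13)                           /* cmpule s1,0x13,t0 ; beq t0,skip */
        a1 = (int64_t)(int32_t)(uint32_t)s1;  /* addl s1,zero,a1 */
    return a1;
}

/* 3.1 shape: compute both candidates, then select without a branch. */
static int64_t clamp_cmov(uint64_t s1)
{
    uint64_t t0 = (s1 <= 0x13);                       /* cmpule s1,0x13,t0 */
    int64_t t1 = (int64_t)(int32_t)(uint32_t)s1;      /* addl s1,zero,t1 */
    int64_t a1 = 19;                                  /* lda a1,19 */
    a1 = t0 ? t1 : a1;                                /* cmovne t0,t1,a1 */
    return a1;
}

/* Hypothetical self-check: both shapes agree on a few inputs. */
static void clamp_selfcheck(void)
{
    for (uint64_t s = 0; s < 64; s++)
        assert(clamp_branch(s) == clamp_cmov(s));
}
```

Both return the input when it is at most 19 and 19 otherwise, so this hunk
should not change behaviour.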

         :      00 00 7d a4     ldq     t2,0(gp)
+        :      00 00 e0 2f     unop    
+        :      1f 04 ff 47     nop     
+        :      00 00 e0 2f     unop    

2.95 pads in some more nops.

         :      00 00 23 30     ldwu    t0,0(t2)
         :      60 00 29 34     stw     t0,96(s0)
-        :      22 76 20 48     zapnot  t0,0x3,t1
         :      21 76 20 48     zapnot  t0,0x3,t0
-        :      01 04 21 42     addq    a1,t0,t0
+        :      22 f6 21 48     zapnot  t0,0xf,t1
+        :      01 04 31 40     addq    t0,a1,t0

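This one is perhaps the most questionable but probably ok.  Here 3.1 masks t0
down to its low two bytes (0x3) when copying it to t1, where 2.95 uses the
wider 0xf mask on the already-masked t0.  Since t0 comes from a ldwu, the
result in t1 should be the same either way.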
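
To check the masks, here is a small C model of zapnot, under the assumption
that it keeps byte i of the source when bit i of the mask is set and zeroes the
other bytes.  The helper names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of zapnot: keep byte i of v when bit i of mask is set. */
static uint64_t zapnot(uint64_t v, unsigned mask)
{
    uint64_t keep = 0;
    for (int i = 0; i < 8; i++)
        if (mask & (1u << i))
            keep |= (uint64_t)0xff << (8 * i);
    return v & keep;
}

/* Compare the two ways t1 is derived from a ldwu result in this hunk. */
static int t1_values_agree(uint16_t loaded)
{
    uint64_t t0 = loaded;                              /* ldwu zero-extends */
    uint64_t t1_gcc31 = zapnot(t0, 0x3);               /* zapnot t0,0x3,t1 */
    uint64_t t1_gcc295 = zapnot(zapnot(t0, 0x3), 0xf); /* 0x3 on t0, then 0xf */
    assert(t1_gcc31 == zapnot(t0, 0xf));               /* upper bytes already 0 */
    return t1_gcc31 == t1_gcc295;
}
```

In this model the two t1 values agree for every 16-bit input, because ldwu has
already cleared everything above the low two bytes.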

         :      01 f0 23 44     and     t0,0x1f,t0
         :      00 00 83 a8     ldl_l   t3,0(t2)
         :      24 f6 81 48     zapnot  t3,0xf,t3
         :      a4 05 82 40     cmpeq   t3,t1,t3
         :      04 00 80 e4     beq     t3,168 <_vm_object_allocate+0xc8>
         :      04 04 e1 47     mov     t0,t3
         :      00 00 83 b8     stl_c   t3,0(t2)
         :      00 00 80 e4     beq     t3,164 <_vm_object_allocate+0xc4>
         :      00 40 00 60     mb

This is our atomic operation number 1 unchanged.
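
For anyone not used to reading ldl_l/stl_c, the sequence is a 32-bit
compare-and-set followed by a barrier, with the caller retrying on failure.  A
minimal sketch in portable C11, assuming <stdatomic.h> as a stand-in for the
kernel inline (cmpset_32 and advance_index are hypothetical names, and the
index is widened to 32 bits for simplicity):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the inline: store newval only if *p still equals expect. */
static int cmpset_32(_Atomic uint32_t *p, uint32_t expect, uint32_t newval)
{
    /* ldl_l / cmpeq / stl_c / mb collapse into one strong CAS here. */
    return atomic_compare_exchange_strong(p, &expect, newval);
}

/* Caller-side retry loop, mirroring the "beq t0" branch back to the reload. */
static uint32_t advance_index(_Atomic uint32_t *index, uint32_t incr)
{
    uint32_t old, next;
    do {
        old = atomic_load(index);       /* reload the current value */
        next = (old + incr) & 0x1f;     /* addq ; and t0,0x1f,t0 */
    } while (!cmpset_32(index, old, next));
    return old;
}
```

The point is just that the loop can only exit once the compare inside the
cmpset succeeds, so anything that makes the compared values differ
permanently would spin forever.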

-        :      21 f6 81 48     zapnot  t3,0xf,t0
-        :      f0 ff 3f e4     beq     t0,ec <_vm_object_allocate+0x84>
+        :      01 04 e4 47     mov     t3,t0
+        :      21 f6 21 48     zapnot  t0,0xf,t0
+        :      ef ff 3f e4     beq     t0,130 <_vm_object_allocate+0x90>

gcc 3.1 is simply smarter about storing the result of the zapnot directly
into t0 to avoid a mov.

         :      88 00 e9 b7     stq     zero,136(s0)
         :      68 00 e9 b7     stq     zero,104(s0)
         :      70 00 e9 b7     stq     zero,112(s0)
         :      00 00 7d a4     ldq     t2,0(gp)
+        :      00 00 e0 2f     unop    
+        :      1f 04 ff 47     nop     
+        :      00 00 e0 2f     unop    

More 2.95 padding.

         :      00 00 43 a0     ldl     t1,0(t2)
-        :      7f ff 22 20     lda     t0,-129(t1)
+        :      21 35 50 40     subq    t1,0x81,t0

lda preferred to subq for some reason..

         :      58 00 29 b0     stl     t0,88(s0)
-        :      01 00 3f 40     addl    t0,zero,t0
+        :      22 f6 41 48     zapnot  t1,0xf,t1
+        :      21 f6 21 48     zapnot  t0,0xf,t0

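This I do not grok.  Here 3.1 adds zero to t0 and stores the result in t0.
Since it is an addl, the 32-bit result is sign-extended to 64 bits, which is
not quite the equivalent of the zapnot: zapnot with 0xf clears the upper four
bytes, so the two only agree for non-negative values.  Note that 3.1 only does
this for t0 and not t1 however, so t1 keeps whatever the ldl left in it.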
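
To make the addl versus zapnot question concrete, here is a small C model.  It
assumes addl and ldl sign-extend their 32-bit result into the 64-bit register
and that zapnot with mask 0xf keeps only the low four bytes; the function names
are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* addl rX,zero,rY: 32-bit add, result sign-extended to 64 bits. */
static uint64_t addl_zero(uint64_t v)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)v;
}

/* zapnot rX,0xf,rY: keep the low four bytes, clear the upper four. */
static uint64_t zapnot_0xf(uint64_t v)
{
    return v & 0xffffffffULL;
}

/* ldl / ldl_l: load 32 bits, sign-extended into the register. */
static uint64_t ldl(int32_t mem)
{
    return (uint64_t)(int64_t)mem;
}

/*
 * Would the cmpeq in the ldl_l loop match if memory is unchanged?
 * t3 is always zapnot'ed in the loop; t1 is zapnot'ed only in the 2.95 code.
 */
static int cmpeq_matches(int32_t mem, int t1_is_zapped)
{
    uint64_t t1 = ldl(mem);                 /* ldl t1,0(t2) */
    if (t1_is_zapped)
        t1 = zapnot_0xf(t1);                /* zapnot t1,0xf,t1 (2.95 only) */
    uint64_t t3 = zapnot_0xf(ldl(mem));     /* ldl_l ; zapnot t3,0xf,t3 */
    assert(addl_zero(t1) == ldl(mem));      /* addl always re-sign-extends */
    return t3 == t1;
}
```

In this model the compare matches for non-negative values in both shapes, but
for a negative value it matches only when t1 has been through the zapnot.
Whether that is what actually bites at boot is not something this sketch can
establish.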

         :      00 00 83 a8     ldl_l   t3,0(t2)
         :      24 f6 81 48     zapnot  t3,0xf,t3
         :      a4 05 82 40     cmpeq   t3,t1,t3
         :      04 00 80 e4     beq     t3,1c4 <_vm_object_allocate+0x124>
         :      04 04 e1 47     mov     t0,t3
         :      00 00 83 b8     stl_c   t3,0(t2)
         :      00 00 80 e4     beq     t3,1c0 <_vm_object_allocate+0x120>
         :      00 40 00 60     mb

Atomic operation number 2, also unchanged.

-        :      21 f6 81 48     zapnot  t3,0xf,t0
-        :      f2 ff 3f e4     beq     t0,13c <_vm_object_allocate+0xd4>
+        :      01 04 e4 47     mov     t3,t0
+        :      21 f6 21 48     zapnot  t0,0xf,t0
+        :      f0 ff 3f e4     beq     t0,190 <_vm_object_allocate+0xf0>

3.1 again uses zapnot more efficiently to avoid a mov.

         :      40 00 29 a0     ldl     t0,64(s0)
-        :      01 00 21 20     lda     t0,1(t0)
+        :      01 34 20 40     addq    t0,0x1,t0

lda instead of addq.

         :      40 00 29 b0     stl     t0,64(s0)
         :      00 00 1d a6     ldq     a0,0(gp)
         :      11 04 ff 47     clr     a1
         :      00 00 5d a6     ldq     a2,0(gp)
         :      e4 00 7f 22     lda     a3,228
         :      00 00 7d a7     ldq     t12,0(gp)
         :      00 40 5b 6b     jsr     ra,(t12),1f4 <_vm_object_allocate+0x154>
         :      00 00 ba 27     ldah    gp,0(ra)
         :      00 00 bd 23     lda     gp,0(gp)
         :      00 00 e9 b7     stq     zero,0(s0)
@@ -99,3 +108,8 @@
         :      10 00 5e a5     ldq     s1,16(sp)
         :      20 00 de 23     lda     sp,32(sp)
         :      01 80 fa 6b     ret
+        :      00 00 e0 2f     unop    
+        :      1f 04 ff 47     nop     
+        :      00 00 e0 2f     unop    
+        :      1f 04 ff 47     nop     
+        :      00 00 e0 2f     unop    

2.95 pads out with more nops.

As you can see, I don't see much of anything in this assembly which indicates
the function should execute any differently aside from the two weirdisms
involving t1.  Using -fno-strict-aliasing might get rid of those btw, not sure.

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
