Date:      Thu, 16 Mar 2000 20:41:13 -0500 (EST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Warner Losh <imp@village.org>
Cc:        hackers@FreeBSD.org
Subject:   RE: Odd crash
Message-ID:  <200003170141.UAA25452@tisch.mail.mindspring.net>
In-Reply-To: <200003152346.QAA90746@harmony.village.org>

On 15-Mar-00 Warner Losh wrote:
> 
> I just got an odd crash:
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01d16ac
> stack pointer           = 0x10:0xc031e704
> frame pointer           = 0x10:0xc031e70c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = 
> kernel: type 12 trap, code=0
> Stopped at      arpintr+0x9c:   movl    0x8(%ebx),%ecx
> db> trace
> arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
> swi_net_next() at swi_net_next
> db>
> 
> I'm using the realtek driver with a RealTek 8139 built into the SBC
> that I have sitting on my desk.
> 
> rl0: <RealTek 8139 10/100BaseTX> port 0x6000-0x60ff mem 0xf9000000-0xf90000ff irq 11 at device 6.0 on pci0
> rl0: Ethernet address: 00:60:e0:00:7f:c8
> 
> Looking at the disassembled output of ddb, I think that I'm crashing
> at the following place.
>                 if (m->m_len < sizeof(struct arphdr) &&
>                     (m = m_pullup(m, sizeof(struct arphdr)) == NULL)) {
>                       log(LOG_ERR, "arp: runt packet -- m_pullup failed.");
>                       continue;
>               }
>               ar = mtod(m, struct arphdr *);
> 
> ==>           if (ntohs(ar->ar_hrd) != ARPHRD_ETHER
>                   && ntohs(ar->ar_hrd) != ARPHRD_IEEE802) {
>                       log(LOG_ERR,
>                           "arp: unknown hardware address format (%2D)",
>                           (unsigned char *)&ar->ar_hrd, "");
>                       m_freem(m);
>                       continue;
>               }
> 
> since ar is NULL for some reason.  I have no clue at all why this
> would happen.  This means that m->m_data has to be NULL.  But that
> doesn't make sense because of the m_pullup just before this.  If it
> doesn't return NULL, then I thought that m->m_data was guaranteed to
> be valid.
> 
> I think that there might be a bug in the code generation, but I don't
> know for sure.  If we look at the disassembled output:
> 
> arpintr+0x79:   testl   %eax,%eax
> arpintr+0x7b:   setz    %al
> arpintr+0x7e:   movzbl  %al,%ebx
> arpintr+0x81:   testl   %ebx,%ebx
> arpintr+0x83:   jz      arpintr+0x9c

Functionally, apart from clobbering %ebx, these 5 instructions
are equivalent to:

  testl %eax, %eax
  jnz   arpintr+0x9c

> arpintr+0x85:   pushl   $0xc02f5c60
> arpintr+0x8a:   pushl   $0x3
> arpintr+0x8c:   call    log
> arpintr+0x91:   addl    $0x8,%esp
> arpintr+0x94:   jmp     arpintr+0x5
> arpintr+0x99:   leal    0(%esi),%esi

This instruction does nothing, so I assume this isn't
optimized code?

> arpintr+0x9c:   movl    0x8(%ebx),%ecx
> arpintr+0x9f:   movzwl  0(%ecx),%eax
> arpintr+0xa2:   xchgb   %ah,%al
> arpintr+0xa4:   cmpw    $0x1,%ax
> arpintr+0xa8:   jz      arpintr+0xd8
> arpintr+0xaa:   movzwl  0(%ecx),%eax
> arpintr+0xad:   xchgb   %ah,%al
> arpintr+0xaf:   cmpw    $0x6,%ax
> arpintr+0xb3:   jz      arpintr+0xd8
> arpintr+0xb5:   pushl   $0xc02f5c0e
> arpintr+0xba:   pushl   %ecx
> arpintr+0xbb:   pushl   $0xc02f5ca0
> arpintr+0xc0:   pushl   $0x3
> arpintr+0xc2:   call    log
> 
> So we're between the two log calls, which is good.  Notice that we
> effectively zero %ebx at 7e.  We then jump to 9c if it is zero, and
> then dereference %ebx.  Bang, we're dead.  I think that the jz should
> be a jnz, no?

It looks like the compiler is making bad assumptions and/or trashing
%ebx.

 testl %eax,%eax   ; if %eax == 0, ZF = 1, else ZF = 0
 setz %al          ; if ZF, %al = 1, else %al = 0, so
                   ; %al = !%eax
 movzbl %al, %ebx  ; %ebx = zero-extension of %al
                   ; so %ebx == 0 iff %eax != 0

So, %ebx is 0 (zero) if %eax != 0.  If %eax = m, then
%ebx is zero whenever m is non-NULL, and the jump is taken
exactly when %eax != NULL, i.e. m != NULL, so that code
generation is correct with respect to the if() statement,
at least.  However, what follows bothers me:

  lea (%esi),%esi  ; basically does %esi = %esi

This probably is the

  'ar = mtod(m, struct arphdr *);'

In which case, if this is accurate, then %esi = ar,
and it should be:

  movl 0x8(%esi),%ecx  ; note %esi instead of %ebx

Also, if that is the case, then the jz in question
should jump to the lea instruction instead of the
mov instruction it faulted at.  It seems that the
compiler is assuming that %ebx = m, when in fact
%ebx is not m but the boolean result of m == NULL.

I also don't like how it plays around with setz and
%ebx when it doesn't need to.  Also, it seems that
%eax == m, so perhaps if it were:

  movl 0x8(%eax),%ecx

it might work as well.  I'd have to see some of the
instructions beforehand to see what register m is in
to really know for sure, but %ebx is definitely not
valid when it is being looked at in that mov instruction.

> Warner

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.cslab.vt.edu/~jobaldwi/
PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

