From owner-freebsd-hackers Thu Mar 16 17:41:59 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from tisch.mail.mindspring.net (tisch.mail.mindspring.net [207.69.200.157])
        by hub.freebsd.org (Postfix) with ESMTP id 99CE737BCE7
        for ; Thu, 16 Mar 2000 17:41:55 -0800 (PST)
        (envelope-from jhb@FreeBSD.org)
Received: from john.baldwin.cx (user-2ivetus.dialup.mindspring.com [165.247.119.220])
        by tisch.mail.mindspring.net (8.9.3/8.8.5) with ESMTP id UAA25452;
        Thu, 16 Mar 2000 20:41:48 -0500 (EST)
Message-Id: <200003170141.UAA25452@tisch.mail.mindspring.net>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200003152346.QAA90746@harmony.village.org>
Date: Thu, 16 Mar 2000 20:41:13 -0500 (EST)
From: John Baldwin
To: Warner Losh
Subject: RE: Odd crash
Cc: hackers@FreeBSD.org
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On 15-Mar-00 Warner Losh wrote:
>
> I just got an odd crash:
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01d16ac
> stack pointer           = 0x10:0xc031e704
> frame pointer           = 0x10:0xc031e70c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          =
> kernel: type 12 trap, code=0
> Stopped at      arpintr+0x9c:   movl    0x8(%ebx),%ecx
> db> trace
> arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
> swi_net_next() at swi_net_next
> db>
>
> I'm using the realtek driver with a RealTek 8139 built into the SBC
> that I have sitting on my desk.
>
> rl0: port 0x6000-0x60ff mem 0xf9000000-0xf90000ff irq 11 at device 6.0 on
> pci0
> rl0: Ethernet address: 00:60:e0:00:7f:c8
>
> Looking at the disassembled output of ddb, I think that I'm crashing
> at the following place.
>
>               if (m->m_len < sizeof(struct arphdr) &&
>                   (m = m_pullup(m, sizeof(struct arphdr)) == NULL)) {
>                       log(LOG_ERR, "arp: runt packet -- m_pullup failed.");
>                       continue;
>               }
>               ar = mtod(m, struct arphdr *);
>
> ==>           if (ntohs(ar->ar_hrd) != ARPHRD_ETHER
>                   && ntohs(ar->ar_hrd) != ARPHRD_IEEE802) {
>                       log(LOG_ERR,
>                           "arp: unknown hardware address format (%2D)",
>                           (unsigned char *)&ar->ar_hrd, "");
>                       m_freem(m);
>                       continue;
>               }
>
> since ar is NULL for some reason.  I have no clue at all why this
> would happen.  This means that m->m_data has to be NULL.  But that
> doesn't make sense because of the m_pullup just before this.  If it
> doesn't return NULL, then I thought that m->m_data was guaranteed to
> be valid.
>
> I think that there might be a bug in the code generation, but I don't
> know for sure.  If we look at the disassembled output:
>
> arpintr+0x79:   testl   %eax,%eax
> arpintr+0x7b:   setz    %al
> arpintr+0x7e:   movzbl  %al,%ebx
> arpintr+0x81:   testl   %ebx,%ebx
> arpintr+0x83:   jz      arpintr+0x9c

Functionally, apart from clobbering %ebx, these 5 instructions are
equivalent to:

        testl   %eax,%eax
        jnz     arpintr+0x9c

> arpintr+0x85:   pushl   $0xc02f5c60
> arpintr+0x8a:   pushl   $0x3
> arpintr+0x8c:   call    log
> arpintr+0x91:   addl    $0x8,%esp
> arpintr+0x94:   jmp     arpintr+0x5
> arpintr+0x99:   leal    0(%esi),%esi

This instruction does nothing, so I assume this isn't optimized code?

> arpintr+0x9c:   movl    0x8(%ebx),%ecx
> arpintr+0x9f:   movzwl  0(%ecx),%eax
> arpintr+0xa2:   xchgb   %ah,%al
> arpintr+0xa4:   cmpw    $0x1,%ax
> arpintr+0xa8:   jz      arpintr+0xd8
> arpintr+0xaa:   movzwl  0(%ecx),%eax
> arpintr+0xad:   xchgb   %ah,%al
> arpintr+0xaf:   cmpw    $0x6,%ax
> arpintr+0xb3:   jz      arpintr+0xd8
> arpintr+0xb5:   pushl   $0xc02f5c0e
> arpintr+0xba:   pushl   %ecx
> arpintr+0xbb:   pushl   $0xc02f5ca0
> arpintr+0xc0:   pushl   $0x3
> arpintr+0xc2:   call    log
>
> So we're between the two log calls, which is good.  Notice that we
> effectively zero %ebx at 7e.  We then jump to 9c if it is zero, and
> then dereference %ebx.  Bang, we're dead.
> I think that the jz should
> be a jnz, no?

It looks like the compiler is making bad assumptions and/or trashing %ebx.

        testl   %eax,%eax       ; if %eax == 0, ZF = 1, else ZF = 0
        setz    %al             ; if ZF, %al = 1, else %al = 0, so
                                ; %al = !%eax
        movzbl  %al,%ebx        ; %ebx = zero extension of %al
                                ; so %ebx == 0 iff %eax != 0

So, %ebx is 0 (zero) if %eax != 0.  If %eax = m, then %ebx is zero exactly
when m != NULL, and that is the case in which the jump is taken, so the code
generation is correct wrt the if() statement at least.  However, the stuff
below that bothers me:

        lea     (%esi),%esi     ; basically does %esi = %esi

This probably is the 'ar = mtod(m, struct arphdr *);'  In which case, if
this is accurate, then %esi = ar, and it should be:

        mov     0x8(%esi),%ecx  ; note %esi instead of %ebx

Also, if that is the case, then the jz in question should jump to the lea
instruction instead of the mov instruction it faulted at.  It seems that the
compiler is assuming that %ebx = m, when in fact %ebx != m, but is the
boolean result of m == NULL.  I also don't like how it plays around with
setz and %ebx when it doesn't need to.  Also, it seems that %eax == m, so
perhaps if it were:

        mov     0x8(%eax),%ecx

it might work as well.  I'd have to see some of the instructions beforehand
to see what register m is in to really know for sure, but %ebx is definitely
not valid when it is being looked at in that mov instruction.

> Warner

--

John Baldwin -- http://www.cslab.vt.edu/~jobaldwi/
PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
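
The register walk-through above can be checked with a small C model.  This
is only a sketch under stated assumptions, not a confirmed diagnosis:
ebx_after_setz(), load_address(), m_after_quoted_condition() and
m_after_grouped_assignment() are hypothetical helper names, fake_pullup() is
a stand-in for m_pullup() that just returns its argument, and pointers are
modeled as integers.  The first helper mirrors the testl/setz/movzbl
sequence.  The third evaluates the condition with the parentheses exactly as
they appear in the quoted source, where == binds tighter than =.  If the
quoted C is verbatim, that expression would store the comparison result in m,
which would be consistent with %ebx holding 0 instead of a pointer.

```c
#include <assert.h>   /* for the checks mentioned in the note below */
#include <stdint.h>

/* Hypothetical stand-in for m_pullup(): hands back its argument unchanged. */
static uintptr_t fake_pullup(uintptr_t m)
{
    return m;
}

/* Models: testl %eax,%eax ; setz %al ; movzbl %al,%ebx */
static uint32_t ebx_after_setz(uint32_t eax)
{
    int zf = (eax == 0);        /* testl sets ZF when %eax is zero */
    uint8_t al = (uint8_t)zf;   /* setz copies ZF into %al */
    return (uint32_t)al;        /* movzbl zero-extends %al into %ebx */
}

/* Address read by: movl 0x8(%ebx),%ecx */
static uint32_t load_address(uint32_t ebx)
{
    return ebx + 0x8;
}

/*
 * Value left in m by the condition as parenthesized in the quoted source.
 * "m = f(m) == 0" parses as "m = (f(m) == 0)", so m becomes 0 or 1.
 */
static uintptr_t m_after_quoted_condition(uintptr_t m)
{
    m = fake_pullup(m) == 0;
    return m;
}

/* Value left in m when the assignment is grouped before the comparison. */
static uintptr_t m_after_grouped_assignment(uintptr_t m)
{
    (void)((m = fake_pullup(m)) == 0);
    return m;
}
```

For an arbitrary non-NULL value such as 0xc5d20010,
load_address(ebx_after_setz(0xc5d20010)) is 0x8, the same as the fault
virtual address in the trap output, and m_after_quoted_condition() returns 0
while m_after_grouped_assignment() keeps the pointer.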