From: Terry Lambert
Reply-To: tlambert2@mindspring.com
Date: Thu, 13 Sep 2001 09:44:06 -0700
To: Zhihui Zhang
Cc: freebsd-hackers@freebsd.org
Subject: Re: Kernel module debug help
Message-ID: <3BA0E256.10B8F05B@mindspring.com>

Ah. Interesting bug; perhaps related to a similar experience of my own... so let's stare at it!

Zhihui Zhang wrote:
>
> I am debugging a KLD and I have got the following panic inside an
> interrupt context:
>
> fault virtual address = 0x1080050
> ...
> interrupt mask = bio
> kernel trap: type 12, code = 0
> Stopped at vwakeup+0x14: decl 0x44(%eax)
>
> Where eax is 0x108000c and vwakeup() is called from biodone().
>
> Since this panic occurs in an interrupt environment, I have no idea how to
> trace it. Is there a way to find the bug by tracing or what is the prime
> suspect in this case. Thanks!

The best advice would be to reproduce this failure with the module linked into the kernel statically instead of loaded dynamically. If it won't repeat for you then, the problem has to be in the form of memory allocation you are using as part of the module.

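As a quick sanity check on the numbers in the quoted panic, here is a minimal sketch showing that the faulting address is exactly what "decl 0x44(%eax)" would touch with eax = 0x108000c. The helper names are made up for illustration, and the code assumes 32-bit addresses, as on i386:

```c
#include <assert.h>
#include <stdint.h>

/* Address touched by an operand of the form disp(%reg): register value plus displacement. */
static uint32_t
effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* Nonzero if the reported fault address is explained by base + disp. */
static int
fault_matches(uint32_t fault_va, uint32_t base, uint32_t disp)
{
    return effective_address(base, disp) == fault_va;
}

/* Check the values from the quoted panic: eax = 0x108000c, disp = 0x44. */
static void
check_quoted_panic(void)
{
    assert(fault_matches(0x1080050u, 0x108000cu, 0x44u));
}
```

So the fault is the decrement itself going through the pointer held in eax, not something further down the call chain.
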
If you want to brute-force the issue, find out what is being dereferenced at vwakeup+0x14. It looks to be:

    vp->v_numoutput--;

though mine is at:

    0x40189c9c : decl 0x44(%eax)

which implies you have bad/older/newer vwakeup code. Maybe you are just missing the "if" test that verifies it's a non-NULL vnode pointer being dereferenced? That would match the number of bytes your "decl" instruction is off from mine:

    void
    vwakeup(bp)
        register struct buf *bp;
    {
        register struct vnode *vp;

        bp->b_flags &= ~B_WRITEINPROG;
        if ((vp = bp->b_vp)) {
            vp->v_numoutput--;
            if (vp->v_numoutput < 0)
                panic("vwakeup: neg numoutput");
            if ((vp->v_numoutput == 0) && (vp->v_flag & VBWAIT)) {
                vp->v_flag &= ~VBWAIT;
                wakeup((caddr_t) &vp->v_numoutput);
            }
        }
    }

I'll also note that 0x44 is 68, which implies 17 long words before v_numoutput is declared in struct vnode; this didn't match my quick count.

I rather expect that it's in a swappable memory region that's currently not mapped, or NULL (we see it's not NULL), so this implies that it's an uninitialized vnode from the zone -- a thing you can't initialize at interrupt. This can happen as the result of a kevent() completion being noted (e.g. readable) at interrupt context, since you can get swappable objects. It also looks like you may be on your way out of splbio, which implies networking. My guess is therefore that you are working on network file system code, and have a "shadow" vnode that you are using as a context for the calls; that vnode should have been allocated out of an interrupt zone instead of out of the main memory allocator, which is not interrupt safe for new allocations... 8-)

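To make the "0x44 is 68, which implies 17 long words" arithmetic concrete, here is a rough sketch. The struct below is a HYPOTHETICAL stand-in, not the real struct vnode, and the function names are invented for illustration; it only shows how offsetof() can confirm which field sits at a given displacement, and what a NULL guard like the one in vwakeup() does and does not catch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL layout, not the real struct vnode: 17 32-bit words
 * followed by a counter, so the counter lands at byte offset 0x44.
 */
struct fake_vnode {
    uint32_t words_before[17];
    int32_t  numoutput;
};

/* Number of whole 32-bit words that precede a field at this byte offset. */
static size_t
words_before_offset(size_t byte_offset)
{
    assert(byte_offset % sizeof(uint32_t) == 0);
    return byte_offset / sizeof(uint32_t);
}

/*
 * Same shape as the guard in vwakeup(): decrement only through a
 * non-NULL pointer. Returns 1 if the counter was decremented, 0 if
 * the pointer was NULL.
 */
static int
guarded_decrement(struct fake_vnode *vp)
{
    if (vp == NULL)
        return 0;
    vp->numoutput--;
    return 1;
}
```

Running offsetof() against the real struct vnode from your own kernel sources would tell you whether 0x44 really is v_numoutput in your build. Note that a guard of this kind only protects against NULL; a non-NULL garbage pointer such as 0x108000c passes the test and still faults on the decrement.
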
For example, I use LRP, which drastically increases my connections per second out of the TCP stack and eliminates receiver livelock and a number of other problems for heavily loaded servers, but it means that sockets need to be capable of accept'ing to completion (creating a new socket) at interrupt context. But when this happens, I don't have a proc structure handy to deal with the issue (since I'm at interrupt context). The sneaky way around this is to use the proc from the already existing listening socket, i.e. the one on which the listen was initially posted and for which the accept is now being completed. That gets me the proc struct, which gets me the ucred, so I have the proc pointer and the ucred pointer necessary to run the connection to completion.

I rather expect that if you are depending on the existence of something similar at interrupt context, you will have to either queue it and run to completion at a software interrupt level (e.g. NETISR -- not recommended, even for networking!), or just "lose" the wakeup (which is what the vwakeup code I have does, with its "if" test).

Still, your best bet is to compile the thing in static, repeat the problem, and then look at where things went wrong in the kernel debugger.

-- Terry