From: Zhihui Zhang
To: Terry Lambert
Cc: freebsd-hackers@FreeBSD.ORG
Date: Fri, 14 Sep 2001 11:46:55 -0400 (EDT)
Subject: Re: Kernel module debug help
In-Reply-To: <3BA0E256.10B8F05B@mindspring.com>


Thanks! It turns out the bug was my own: I use bp = malloc() to allocate
a buffer structure and issue an I/O on it with BUF_STRATEGY(). Then I
free() bp before the I/O has completed, so the completion path is still
working on memory I have already given back. Really stupid. Memory
trespass seems to be the most common cause of the panics I have run
into.

-Zhihui

On Thu, 13 Sep 2001, Terry Lambert wrote:

> Ah. Interesting bug; perhaps related to a similar experience
> of my own... so let's stare at it!
>
>
> Zhihui Zhang wrote:
> >
> > I am debugging a KLD and I have got the following panic inside an
> > interrupt context:
> >
> > fault virtual address = 0x1080050
> > ...
> > interrupt mask = bio
> > kernel trap: type 12, code = 0
> > Stopped at vwakeup+0x14: decl 0x44(%eax)
> >
> > Where eax is 0x108000c and vwakeup() is called from biodone().
> >
> > Since this panic occurs in an interrupt environment, I have no idea how to
> > trace it. Is there a way to find the bug by tracing, or what is the prime
> > suspect in this case? Thanks!
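
As a quick check on the numbers in the panic output quoted above, here is
a small sketch in C. It assumes the usual i386 reading of the operand, in
which "decl 0x44(%eax)" touches the address eax + 0x44. The helper names
and macros are made up for this illustration; the three constants are
copied from the panic output.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the panic output quoted above. */
#define FAULT_ADDR 0x1080050u   /* "fault virtual address" */
#define EAX_VALUE  0x108000cu   /* reported contents of %eax */
#define DISP       0x44u        /* displacement in "decl 0x44(%eax)" */

/* Hypothetical helper: address touched by a disp(%reg) operand. */
static uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* The faulting access should land exactly on the reported address. */
static void check_fault_address(void)
{
    assert(effective_address(EAX_VALUE, DISP) == FAULT_ADDR);
}
```

The check passes, which suggests the bad value is the pointer held in
%eax itself rather than anything about the 0x44 displacement.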
>
> The best advice would be to repeat this failure in the
> context of linking the module in statically instead of
> dynamically.
>
> If it won't repeat for you then, the problem has to be in
> the form of memory allocation you are using as part of the
> module.
>
> If you want to brute-force the issue, find out what is being
> dereferenced at vwakeup+0x14 ...it looks to be:
>
>         vp->v_numoutput--;
>
> though mine is at:
>
>         0x40189c9c : decl 0x44(%eax)
>
> which implies you have bad/older/newer vwakeup code. Maybe
> you are just missing the "if" test that verifies it's a non-NULL
> vnode pointer being dereferenced??? That would match the number
> of bytes your "decl" instruction is off from mine:
>
>         void
>         vwakeup(bp)
>                 register struct buf *bp;
>         {
>                 register struct vnode *vp;
>
>                 bp->b_flags &= ~B_WRITEINPROG;
>                 if ((vp = bp->b_vp)) {
>                         vp->v_numoutput--;
>                         if (vp->v_numoutput < 0)
>                                 panic("vwakeup: neg numoutput");
>                         if ((vp->v_numoutput == 0) && (vp->v_flag & VBWAIT)) {
>                                 vp->v_flag &= ~VBWAIT;
>                                 wakeup((caddr_t) &vp->v_numoutput);
>                         }
>                 }
>         }
>
>
> I'll also note that 0x44 is 68, which implies 17 long words
> before v_numoutput is declared in struct vnode; this didn't
> match my quick count.
>
>
> I rather expect that it's in a swappable memory region that's
> currently not mapped, or NULL (we see it's not NULL), so this
> implies that it's an uninitialized vnode from the zone -- a thing
> you can't initialize at interrupt.
>
> This can happen as the result of a kevent() completion being
> noted (e.g.
> readable) at interrupt context, since you can get
> swappable objects (it also looks like you may be on your way
> out of splbio, which implies networking -- my guess is therefore
> that you are working on network file system code, and have a
> "shadow" vnode that you are using as a context for the calls
> that should have been allocated out of an interrupt zone instead
> of out of the main memory allocator, which is not interrupt safe
> for new allocations... 8-)).
>
> For example, I use LRP, which drastically increases my connections
> per second out of the TCP stack and eliminates receiver livelock
> and a number of other problems for heavily loaded servers, but it
> means that sockets need to be capable of accept'ing to completion
> (creating a new socket) at interrupt context.
>
> But when this happens, I don't have a proc structure handy to
> deal with the issue (since I'm at interrupt context). The
> sneaky way around this is to use the proc from the already
> existing listening socket (the one on which the listen was
> originally posted, and whose accept is now being completed) --
> that gets me the proc struct, which gets me the ucred, so I
> have the proc pointer and the ucred pointer necessary to run
> the connection to completion.
>
> I rather expect that if you are depending on the existence of
> something similar at interrupt context, you will have to
> either queue it and run to completion at a software interrupt
> level (e.g. NETISR -- not recommended, even for networking!),
> or just "lose" the wakeup (which is what the vwakeup code I
> have does, with its "if" test).
>
> Still, your best bet is to compile the thing in static, repeat
> the problem, and then look at where things went wrong in the
> kernel debugger.
>
> -- Terry
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
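
The root cause reported at the top of this message (free() on bp while
the I/O is still outstanding) can be illustrated with a small userspace
sketch. None of this is FreeBSD kernel code: struct fake_buf,
fake_strategy(), fake_biodone(), release_buf() and the one-slot pending
queue are hypothetical stand-ins for struct buf, BUF_STRATEGY() and
biodone(). The sketch only shows the ownership rule, under the
assumption that the completion path may touch the buffer at any time
after submission: the submitter does not free the buffer, the completion
callback does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct buf; not the kernel structure. */
struct fake_buf {
    int   done;                          /* set by the completion path */
    void (*iodone)(struct fake_buf *);   /* completion callback */
};

static struct fake_buf *pending;   /* one-slot "device queue" */
static int freed_count;            /* number of buffers released so far */

/* Stand-in for BUF_STRATEGY(): queue the request and return at once. */
static void fake_strategy(struct fake_buf *bp)
{
    assert(pending == NULL);       /* the toy queue holds one request */
    pending = bp;
}

/* Stand-in for the interrupt-time completion path (biodone()). */
static void fake_biodone(void)
{
    struct fake_buf *bp = pending;

    if (bp == NULL)
        return;
    pending = NULL;
    bp->done = 1;            /* a use-after-free if bp were already freed */
    if (bp->iodone != NULL)
        bp->iodone(bp);      /* bp must not be touched after this call */
}

/* Completion callback: the only place where the buffer is released. */
static void release_buf(struct fake_buf *bp)
{
    free(bp);
    freed_count++;
}

/* Submit one I/O; returns 0 on success, -1 if allocation fails. */
static int submit_io(void)
{
    struct fake_buf *bp = calloc(1, sizeof(*bp));

    if (bp == NULL)
        return -1;
    bp->iodone = release_buf;
    fake_strategy(bp);
    /* No free(bp) here: the request is still in flight. */
    return 0;
}
```

A caller would invoke submit_io() and later let fake_biodone() run; the
buffer stays valid for the whole window in between. Putting free(bp)
right after fake_strategy() in submit_io() would reproduce the reported
mistake, because fake_biodone() would then write to freed memory.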