Date:      Fri, 14 Sep 2001 11:46:55 -0400 (EDT)
From:      Zhihui Zhang <zzhang@cs.binghamton.edu>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Kernel module debug help
Message-ID:  <Pine.SOL.4.21.0109141141220.1077-100000@opal>
In-Reply-To: <3BA0E256.10B8F05B@mindspring.com>


Thanks! It turns out the bug has the following cause:

I use bp=malloc() to allocate a buffer structure and issue an I/O with
BUF_STRATEGY(). Then I call free() on bp before the I/O has completed,
so the completion path ends up working on freed memory. Really stupid.
Memory trespass seems to be the most common source of panics I have met.
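
A minimal userland sketch of the ownership rule this bug violates. It is
a toy model only: struct io_req, req_submit(), req_release() and
fake_interrupt() are hypothetical names made up for illustration, not
FreeBSD kernel interfaces. The submitter hands the request off and never
frees it; only the completion callback does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel buffer; NOT the real struct buf. */
struct io_req {
    bool done;                          /* set once the I/O has completed */
    void (*on_done)(struct io_req *);   /* completion callback */
};

/* One-slot "device queue" that models an I/O still in flight. */
static struct io_req *in_flight;
static int freed_count;

/* Completion callback: the only place the request is released. */
static void req_release(struct io_req *req)
{
    freed_count++;
    free(req);
}

/* Models issuing the I/O: queues the request and returns immediately. */
static struct io_req *req_submit(void)
{
    struct io_req *req = malloc(sizeof(*req));
    if (req == NULL)
        return NULL;
    req->done = false;
    req->on_done = req_release;
    in_flight = req;        /* the "device" now owns the request */
    return req;
}

/* Models the completion interrupt that arrives some time later. */
static void fake_interrupt(void)
{
    struct io_req *req = in_flight;
    if (req == NULL)
        return;             /* nothing pending */
    in_flight = NULL;
    assert(!req->done);     /* a request completes exactly once */
    req->done = true;       /* safe: the submitter has not freed it */
    req->on_done(req);      /* ownership ends here */
}
```

Calling free() on the pointer returned by req_submit() before
fake_interrupt() runs would reproduce the mistake described above in
miniature: the completion path would then read and write freed memory.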

-Zhihui

On Thu, 13 Sep 2001, Terry Lambert wrote:

> Ah.  Interesting bug; perhaps related to a similar experience
> of my own... so let's stare at it!
> 
> 
> Zhihui Zhang wrote:
> > 
> > I am debugging a KLD and I have got the following panic inside an
> > interrupt context:
> > 
> > fault virtual address = 0x1080050
> > ...
> > interrupt mask = bio
> > kernel trap: type 12, code = 0
> > Stopped at vwakeup+0x14: decl 0x44(%eax)
> > 
> > Where eax is 0x108000c and vwakeup() is called from biodone().
> > 
> > Since this panic occurs in an interrupt environment, I have no idea how to
> > trace it. Is there a way to find the bug by tracing, or what is the prime
> > suspect in this case?  Thanks!
> 
> The best advice would be to repeat this failure in the
> context of linking the module in statically instead of
> dynamically.
> 
> If it won't repeat for you then, the problem has to be in
> the form of memory allocation you are using as part of the
> module.
> 
> If you want to brute-force the issue, find out what is being
> dereferenced at vwakeup+0x14 ...it looks to be:
> 
> 	vp->v_numoutput--;
> 
> though mine is at:
> 
> 	0x40189c9c <vwakeup+20>:        decl   0x44(%eax)
> 
> which implies you have bad/older/newer vwakeup code.  Maybe
> you are just missing the "if" test that verifies it's a non-NULL
> vnode pointer being dereferenced???  That would match the number
> of bytes your "decl" instruction is off from mine:
> 
> void
> vwakeup(bp)
>         register struct buf *bp;
> {
>         register struct vnode *vp;
>
>         bp->b_flags &= ~B_WRITEINPROG;
>         if ((vp = bp->b_vp)) {
>                 vp->v_numoutput--;
>                 if (vp->v_numoutput < 0)
>                         panic("vwakeup: neg numoutput");
>                 if ((vp->v_numoutput == 0) && (vp->v_flag & VBWAIT)) {
>                         vp->v_flag &= ~VBWAIT;
>                         wakeup((caddr_t) &vp->v_numoutput);
>                 }
>         }
> }
> 
> 
> I'll also note that 0x44 is 68, which implies 17 long words
> before v_numoutput is declared in struct vnode; this didn't
> match my quick count.
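
The arithmetic in the panic output can be checked with a small sketch.
struct fake_vnode below is a hypothetical layout invented for
illustration (17 32-bit words of padding ahead of the counter); it is
not the real struct vnode. The numbers 0x108000c, 0x44 and 0x1080050
are the ones quoted in the panic message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only; NOT the real struct vnode.
 * Seventeen 32-bit words of padding stand in for whatever fields
 * precede v_numoutput on a 32-bit i386 kernel.
 */
struct fake_vnode {
    uint32_t pad[17];
    int32_t  v_numoutput;
};

/* Address touched by "decl disp(%eax)": base register plus displacement. */
static uint32_t fault_address(uint32_t eax, uint32_t disp)
{
    return eax + disp;
}

/* Checks the numbers quoted in the panic message against each other. */
static void check_panic_numbers(void)
{
    /* 17 words * 4 bytes = 68 = 0x44, the displacement in the decl. */
    assert(offsetof(struct fake_vnode, v_numoutput) == 0x44);
    /* eax + 0x44 is exactly the reported fault virtual address. */
    assert(fault_address(0x108000c, 0x44) == 0x1080050);
}
```

Since eax + 0x44 matches the reported fault address, the faulting access
is consistent with the decl itself, which points at the value in eax
(the vp loaded from bp) being bad.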
> 	
> 
> I rather expect that it's in a swappable memory region that's
> currently not mapped, or NULL (we see it's not NULL), so this
> implies that it's an uninitialized vnode from the zone -- a thing
> you can't initialize at interrupt.
> 
> This can happen as the result of a kevent() completion being
> noted (e.g. readable) at interrupt context, since you can get
> swappable objects (it also looks like you may be on your way
> out of splbio, which implies networking -- my guess is therefore
> that you are working on network file system code, and have a
> "shadow" vnode that you are using as a context for the calls
> that should have been allocated out of an interrupt zone instead
> of out of the main memory allocator, which is not interrupt safe
> for new allocations... 8-)).
> 
> For example, I use LRP, which drastically increases my connections
> per second out of the TCP stack and eliminates receiver livelock
> and a number of other problems for heavily loaded servers, but it
> means that sockets need to be capable of accept'ing to completion
> (creating a new socket) at interrupt context.
> 
> But when this happens, I don't have a proc structure handy to
> deal with the issue (since I'm at interrupt context).  The
> sneaky way around this is to use the proc from the already
> existing socket on which the listen was initially posted (the
> one whose accept is being completed) -- which gets me the proc
> struct, which gets me the ucred, so I have the proc pointer
> and the ucred pointer necessary to run the connection to
> completion.
> 
> I rather expect that if you are depending on the existence of
> something similar at interrupt context, that you will have to
> either queue it and run to completion at a software interrupt
> level (e.g. NETISR -- not recommended, even for networking!),
> or just "lose" the wakeup (which is what the vwakeup code I
> have does, with its "if" test).
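
The "queue it and run to completion later" option can be sketched in
userland C. This is only a rough model: the ring buffer, defer_work(),
run_deferred() and bump() are hypothetical names, and the sketch has no
locking, which real kernel code shared with an interrupt handler would
need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_SLOTS 8   /* arbitrary capacity chosen for the sketch */

/* Hypothetical deferred-work item: a function plus its argument. */
struct deferred {
    void (*fn)(void *);
    void *arg;
};

static struct deferred ring[DEFER_SLOTS];
static size_t head, tail;   /* head: next to run, tail: next free slot */

/* Called from the (simulated) interrupt handler: record, do not run. */
static bool defer_work(void (*fn)(void *), void *arg)
{
    if (tail - head == DEFER_SLOTS)
        return false;               /* queue full; caller must cope */
    ring[tail % DEFER_SLOTS] = (struct deferred){ fn, arg };
    tail++;
    return true;
}

/* Called later, outside interrupt context; returns how many items ran. */
static int run_deferred(void)
{
    int ran = 0;
    while (head != tail) {
        struct deferred d = ring[head % DEFER_SLOTS];
        head++;
        assert(d.fn != NULL);
        d.fn(d.arg);
        ran++;
    }
    return ran;
}

/* Example work item: increments the int it is handed. */
static void bump(void *arg)
{
    (*(int *)arg)++;
}
```

The interrupt side only records what needs doing; anything that needs a
process context or may allocate is left to run_deferred().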
> 
> Still, your best bet is to compile the thing in static, repeat
> the problem, and then look at where things went wrong in the
> kernel debugger.
> 
> -- Terry
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
> 

