Date: Sun, 6 Apr 1997 22:16:26 -0400 (EDT)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To: ponds!zeta.org.au!bde@ucbvax.Berkeley.EDU, ponds!root.com!dg@ucbvax.Berkeley.EDU, ponds!freefall.cdrom.com!freebsd-hackers@ucbvax.Berkeley.EDU, ponds!lakes.water.net!rivers@ucbvax.Berkeley.EDU, ponds!lambert.org!terry@ucbvax.Berkeley.EDU
Subject: Some insight on "dup alloc" problems.....
Message-ID: <199704070216.WAA01714@lakes.water.net>
Ok -
Possible insight #1. The problem I have reproduced on the small
machine is not the same as the "dup alloc" problem - it appears to
be aha1542 specific. [The "real" problem I'm getting at here appears
on both aha1542 and IDE..]
In any event:
Possible insight #2.
Recall that I had mentioned that printf()s in certain locations
masked the problem. I took that idea and began a binary search
in the kernel... [since I had learned how to use ddb.] What I
did was insert a call to a function I could break on, then, under
ddb, break on that function and see whether the problem went away.
I've narrowed the problem down to 8 lines (well, 5 if you don't
count comments) in aha1542.c (recall - this is 2.1.7 source); around
line 1670 or so, in the function aha_scsi_cmd():
    if (!(flags & SCSI_NOMASK)) {
        if(debug) break_tdr4();
        s = splbio();   /* stop instant timeouts */
        timeout(aha_timeout, (caddr_t)ccb, (xs->timeout * hz) / 1000);
        aha_startmbx(ccb->mbx);
        /*
         * Usually return SUCCESSFULLY QUEUED
         */
        splx(s);
        SC_DEBUG(xs->sc_link, SDEV_DB3, ("sent\n"));
        if(debug) break_tdr3();
        debug = 0;
        return (SUCCESSFULLY_QUEUED);
    }
Notice the calls to "break_tdr4()" and "break_tdr3()" - these are two
empty functions that serve only as place-holders for ddb break points.
Also, the variable "debug" is an auto that is set at the top of the
function when the block is the one I'm interested in:
    if(xs->bp && xs->bp->b_pblkno==65632) {
        debug = 1;
    }
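For anyone who wants to try the same trick, here is a rough user-space sketch of the hook pattern described above. It is only an illustration, not the driver code: break_here(), struct buf_sketch, issue_cmd() and the hook_calls counter are names made up for this sketch (in the kernel the hook body would be empty; the counter is there only so the sketch can be checked outside a debugger).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct buf used here. */
struct buf_sketch {
    long b_pblkno;                  /* physical block number */
};

#define WATCHED_BLKNO 65632L        /* the block number quoted in this mail */

/* Demo-only counter; volatile so the compiler keeps the hook call. */
static volatile int hook_calls;

/*
 * Break-point place-holder. In the kernel this would be an empty
 * function whose only job is to give the debugger a symbol to stop on.
 */
void break_here(void)
{
    hook_calls++;
}

/*
 * Hypothetical command-issue path: arm a local flag only for the block
 * of interest, so the hook fires for that one request and no other.
 * Returns 1 if the hook fired, 0 otherwise.
 */
int issue_cmd(const struct buf_sketch *bp)
{
    int debug = 0;

    if (bp != NULL && bp->b_pblkno == WATCHED_BLKNO)
        debug = 1;

    if (debug)
        break_here();               /* set the break point on this symbol */

    /* ... the code under suspicion would run here ... */

    return debug;
}
```

Moving the break_here() call up or down between builds is what gives the binary search: stopping at the hook changes the timing at that one spot and nowhere else.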
So, here's what I've learned.
If I break at the call to break_tdr4() - the problem is masked - it
does *not* occur.
If I break at the call to break_tdr3() - the problem is *not* masked -
it does occur.
So - it would seem that some interrupt is coming in that causes this
timeout not to function properly...
I notice that ahaintr() does an untimeout() on the ccb it got from
the mailbox (aha_mbx)... but that timeout() seems to be properly
protected by an splbio()/splx(), so I wouldn't think the untimeout
could incorrectly remove it... Besides - if aha_timeout() had run,
there is a nice little "timed out" message that would get printed - which
I'm not seeing.. That kinda discounts any timeout issue...
However, aha_startmbx() is simply a #define that notes the command has
been started and does an outb() to start the transfer.... so the problem
is likely not there either.
Thus - it appears the command is started; then the aha_mbx that's being
used for the command must be getting messed up somehow. That seems like
it could happen if aha_get_ccb() didn't protect the ccb free list
appropriately; which could happen if the xs->flags aren't set correctly
and splbio() isn't done in aha_get_ccb()... I'm going to investigate
this idea unless someone comes up and says "Oh yeah; it must be XXXX"
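To make that suspicion concrete, here is a rough user-space sketch of what "protecting the free list" would look like. This is NOT the actual aha_get_ccb() code, which I still have to read: sim_splbio()/sim_splx(), struct ccb_sketch, ccb_get() and ccb_put() are all made-up stand-ins, and the "priority level" is just a simulated integer. Being single-threaded, it can't reproduce the race; it only shows the bracketing that would have to be present on every path that touches the list.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated interrupt priority level; not the real kernel spl machinery. */
static int cur_level;
#define LEVEL_BIO 1

/* Raise the simulated level to "bio" and return the previous level. */
static int sim_splbio(void)
{
    int old = cur_level;

    if (cur_level < LEVEL_BIO)
        cur_level = LEVEL_BIO;
    return old;
}

/* Restore a level previously returned by sim_splbio(). */
static void sim_splx(int old)
{
    cur_level = old;
}

/* Hypothetical control block: just a list link and an in-use flag. */
struct ccb_sketch {
    struct ccb_sketch *next;
    int in_use;
};

static struct ccb_sketch *free_list;

/* Return a ccb to the free list with the level raised throughout. */
void ccb_put(struct ccb_sketch *c)
{
    int s = sim_splbio();

    c->in_use = 0;
    c->next = free_list;
    free_list = c;
    sim_splx(s);
}

/* Take a ccb off the free list, or return NULL if the list is empty. */
struct ccb_sketch *ccb_get(void)
{
    int s = sim_splbio();
    struct ccb_sketch *c = free_list;

    if (c != NULL) {
        assert(!c->in_use);         /* a ccb handed out twice would trip here */
        free_list = c->next;
        c->next = NULL;
        c->in_use = 1;
    }
    sim_splx(s);
    return c;
}
```

If the raise/restore pair were skipped on some path, an interrupt arriving between reading free_list and updating it could hand the same ccb to two commands, which is the kind of corruption I'm guessing at.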
Ideas???
- Thanks -
- Dave Rivers -
