Date:      Sun, 6 Apr 1997 22:16:26 -0400 (EDT)
From:      Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To:        ponds!zeta.org.au!bde@ucbvax.Berkeley.EDU, ponds!root.com!dg@ucbvax.Berkeley.EDU, ponds!freefall.cdrom.com!freebsd-hackers@ucbvax.Berkeley.EDU, ponds!lakes.water.net!rivers@ucbvax.Berkeley.EDU, ponds!lambert.org!terry@ucbvax.Berkeley.EDU
Subject:   Some insight on "dup alloc" problems.....
Message-ID:  <199704070216.WAA01714@lakes.water.net>

Ok -

 Possible insight #1.  The problem I have reproduced on the small
machine is not the same as the "dup alloc" problem - it appears to
be aha1542 specific.  [The "real" problem I'm getting at here appears
on both aha1542 and IDE..]

 In any event:

   Possible insight #2.

 Recall that I had mentioned that printf()'s in certain locations
masked the problem.   I took that idea and began a binary search
in the kernel... [since I had learned how to use ddb.]   What I 
did was insert a call to a function I could break on, then under
ddb break on that function and see if the problem goes away.

 I've narrowed the problem down to 8 lines (well, 5 if you don't
count comments) in aha1542.c (recall - this is 2.1.7 source); around
line 1670 or so, in the function aha_scsi_cmd():

        if (!(flags & SCSI_NOMASK)) {
if(debug) break_tdr4();
                s = splbio();   /* stop instant timeouts */
                timeout(aha_timeout, (caddr_t)ccb, (xs->timeout * hz) / 1000);
                aha_startmbx(ccb->mbx);
                /*      
                 * Usually return SUCCESSFULLY QUEUED
                 */
                splx(s);
                SC_DEBUG(xs->sc_link, SDEV_DB3, ("sent\n"));
if(debug) break_tdr3();
debug = 0;
                return (SUCCESSFULLY_QUEUED);
        }


 Notice the calls to break_tdr4() and break_tdr3() - these are two
empty functions that serve only as place-holders for ddb breakpoints.
Also, the variable "debug" is an auto that is set at the top of the
function when the block is the one I'm interested in:

    if(xs->bp && xs->bp->b_pblkno==65632) {
       debug = 1;
    }
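
 For anyone who wants to try the same trick outside the kernel, here is
a minimal user-space sketch of the hook-and-trigger idea described above.
It is not the driver code: struct buf_model, struct xfer_model, WATCH_BLOCK,
queue_cmd_model() and the two hit counters are hypothetical names made up
for illustration, and the real hooks are empty (the counters are only
there so a test run can see which hooks fired).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct buf_model  { long b_pblkno; };
struct xfer_model { struct buf_model *bp; };

/* Assumption: counters added so a user-space run can observe the hooks. */
volatile int tdr4_hits = 0;
volatile int tdr3_hits = 0;

/* Place-holder functions whose only job is to be a breakpoint target. */
void break_tdr4(void) { tdr4_hits++; }
void break_tdr3(void) { tdr3_hits++; }

#define WATCH_BLOCK 65632L   /* the physical block number of interest */

/* Models the shape of the instrumented path; returns 1 if the hooks fired. */
int queue_cmd_model(struct xfer_model *xs)
{
    int debug = 0;   /* an auto, set only for the watched block */

    assert(xs != NULL);
    if (xs->bp != NULL && xs->bp->b_pblkno == WATCH_BLOCK)
        debug = 1;

    if (debug)
        break_tdr4();   /* hook before the protected section */

    /* the real driver does splbio(), timeout(), aha_startmbx(), splx() here */

    if (debug)
        break_tdr3();   /* hook after the protected section */

    return debug;
}
```

 Because the hooks fire only for the one block number, a breakpoint on
either of them stops exactly once per interesting request instead of on
every command.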


 So; here's what I've learned.

 If I break at the call to break_tdr4() - the problem is masked - it
does *not* occur.

 If I break at the call to break_tdr3() - the problem is *not* masked -
it does occur.

 So - it would seem that some interrupt is coming in that causes this
timeout not to function properly... 

 I notice that ahaintr() does an untimeout() on the ccb it got from
the mailbox (aha_mbx)... but, that timeout() seems to be properly
protected by an splbio()/splx(); so I wouldn't think the untimeout
could incorrectly remove it...  Besides - if aha_timeout() had run; 
there is a nice little "timed out" message that gets printed - which 
I'm not seeing..  That kinda discounts any timeout issue...

 Also, aha_startmbx() is simply a #define that notes the command has
been started and does an outb() to start the transfer.... so the problem
is likely not there either.

 Thus - it appears the command is started; then the aha_mbx that's being
used for the command must be getting messed up somehow.  That seems like
it could happen if aha_get_ccb() didn't protect the ccb free list
appropriately; which could happen if the xs->flags aren't set correctly
and splbio() isn't done in aha_get_ccb()...  I'm going to investigate
this idea unless someone comes up and says "Oh yeah; it must be XXXX"
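
 To make the suspicion concrete, here is a small deterministic user-space
model of an unprotected free list. It is only a sketch under assumptions:
it is not the aha1542.c code, every name in it (ccb_model, pool_model,
irq_model, get_ccb_model and so on) is hypothetical, the "masked" flag
merely imitates what splbio()/splx() do, and it assumes the interrupt
path can also take an entry from the list.

```c
#include <assert.h>
#include <stddef.h>

struct ccb_model { struct ccb_model *next; int id; };

struct pool_model {
    struct ccb_model *free_head;
    int masked;                 /* 1 while the imitation "splbio" is raised */
    int pending;                /* an interrupt arrived while masked */
    struct ccb_model *irq_got;  /* entry taken by the interrupt-side allocation */
};

/* Link n slots into a free list and reset the model state. */
void pool_init(struct pool_model *p, struct ccb_model *slots, int n)
{
    int i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        slots[i].id = i;
        slots[i].next = (i + 1 < n) ? &slots[i + 1] : NULL;
    }
    p->free_head = &slots[0];
    p->masked = 0;
    p->pending = 0;
    p->irq_got = NULL;
}

/* Interrupt model: takes the head of the list unless interrupts are masked. */
void irq_model(struct pool_model *p)
{
    if (p->masked) {
        p->pending = 1;         /* deferred until the mask is dropped */
        return;
    }
    p->irq_got = p->free_head;
    if (p->irq_got != NULL)
        p->free_head = p->irq_got->next;
}

int spl_raise(struct pool_model *p)
{
    int old = p->masked;
    p->masked = 1;
    return old;
}

void spl_restore(struct pool_model *p, int old)
{
    p->masked = old;
    if (!p->masked && p->pending) {
        p->pending = 0;
        irq_model(p);           /* deliver the deferred interrupt */
    }
}

/* Take an entry; the interrupt is forced to arrive between the two steps. */
struct ccb_model *get_ccb_model(struct pool_model *p, int protect)
{
    int s = 0;
    struct ccb_model *c;

    if (protect)
        s = spl_raise(p);
    c = p->free_head;                /* step 1: read the head */
    irq_model(p);                    /* interrupt lands mid-update */
    if (c != NULL)
        p->free_head = c->next;      /* step 2: unlink, possibly from a stale view */
    if (protect)
        spl_restore(p, s);
    return c;
}
```

 With protect == 0 both the caller and the interrupt walk away holding
the same entry; with protect == 1 the interrupt is held off until the
unlink is finished and gets the next entry instead.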

 Ideas???

	  - Thanks -
	- Dave Rivers -


