Date: Sun, 6 Apr 1997 22:16:26 -0400 (EDT)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To: ponds!zeta.org.au!bde@ucbvax.Berkeley.EDU, ponds!root.com!dg@ucbvax.Berkeley.EDU, ponds!freefall.cdrom.com!freebsd-hackers@ucbvax.Berkeley.EDU, ponds!lakes.water.net!rivers@ucbvax.Berkeley.EDU, ponds!lambert.org!terry@ucbvax.Berkeley.EDU
Subject: Some insight on "dup alloc" problems.....
Message-ID: <199704070216.WAA01714@lakes.water.net>
Ok - Possible insight #1.

The problem I have reproduced on the small machine is not the same as the "dup alloc" problem - it appears to be aha1542 specific. [The "real" problem I'm getting at here appears on both aha1542 and IDE..]

In any event: Possible insight #2.

Recall that I had mentioned that printf()s in certain locations masked the problem. I took that idea and began a binary search in the kernel... [since I had learned how to use ddb.]

What I did was insert a call to a function I could break on, then under ddb break on that function and see if the problem goes away.

I've narrowed the problem down to 8 lines (well, 5 if you don't count comments) in aha1542.c (recall - this is 2.1.7 source); around line 1670 or so, in the function aha_scsi_cmd():

    if (!(flags & SCSI_NOMASK)) {
        if(debug) break_tdr4();
        s = splbio();   /* stop instant timeouts */
        timeout(aha_timeout, (caddr_t)ccb, (xs->timeout * hz) / 1000);
        aha_startmbx(ccb->mbx);
        /*
         * Usually return SUCCESSFULLY QUEUED
         */
        splx(s);
        SC_DEBUG(xs->sc_link, SDEV_DB3, ("sent\n"));
        if(debug) break_tdr3();
        debug = 0;
        return (SUCCESSFULLY_QUEUED);
    }

Notice the calls to "break_tdr4()" and "break_tdr3()" - these are two empty functions that are just place-holders for ddb break points.

Also, the variable "debug" is an auto that is set at the top of the function when the block is the one I'm interested in:

    if(xs->bp && xs->bp->b_pblkno==65632) {
        debug = 1;
    }

So; here's what I've learned.

If I break at the call to break_tdr4() - the problem is masked - it does *not* occur.

If I break at the call to break_tdr3() - the problem is *not* masked - it does occur.

So - it would seem that something happens in the window between those two calls: some interrupt is coming in that causes this timeout not to function properly...

I notice that ahaintr() does an untimeout() on the ccb it got from the mailbox (aha_mbx)... but, that timeout() seems to be properly protected by an splbio()/splx(); so I wouldn't think the untimeout could incorrectly remove it...
Besides - if aha_timeout() had run; there is a nice little "timed out" message that gets printed - which I'm not seeing.. That kinda discounts any timeout issue...

However, aha_startmbx() is simply a #define that notes the command has been started and does an outb() to start the transfer.... so the problem is likely not there either.

Thus - it appears the command is started; then the aha_mbx that's being used for the command must be getting messed up somehow.

That seems like it could happen if aha_get_ccb() didn't protect the ccb free list appropriately; which could happen if the xs->flags aren't set correctly and splbio() isn't done in aha_get_ccb()...

I'm going to investigate this idea unless someone comes up and says "Oh yeah; it must be XXXX"

Ideas???

	- Thanks -
	- Dave Rivers -
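
To make the free-list idea above concrete, here is a rough user-space sketch in C of the shape I have in mind. Everything in it is hypothetical: the struct layout, the SKETCH_NOMASK flag, sketch_splbio()/sketch_splx() and sketch_get_ccb()/sketch_free_ccb() are stand-ins invented for illustration, not the 2.1.7 aha1542.c code. It only shows a free-list pop and push that are done at a raised priority level unless the caller passes a NOMASK-style flag, plus an assert that would trip on a double free.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NOMASK 0x01   /* hypothetical: caller says interrupts are already masked */
#define SKETCH_NCCB   4      /* hypothetical pool size */

struct sketch_ccb {
    struct sketch_ccb *next; /* free-list link */
    int in_use;              /* 1 while handed out to a caller */
};

static struct sketch_ccb sketch_pool[SKETCH_NCCB];
static struct sketch_ccb *sketch_free_list;
static int sketch_ipl;       /* stand-in for the interrupt priority level */

/* Stand-ins for splbio()/splx(): raise the level and return the old one. */
static int sketch_splbio(void) { int old = sketch_ipl; sketch_ipl = 1; return old; }
static void sketch_splx(int old) { sketch_ipl = old; }

/* Put every ccb in the pool on the free list. */
static void sketch_init(void)
{
    int i;

    sketch_free_list = NULL;
    sketch_ipl = 0;
    for (i = 0; i < SKETCH_NCCB; i++) {
        sketch_pool[i].in_use = 0;
        sketch_pool[i].next = sketch_free_list;
        sketch_free_list = &sketch_pool[i];
    }
}

/* Pop one ccb; the list is only touched at raised level unless NOMASK is set. */
static struct sketch_ccb *sketch_get_ccb(int flags)
{
    struct sketch_ccb *ccb;
    int s = 0;

    if (!(flags & SKETCH_NOMASK))
        s = sketch_splbio();
    ccb = sketch_free_list;
    if (ccb != NULL) {
        sketch_free_list = ccb->next;
        ccb->next = NULL;
        ccb->in_use = 1;
    }
    if (!(flags & SKETCH_NOMASK))
        sketch_splx(s);
    return ccb;
}

/* Push a ccb back; the assert catches handing the same ccb back twice. */
static void sketch_free_ccb(struct sketch_ccb *ccb, int flags)
{
    int s = 0;

    if (!(flags & SKETCH_NOMASK))
        s = sketch_splbio();
    assert(ccb->in_use);
    ccb->in_use = 0;
    ccb->next = sketch_free_list;
    sketch_free_list = ccb;
    if (!(flags & SKETCH_NOMASK))
        sketch_splx(s);
}
```

If a caller passes the NOMASK-style flag when interrupts are not really masked, the pop and push above run unprotected - which is the kind of hole I'm suspicious of.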