From owner-freebsd-hackers Sun Apr 6 21:26:11 1997
Date: Sun, 6 Apr 1997 22:16:26 -0400 (EDT)
From: Thomas David Rivers
Message-Id: <199704070216.WAA01714@lakes.water.net>
To: ponds!zeta.org.au!bde@ucbvax.Berkeley.EDU, ponds!root.com!dg@ucbvax.Berkeley.EDU, ponds!freefall.cdrom.com!freebsd-hackers@ucbvax.Berkeley.EDU, ponds!lakes.water.net!rivers@ucbvax.Berkeley.EDU, ponds!lambert.org!terry@ucbvax.Berkeley.EDU
Subject: Some insight on "dup alloc" problems.....
Sender: owner-hackers@FreeBSD.ORG

Ok -

Possible insight #1.

  The problem I have reproduced on the small machine is not the same
  as the "dup alloc" problem - it appears to be aha1542 specific.
  [The "real" problem I'm getting at here appears on both aha1542
  and IDE..]

In any event:

Possible insight #2.

  Recall that I had mentioned that printf()'s in certain locations
  masked the problem. I took that idea and began a binary search in
  the kernel... [since I had learned how to use ddb.]
  What I did was insert a call to a function I could break on, then
  under ddb break on that function and see if the problem goes away.

  I've narrowed the problem down to 8 lines (well, 5 if you don't
  count comments) in aha1542.c (recall - this is 2.1.7 source);
  around line 1670 or so, in the function aha_scsi_cmd():

	if (!(flags & SCSI_NOMASK)) {
		if(debug) break_tdr4();
		s = splbio();	/* stop instant timeouts */
		timeout(aha_timeout, (caddr_t)ccb, (xs->timeout * hz) / 1000);
		aha_startmbx(ccb->mbx);
		/*
		 * Usually return SUCCESSFULLY QUEUED
		 */
		splx(s);
		SC_DEBUG(xs->sc_link, SDEV_DB3, ("sent\n"));
		if(debug) break_tdr3();
		debug = 0;
		return (SUCCESSFULLY_QUEUED);
	}

  Notice the calls to "break_tdr4()" and "break_tdr3()" - these are
  two empty functions that serve only as place-holders for ddb
  break points.

  Also, the variable "debug" is an auto that is set at the top of the
  function when the block is the one I'm interested in:

	if(xs->bp && xs->bp->b_pblkno==65632) {
		debug = 1;
	}

So; here's what I've learned.

  If I break at the call to break_tdr4() - the problem is masked - it
  does *not* occur.

  If I break at the call to break_tdr3() - the problem is *not*
  masked - it does occur.

So - it would seem that some interrupt is coming in that causes this
timeout not to function properly...

I notice that ahaintr() does an untimeout() on the ccb it got from the
mailbox (aha_mbx)... but, that timeout() seems to be properly
protected by an splbio()/splx(); so I wouldn't think the untimeout
could incorrectly remove it...

Besides - if aha_timeout() had run; there is a nice little "timed out"
message that gets printed - which I'm not seeing.. That kinda
discounts any timeout issue...

However, aha_startmbx() is simply a #define that notes the command has
been started and does an outb() to start the transfer.... so the
problem is likely not there either.

Thus - it appears the command is started; then the aha_mbx that's
being used for the command must be getting messed up somehow.
That seems like it could happen if aha_get_ccb() didn't protect the
ccb free list appropriately; which could happen if the xs->flags
aren't set correctly and splbio() isn't done in aha_get_ccb()...

I'm going to investigate this idea unless someone comes up and says
"Oh yeah; it must be XXXX"

Ideas???

	- Thanks -
	- Dave Rivers -
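
For anyone who wants to see the suspected free-list race in isolation, here is a rough user-space sketch. It is NOT the aha1542.c code: sim_splbio(), sim_splx(), sim_get_ccb(), deliver_intr() and run_scenario() are all hypothetical names invented for this illustration, and the "interrupt" is just a function call placed at the one point where a real interrupt would do damage. It only assumes that the allocator pops the head of a singly linked free list and that an interrupt-time path can pop from the same list.

```c
#include <assert.h>
#include <stddef.h>

#define NCCB 4

/* Hypothetical stand-in for a ccb; only the free-list link matters here. */
struct ccb {
    struct ccb *next;
    int id;
};

static struct ccb pool[NCCB];
static struct ccb *free_list;
static int masked;              /* nonzero while the simulated splbio is held */
static int intr_pending;        /* a simulated interrupt is waiting to fire */
static struct ccb *intr_ccb;    /* ccb the simulated interrupt path grabbed */

/* Simulated interrupt: pops a ccb from the same free list, unless masked. */
static void deliver_intr(void)
{
    if (!intr_pending || masked)
        return;
    intr_pending = 0;
    intr_ccb = free_list;
    if (free_list != NULL)
        free_list = free_list->next;
}

/* Simulated splbio(): block the interrupt, return the previous level. */
static int sim_splbio(void)
{
    int old = masked;
    masked = 1;
    return old;
}

/* Simulated splx(): restore the level; a pending interrupt fires now. */
static void sim_splx(int old)
{
    masked = old;
    deliver_intr();
}

/* Pop a ccb; 'protect' models whether the splbio()/splx() pair is taken. */
static struct ccb *sim_get_ccb(int protect)
{
    int s = 0;
    struct ccb *c;

    if (protect)
        s = sim_splbio();
    c = free_list;              /* read the head of the list */
    deliver_intr();             /* an interrupt could arrive right here */
    if (c != NULL)
        free_list = c->next;    /* unlink using the (possibly stale) head */
    if (protect)
        sim_splx(s);
    return c;
}

/* Returns 1 if both paths ended up holding the same ccb, else 0. */
int run_scenario(int protect)
{
    struct ccb *mine;
    int i;

    free_list = NULL;
    for (i = NCCB - 1; i >= 0; i--) {
        pool[i].id = i;
        pool[i].next = free_list;
        free_list = &pool[i];
    }
    masked = 0;
    intr_ccb = NULL;
    intr_pending = 1;

    mine = sim_get_ccb(protect);
    assert(mine != NULL);
    return mine == intr_ccb;
}
```

With the protection taken, run_scenario(1) returns 0: the interrupt is held off until sim_splx() and gets the next ccb on the list. Without it, run_scenario(0) returns 1: both paths walk away with the same ccb, which is the kind of double hand-out that would leave a command's mailbox entry "messed up" after it was started.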