From owner-freebsd-hackers Fri Mar 21 09:21:22 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id JAA27404 for hackers-outgoing; Fri, 21 Mar 1997 09:21:22 -0800 (PST)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2]) by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id JAA27395 for ; Fri, 21 Mar 1997 09:21:09 -0800 (PST)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02) id AA15127; Fri, 21 Mar 1997 12:20:03 -0500
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Fri, 21 Mar 1997 12:20 EST
Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.8.3/8.7.3) with ESMTP id MAA16226; Fri, 21 Mar 1997 12:01:42 -0500 (EST)
Received: (from rivers@localhost) by lakes.water.net (8.8.3/8.6.9) id MAA03608; Fri, 21 Mar 1997 12:07:09 -0500 (EST)
Date: Fri, 21 Mar 1997 12:07:09 -0500 (EST)
From: Thomas David Rivers
Message-Id: <199703211707.MAA03608@lakes.water.net>
To: ponds!root.com!dg, ponds!freefall.cdrom.com!freebsd-hackers, ponds!lakes.water.net!rivers
Subject: More ideas on "dup alloc" problem (I'm back.)
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Well - You haven't heard from me regarding this problem in a while; I've been investigating, however.

What I've determined is that the bytes are getting all the way down into the SCSI routines (my test system is SCSI based, but this occurs on IDE as well) - and they are uncorrupted.

In fact, using the nice debugging routines in scsi_base.c (I'm on 2.1.7.1 now) I was able to print the bytes for the block number in question.

During the newfs of /dev/rsd0a, scsi_done is convinced that a buffer full of 0x00 bytes was written at the block associated with inode #7680. [The data associated with the xs buffer when the physical block number is the one associated with inode #7680 contains 0x00.]

However, the subsequent fsck finds the 0xff's previously written to the block...
(again, dutifully reported by the scsi_done() debugging printf's.)

So - what I'm thinking is that scsi_done() was called when, in fact, the operation wasn't complete. That is, an interrupt came through for something else and got handed off to the SCSI interrupt routine instead of whatever it was for....

In fact, the I/O would have to be "done" before it ever got started; or else a spurious SCSI interrupt would come through. That is:

	queue up SCSI I/O
	catch an interrupt - pass it to scsi interrupt handler
	scsi_done removes queued I/O from xs chain, calls biodone(), etc...
	nothing actually gets written.

Another explanation is that at queuing time for the I/O, we should be at splbio() up until the point that the SCSI start is issued...

Now - that kinda is a plausible explanation of what's happening; I haven't been able to prove it yet... but hope to soon.

Also, I'm not sure how this would relate to the IDE situation, or even the MFS situation.

	- Thoughts? -

	- Dave Rivers -
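
P.S. To make the suspected sequence above concrete, here is a small user-space C model of it. This is only a sketch of the hypothesis, not FreeBSD kernel code: the disk array, struct xfer, and every function name below are hypothetical stand-ins invented for illustration, and the "check_started" guard is just one way an interrupt handler might refuse a completion for a transfer that was never issued. It is not a claim about what scsi_done() actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 16
#define NUM_BLOCKS 4

/* Hypothetical stand-in for the disk surface. */
static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];

/* Hypothetical stand-in for one queued transfer (the role an xs entry plays). */
struct xfer {
    int block;
    unsigned char data[BLOCK_SIZE];
    bool queued;   /* still on the pending chain */
    bool started;  /* command actually issued to the device */
    bool done;     /* completion reported upward */
};

/* Fill every block with one byte value, like an earlier write of 0xff. */
void disk_fill(unsigned char byte)
{
    memset(disk, byte, sizeof disk);
}

/* Step 1: queue up the I/O. Nothing has reached the disk yet. */
void xfer_queue(struct xfer *x, int block, unsigned char byte)
{
    assert(block >= 0 && block < NUM_BLOCKS);
    x->block = block;
    memset(x->data, byte, BLOCK_SIZE);
    x->queued = true;
    x->started = false;
    x->done = false;
}

/* Issue the command; only a transfer still on the chain gets written. */
void xfer_start(struct xfer *x)
{
    if (!x->queued)
        return;               /* already dequeued: nothing is written */
    x->started = true;
    memcpy(disk[x->block], x->data, BLOCK_SIZE);
}

/*
 * Interrupt handler. With check_started == false it trusts any interrupt
 * (the suspected bug); with true it ignores one for an unstarted transfer.
 * Returns true if it reported the transfer as complete.
 */
bool xfer_intr(struct xfer *x, bool check_started)
{
    if (!x->queued)
        return false;
    if (check_started && !x->started)
        return false;         /* treat as spurious, leave the I/O queued */
    x->queued = false;        /* remove from the chain ... */
    x->done = true;           /* ... and report completion upward */
    return true;
}

/*
 * Disk holds 0xff; queue a write of 0x00 to block 2; a stray interrupt
 * arrives before the start. Returns the byte found on disk afterwards.
 */
unsigned char run_scenario(bool check_started)
{
    struct xfer x;

    disk_fill(0xff);
    xfer_queue(&x, 2, 0x00);
    xfer_intr(&x, check_started);   /* stray interrupt, before the start */
    xfer_start(&x);
    xfer_intr(&x, check_started);   /* the real completion interrupt */
    return disk[x.block][0];
}
```

In this model run_scenario(false) leaves the old 0xff on the disk even though the transfer was reported done, which matches what fsck sees; run_scenario(true) lets the 0x00 write go through. Holding off interrupts between queueing and start (the splbio() idea) would close the same window from the other side.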