From owner-freebsd-hackers  Fri Mar 21 09:21:22 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id JAA27404
          for hackers-outgoing; Fri, 21 Mar 1997 09:21:22 -0800 (PST)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id JAA27395
          for <freebsd-hackers@freefall.FreeBSD.org>; Fri, 21 Mar 1997 09:21:09 -0800 (PST)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02)
	id AA15127; Fri, 21 Mar 1997 12:20:03 -0500
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Fri, 21 Mar 1997 12:20 EST
Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.8.3/8.7.3) with ESMTP id MAA16226; Fri, 21 Mar 1997 12:01:42 -0500 (EST)
Received: (from rivers@localhost) by lakes.water.net (8.8.3/8.6.9) id MAA03608; Fri, 21 Mar 1997 12:07:09 -0500 (EST)
Date: Fri, 21 Mar 1997 12:07:09 -0500 (EST)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
Message-Id: <199703211707.MAA03608@lakes.water.net>
To: ponds!root.com!dg, ponds!freefall.cdrom.com!freebsd-hackers,
        ponds!lakes.water.net!rivers
Subject: More ideas on "dup alloc" problem (I'm back.)
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Well -

 You haven't heard from me regarding this problem in a while;
I've been investigating, however.

 What I've determined is that the bytes are getting all the way
down into the SCSI routines (my test system is SCSI based, but
this occurs on IDE as well) - and they are uncorrupted.

 In fact, using the nice debugging routines in scsi_base.c (I'm
on 2.1.7.1 now) I was able to print the bytes for the block number
in question.

 During the newfs of /dev/rsd0a, scsi_done is convinced that a buffer 
full of 0x00 bytes was written at the block associated with inode #7680.
[The data associated with the xs buffer when the physical block number is
the one associated with inode #7680 contains 0x00.]

 However, the subsequent fsck finds the 0xff's previously written
to the block...  (again, dutifully reported by the scsi_done() debugging
printf's.)

 So - what I'm thinking is that scsi_done() was called when, in fact,
the operation wasn't complete.  That is; an interrupt came through for
something else and got handed off to the scsi interrupt routine instead
of whatever it was for....  In fact, this would have to be "done"
before it ever got started; or else a spurious SCSI interrupt would
come through.

 That is:
	queue up SCSI I/O
	catch an interrupt - pass it to scsi interrupt handler
	scsi_done removes queued I/O from xs chain, calls biodone(), etc...

 nothing actually gets written.
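
 The sequence above can be modelled in user space.  This is only a
sketch of the hypothesis, not the real driver code: fake_xfer,
fake_queue, fake_start, fake_intr and simulate_lost_write are all
made-up names, and the "disk" is a 16-byte array.  It assumes the
interrupt handler completes whatever is queued without checking that
the command was ever issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLKSZ 16

/* Hypothetical stand-in for a transfer request (NOT the real scsi_xfer). */
struct fake_xfer {
    unsigned char data[BLKSZ]; /* bytes the caller wants written */
    bool started;              /* command was actually issued to the device */
    bool done;                 /* completion was reported to the caller */
};

static unsigned char fake_disk[BLKSZ]; /* one block of simulated media */
static struct fake_xfer *pending;      /* one-entry request queue */

/* Queue a request; nothing has been sent to the device yet. */
static void fake_queue(struct fake_xfer *xs)
{
    assert(xs != NULL);
    xs->started = false;
    xs->done = false;
    pending = xs;
}

/* Issue the queued request: this is the only place the media changes. */
static void fake_start(void)
{
    if (pending != NULL) {
        memcpy(fake_disk, pending->data, BLKSZ);
        pending->started = true;
    }
}

/* Assumed flaw: every interrupt is treated as completion of the queued I/O. */
static void fake_intr(void)
{
    if (pending != NULL) {
        pending->done = true; /* roughly where biodone() would be called */
        pending = NULL;       /* request leaves the chain */
    }
}

/* Returns true when the write is reported done but the media is stale. */
bool simulate_lost_write(void)
{
    struct fake_xfer xs;

    memset(fake_disk, 0xff, BLKSZ); /* old contents, as fsck later sees them */
    memset(xs.data, 0x00, BLKSZ);   /* what newfs wants on the block */

    fake_queue(&xs); /* queue up the I/O */
    fake_intr();     /* interrupt for something else arrives first */
    fake_start();    /* queue is already empty, so nothing is issued */

    return xs.done && !xs.started && fake_disk[0] == 0xff;
}
```

 In this model the caller sees a completed write of 0x00 bytes while
the block still holds 0xff, which matches the symptom described.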

Another explanation is that at queuing time for the I/O, we should be
at splbio() up until the point that the SCSI start is issued...
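
 A user-space sketch of that idea, under the same caveats: every name
here (sim_xfer, sim_splbio, sim_splx, sim_raise_intr, sim_write_block)
is hypothetical and only imitates the shape of the kernel's
splbio()/splx() pair.  The assumption is that an interrupt arriving
while the level is raised is latched and delivered only when the level
is restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_BLKSZ 16

/* Hypothetical request record (NOT the real scsi_xfer). */
struct sim_xfer {
    unsigned char data[SIM_BLKSZ];
    bool started; /* command was issued */
    bool done;    /* completion was reported */
};

static unsigned char sim_disk[SIM_BLKSZ]; /* simulated media */
static struct sim_xfer *sim_pending;      /* one-entry queue */
static bool sim_masked;                   /* true while "at splbio" */
static bool sim_intr_latched;             /* interrupt arrived while masked */

/* Handler that completes whatever is queued, started or not. */
static void sim_handler(void)
{
    if (sim_pending != NULL) {
        sim_pending->done = true;
        sim_pending = NULL;
    }
}

/* Deliver an interrupt now, or latch it if the level is raised. */
static void sim_raise_intr(void)
{
    if (sim_masked)
        sim_intr_latched = true;
    else
        sim_handler();
}

/* Raise the level; return the previous state for sim_splx(). */
static bool sim_splbio(void)
{
    bool old = sim_masked;
    sim_masked = true;
    return old;
}

/* Restore the level and deliver any latched interrupt. */
static void sim_splx(bool old)
{
    assert(sim_masked); /* only meaningful after a matching sim_splbio() */
    sim_masked = old;
    if (!sim_masked && sim_intr_latched) {
        sim_intr_latched = false;
        sim_handler();
    }
}

/* Issue the queued request; the only place the media changes. */
static void sim_start(void)
{
    if (sim_pending != NULL) {
        memcpy(sim_disk, sim_pending->data, SIM_BLKSZ);
        sim_pending->started = true;
    }
}

/* Returns true if the 0x00 data reached the media. */
bool sim_write_block(bool protect)
{
    struct sim_xfer xs;
    bool old = false;

    memset(sim_disk, 0xff, SIM_BLKSZ); /* stale contents */
    memset(&xs, 0, sizeof xs);         /* data = 0x00, flags false */
    sim_masked = false;
    sim_intr_latched = false;

    if (protect)
        old = sim_splbio();
    sim_pending = &xs; /* queue the I/O */
    sim_raise_intr();  /* stray interrupt inside the window */
    sim_start();       /* issue the command */
    if (protect)
        sim_splx(old);

    return xs.started && sim_disk[0] == 0x00;
}
```

 With protect false the stray interrupt empties the queue before the
start and the block stays 0xff; with protect true the interrupt is held
until after the start and the data gets written.  This only shows the
queue-to-start window being closed; it says nothing about whether the
real drivers leave that window open.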

 Now - that kinda is a plausible explanation of what's happening; I haven't
been able to prove it yet... but hope to soon.  Also, I'm not sure
how this would relate to the IDE situation, or even the MFS situation.

	- Thoughts? -
	- Dave Rivers -