From owner-freebsd-fs  Tue Oct 13 21:40:18 1998
From: Don Lewis <Don.Lewis@tsc.tdk.com>
Message-Id: <199810140439.VAA17463@salsa.gv.tsc.tdk.com>
Date: Tue, 13 Oct 1998 21:39:50 -0700
In-Reply-To: "Justin T. Gibbs" <gibbs@plutotech.com>
       "Re: filesystem safety and SCSI disk write caching" (Oct 13,  6:00pm)
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: "Justin T. Gibbs" <gibbs@plutotech.com>,
        Terry Lambert <tlambert@primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
Cc: Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Oct 13,  6:00pm, "Justin T. Gibbs" wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} >> I'd be more than happy to reproduce your failure scenario
} >> while recording a SCSI bus trace so that the fault is easy to interpret.
} >> Just send me any *modern* drive that you think fails.
} >
} >Sure; just define "modern" for me, since my personal definition is
} >"not IDE".
} 
} A drive manufactured within the last 3 years.

The drive in my experiment is:

	da0 at ahc0 bus 0 target 0 lun 0
	da0: <SEAGATE ST32151N 0590> Fixed Direct Access SCSI2 device 
	da0: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
	da0: 2049MB (4197405 512 byte sectors: 255H 63S/T 261C)

I don't know when it was manufactured, but I've had access to it for
less than a year.

} >> You should also ensure that your reset button does not cause any power
} >> spikes on the drive power lines.  That would be cheating.
} >
} >It doesn't, since "# of anomalies == 0" with write caching disabled.
} 
} This doesn't follow.  If the cache is disabled, it doesn't matter if
} the drive loses power due to hitting the reset button.  We already 
} know that losing power on a drive that cached data will not work.

I didn't hear the sound of any mechanical things spinning down.  The
machine just started going through its boot sequence.

} >> I'm still unclear as to whether Don was turning off power or hitting what I
} >> consider the reset button.  His comment about UPS use makes me think he
} >> was testing power outage scenarios.
} >
} >Well, I know that this might sound insane, but we could ask Don, and
} >I could get out of the middle of this whole thing... ;-).
} 
} Well, if you're offering, I'd be more than happy to take you up on your
} offer.

I was only playing with the reset button.  Justin was the first to mention
a UPS as the solution.

} >> Since you were able to test 4 drives so quickly, I'd love to see well
} >> documented information on exactly how the file system was inconsistent
} >> in the failure cases.
} >
} >There were directory dependencies which were committed out of order
} >(the modified fsck reports these as soft dependency errors...).
} 
} Can you be more specific?  Are you positive that the transactions
} were committed out of order or could it be that some transactions
} were never committed at all?  What was the size of the directory?
} Was the failure in directory creation or destruction?  Which portion
} of the dependency graph was violated?

The symptom was a directory entry that referenced an unallocated inode.

When creating a new file (or adding a link), softupdates writes the new
inode to disk (and waits for the driver to tell it the write is complete)
before writing the block containing the new directory entry to disk.
When doing an unlink, softupdates clears the directory entry, writes
the directory block to disk, and waits for the driver to tell it the
directory block has been written before writing the inode to disk (either
cleared or just with a decreased reference count, as appropriate).
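
As a rough illustration of the two orderings just described, here is a
toy model in Python.  It is only a sketch: the Disk class, the function
names, and the tuple layout are all made up for this example and have
nothing to do with the actual softupdates code.  In particular, it
treats every write as synchronous, which is a simplification.

```python
# Toy model of the write ordering described above.  All names are
# hypothetical; this is not the softupdates implementation.

class Disk:
    def __init__(self):
        # Writes the driver has reported complete, in completion order.
        self.log = []

    def write_and_wait(self, what):
        # Stand-in for "issue the write and wait for the driver to
        # report that it is complete".
        self.log.append(what)


def create_link(disk, inum, name):
    disk.write_and_wait(("inode", inum, "allocated"))  # inode first
    disk.write_and_wait(("dirent", name, inum))        # then the directory block


def unlink(disk, inum, name, last_link=True):
    disk.write_and_wait(("dirent", name, None))        # clear the entry first
    state = "free" if last_link else "allocated"
    disk.write_and_wait(("inode", inum, state))        # then the inode


disk = Disk()
create_link(disk, 42, "foo")
unlink(disk, 42, "foo")
for w in disk.log:
    print(w)
```

In both operations the directory entry only ever points at the inode
while the on-disk inode is allocated, provided the drive really has
finished each write when it says so.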

If the writes that actually ended up on the platters were done in the
correct order, the only inconsistency that could result from a failure
to commit all the transactions would be an unreferenced inode on disk
that might get reconnected under lost+found by fsck.  The problem I
saw indicates that the writes to the platters were done in the wrong
order and not all of them were completed.
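
The argument above can be checked with a small brute-force sketch.  It
assumes a toy "disk" that is just a set of allocated inodes plus a map
of directory entries, and it models a reset as only a prefix of the
writes reaching the platters.  Every name in it is invented for the
example, and it says nothing about what any particular drive does.

```python
from itertools import permutations

# Toy model, hypothetical names: the disk is a set of allocated inode
# numbers plus a dict of directory entries (name -> inode number).

CREATE = [("inode", 42, True), ("dirent", "foo", 42)]     # inode, then entry
UNLINK = [("dirent", "foo", None), ("inode", 42, False)]  # entry, then inode


def apply(writes, allocated=(), entries=()):
    allocated, entries = set(allocated), dict(entries)
    for kind, key, val in writes:
        if kind == "inode":
            (allocated.add if val else allocated.discard)(key)
        elif val is None:
            entries.pop(key, None)
        else:
            entries[key] = val
    return allocated, entries


def dangling(allocated, entries):
    # Directory entries that reference an unallocated inode.
    return {n: i for n, i in entries.items() if i not in allocated}


def unreferenced(allocated, entries):
    # Allocated inodes that no directory entry points at.
    return allocated - set(entries.values())


def crash_states(writes, **initial):
    # Every state reachable if only a prefix of `writes` hits the platters.
    return [apply(writes[:n], **initial) for n in range(len(writes) + 1)]


def any_dangling(orderings, **initial):
    return any(dangling(*s) for w in orderings
               for s in crash_states(w, **initial))


before_unlink = dict(allocated={42}, entries={"foo": 42})

print(any_dangling([CREATE]))                    # in-order create
print(any_dangling([UNLINK], **before_unlink))   # in-order unlink
print(any_dangling(permutations(CREATE)))        # drive reorders create
print(any_dangling(permutations(UNLINK), **before_unlink))
```

With the writes applied in the order softupdates issues them, no crash
point in this model leaves a dangling directory entry; the worst case
is an allocated inode with no entry pointing at it.  Once the two
writes are allowed to land in either order, a crash between them can
leave an entry that references an unallocated inode.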
