From: Terry Lambert
Subject: Re: softupdates & fsck
To: Don.Lewis@tsc.tdk.com (Don Lewis)
Cc: current@FreeBSD.ORG
Date: Sat, 19 Sep 1998 08:02:00 +0000 (GMT)
In-Reply-To: <199809190520.WAA09949@salsa.gv.tsc.tdk.com> from "Don Lewis" at Sep 18, 98 10:20:40 pm

> While doing various evil things with a machine running current, I've managed
> to get it to panic a number of times. That and getting some filesystem
> damage as a result don't bother me, but I am bothered by the type of
> filesystem damage that I'm seeing. When fsck runs at boot time, it finds
> a number of orphaned directories, which it reconnects in lost+found.
> For some reason, they end up with their link count being too low. If
> I try to "rm -r" them, they are emptied of their contents, except for
> "." and "..", at which point they are unremovable because their link
> count is 1 instead of 2.
> I've also seen directories elsewhere in the tree end up with a link
> count that's too low. If I unmount the filesystem and run fsck again,
> fsck notices the problem, reports "UNEXPECTED SOFTDEP INCONSISTENCY",
> and fixes the problem. There may be files with the wrong link count
> as well.
>
> My suspicion is that the first fsck run is getting the link counts wrong
> when it repairs the filesystem. I've taken a look at the fsck code, but
> haven't gotten too far, mostly because the code is so well commented -- NOT!

The theory behind soft updates is that metadata writes are committed
atomically and in dependency order.

What matters for our purposes is that no on-disk structure for which
atomicity must be guaranteed spans a 512-byte boundary. All known modern
disk hardware guarantees that a 512-byte region (one atomic disk block)
is written either completely or not at all.

As a result, if soft updates is working correctly, the *only* type of
error that can occur, and that fsck needs to correct, is a cylinder group
bitmap inconsistency; specifically, one in which a bit is set (marked
allocated) when the resource is not, in fact, allocated.

This means that if you could lock down access to a cylinder group in the
FS code for the duration of a bitmap consistency check, you could do the
necessary repairs following a power failure *while the FS was online*.

In other words, fsck is unnecessary, except to make up for the fact that
a cleaner daemon has not been written, and to deal with physical hardware
failure.

That you are seeing these problems implies that the bwrite ordering
guarantees the driver must provide are not being honored: namely, that
blocks will be written in the order requested, and that a write will not
return as completed until its data has been committed to the disk.
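The 512-byte atomicity argument rests on a simple property: a record that lies entirely within one sector is rewritten by a single sector write, so a power failure leaves either the old copy or the new one, never a mix. A minimal Python sketch of that check follows; the function name and the 128-byte record size are illustrative assumptions, not taken from the FFS sources.

```python
SECTOR_SIZE = 512  # atomic write unit assumed in the argument above


def spans_sector_boundary(offset: int, length: int, sector: int = SECTOR_SIZE) -> bool:
    """Return True if bytes [offset, offset + length) fall in more than one sector.

    A record for which this returns False is rewritten by exactly one sector
    write, so it can never be observed half-updated after a crash.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_sector = offset // sector
    last_sector = (offset + length - 1) // sector
    return first_sector != last_sector


# Hypothetical 128-byte records packed back to back from offset 0:
# 128 divides 512 evenly, so none of them straddles a sector boundary.
RECORD_SIZE = 128
print(any(spans_sector_boundary(i * RECORD_SIZE, RECORD_SIZE) for i in range(64)))  # False

# A 300-byte record starting at byte 400 crosses the 512-byte line and
# could be torn by a crash between the two sector writes.
print(spans_sector_boundary(400, 300))  # True
```

Record sizes that divide the sector size evenly, laid out from a sector-aligned start, get this property for free, which is one way an on-disk layout can guarantee it.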
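If the only damage soft updates can leave is a bitmap bit marked allocated for a resource that nothing references, then repair reduces to clearing those bits while allocation in that cylinder group is held off. The Python sketch below models that under stated assumptions: `CylinderGroup` and `repair_bitmap` are hypothetical names, a `threading.Lock` stands in for whatever kernel lock would block allocations in the group, and the caller is assumed to have already collected the set of block numbers that are actually referenced (gathering that set consistently is the hard part and is not shown).

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field


@dataclass
class CylinderGroup:
    """Hypothetical model of one cylinder group's block bitmap (not FFS's struct cg)."""
    allocated: list[bool]  # True means the bitmap marks the block as in use
    lock: threading.Lock = field(default_factory=threading.Lock)


def repair_bitmap(cg: CylinderGroup, referenced: set[int]) -> list[int]:
    """Clear bitmap bits that are set for blocks nothing references.

    Returns the block numbers that were reclaimed. The opposite case, a
    referenced block marked free, should be impossible if soft updates
    ordered the writes correctly, so it is reported rather than repaired.
    """
    with cg.lock:  # stand-in for holding off allocation in this group
        for blk in referenced:
            if not cg.allocated[blk]:
                raise RuntimeError(f"block {blk} is referenced but marked free")
        leaked = [blk for blk, in_use in enumerate(cg.allocated)
                  if in_use and blk not in referenced]
        for blk in leaked:
            cg.allocated[blk] = False
        return leaked


cg = CylinderGroup([True, True, False, True, False, False, True, False])
print(repair_bitmap(cg, referenced={0, 1, 3}))  # [6]
print(cg.allocated)  # [True, True, False, True, False, False, False, False]
```

Treating "referenced but marked free" as a hard error, rather than fixing it, mirrors the claim in the post: under correct ordering that state cannot arise, so seeing it means the ordering guarantees were broken.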
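Why driver reordering breaks this can be made concrete by enumerating crash points. The Python model below assumes three writes for creating one file (a bitmap update, the initialized inode, and the directory entry naming it) and two rules modelled on soft updates: a directory entry must not reach the disk before its inode, and an inode must not reach the disk before the bitmap marks it allocated. The write names and rules are simplifications chosen for illustration. For a given completion order, it checks the on-disk state after every possible crash point.

```python
from itertools import permutations

# Hypothetical writes needed to create one file, listed in the dependency
# order a soft-updates-style scheduler would request them.
REQUESTED = ["inode_bitmap", "inode", "dirent"]


def crash_states(completed_order):
    """Yield the set of writes on disk at each possible crash point."""
    for n in range(len(completed_order) + 1):
        yield set(completed_order[:n])


def is_safe(on_disk):
    """Modelled invariants: nothing points at an uninitialized inode, and no
    initialized inode is marked free. A bitmap bit with no inode behind it
    is allowed: it is only a leak, the one kind of damage left to repair."""
    if "dirent" in on_disk and "inode" not in on_disk:
        return False
    if "inode" in on_disk and "inode_bitmap" not in on_disk:
        return False
    return True


def all_crash_points_safe(completed_order):
    return all(is_safe(state) for state in crash_states(completed_order))


# The driver completes writes in the order requested: every crash point is safe.
print(all_crash_points_safe(REQUESTED))  # True

# A driver free to reorder could complete them in any of these orders.
unsafe = [order for order in permutations(REQUESTED)
          if not all_crash_points_safe(order)]
print(len(unsafe))  # 5
print(unsafe[0])  # ('inode_bitmap', 'dirent', 'inode')
```

With in-order completion, the worst crash state is a bitmap bit set with nothing using it, which is the leak described above. Any other completion order exposes a state where a directory entry names an inode that was never written. A driver that reports a write as complete before it is on the disk fails the same way, since the dependent write can then be issued while the earlier one is still pending.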
From recent postings, it seems that CAM is not honoring the ordering
guarantees that the previous driver code honored. If your problem is
occurring on a non-CAM system, you should contact Julian Elischer with a
detailed list of the errors you are getting; if it is occurring on a CAM
system, you need to contact Justin Gibbs and the other CAM authors about
*not* reordering blocking write requests.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.