Message-ID: <3C8FA1E4.A89F52FF@mindspring.com>
Date: Wed, 13 Mar 2002 11:00:52 -0800
From: Terry Lambert
To: Parity Error
Cc: freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates

Parity Error wrote:
> with soft-updates metadata updates are delayed write. I am
> wondering if, say there are two independent structural changes,
> one after another, and then a crash happens.
>
> Is there a possibility that the latter structural change got
> written to disk before the former due to some memory replacement
> policy ?

Independent writes are independent, by definition. They are
permitted to occur in either order.

Metadata updates are only ordered by soft updates insofar as
necessary to satisfy dependencies.

Thus independent writes can occur in any order, but will *usually*
occur in order, due to the way that a scheduled write cannot be
reordered once it is given to the disk controller. This is due to
a locking issue on the disk operations queue in the driver, and is
arguably a bug.
It's likely that some work currently in progress will proceed to
the point that the "likely ordering" of independent operations will
go away in the future, so you can't even safely depend on it being
likely.

This is normally an issue only for updates that do things like
update both an index and a record file, and imply a dependency
order in the operation. In other words, there is implied metadata
between the two files, and therefore an implied dependency.

It's the application's responsibility to signal the dependency to
the OS, so that the updates are ordered.

The normal way to do this is to use a two stage commit operation
(per standard database theory, circa IBM, 1965). In UNIX this is
done by requesting that the first operation be committed before
making the request to begin the second operation (e.g. a software
barrier instruction).

To find out more about this, you should use "man fsync" and
"man open" (in the "open" page, look for "O_FSYNC").

As to misordering of dependent writes, even if you use synchronous
I/O properly...

Yes, this can happen due to the memory replacement policy on many
IDE hard drives, which lie about data having been committed to
stable storage when in fact it has only been written to the disk
write cache. That cache is far from stable storage: it is not
battery backed, and it is not guaranteed to be written to the disk
after a power failure, except on some IBM and Quantum drives which
are no longer manufactured.

You can ensure this doesn't happen to you by using only disks which
correctly support cache flush primitives and tagged command queues,
or by disabling write caching on the device. SCSI devices don't
have this problem.

Another potential problem is that some IDE disks will acknowledge
disabling write caching, but will in fact not disable it, no matter
what commands you spit at them.
For some of these disks, there are firmware updates available, but
if you are unlucky enough to own one of these disks, then there is
usually no option but to buy a good disk instead. May I recommend
SCSI?

> could this affect the correctness of some applications ?

The disk caching issue could. The implied metadata could not.

If you have an application that uses implied metadata, but does
not take the necessary steps for UNIX to ensure that the OS is
signalled about the implied ordering dependency, then by definition,
your application can't have its correctness affected... since it
has no correctness to lose. 8-).

-- Terry