From: Terry Lambert
Date: Thu, 17 Apr 2003 09:54:16 -0700
To: Marko Zec
cc: freebsd-fs@freebsd.org, Ian Dowse, freebsd-stable@freebsd.org, Kirk McKusick
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> Ian Dowse wrote:
> > Note that the ATA "delayed write" mechanism only delays writes while
> > the disk is spun down; at other times there is no change in behaviour.
> > Since the disk only spins down after it has been idle for a time,
> > it is very unlikely that the disk is left in an inconsistent state
> > while it is stopped.

I'm wondering if the ATA "delayed write" actually does this, or if it
merely relaxes the cache restrictions, without retaining the ordering
enforcement. I suspect that it does not retain the ordering enforcement,
as there is no way to disconnect on a tagged queue write, because you
must issue a request for status, and it can't be done as a separate ATA
operation (see the posts by the Maxtor employee, on and around January
20th of this year to the -FS list for details).

You are much better off accumulating requests in the kernel in buffers,
and then using the normal write mechanism to push them out to the drive
in order (IMO). This implies a barrier and new code above the bwrite
interface, to keep the buffers from getting locked and stalling your
applications in user space.

A problem I see here is that swap is on a totally different path, and
in a different area of the disk (practically guaranteeing a seek, and a
track buffer invalidation on the disk), even if you could cause swapping
to be delayed (I don't think you can; FreeBSD aggressively uses memory,
and so when you need to swap, you *need* to swap).

> The OS _does_ know (approximately) when the disk is spinning and when not.
> For example, if the disk is configured to stop spinning immediately after
> the last I/O operation, the OS can safely assume 10 or more seconds
> afterwards the spinning will be stopped. The OS only has to keep record (in
> form of timestamp or something similar) when it has issued the last I/O
> request to the disk. In my patch this is accomplished using the stratcalls
> marker, which is increased every time the strategy routine of the ATA disk
> driver is invoked. Therefore the OS can also successfully coalesce the
> pending disk updates with other outstanding I/O disk operations, which are
> typically reads of uncached sectors or VM swapping.

This is useful, but not enough. You need to actually communicate the
information above the block I/O layer, to the soft updates.

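For illustration only, here is a minimal user-space sketch of the
bookkeeping described in the quoted text: a counter bumped on every
strategy call, a timestamp of the last request, and a way for a higher
layer to ask whether any I/O has happened since it last looked. Only the
name stratcalls comes from the quoted description; the structure, the
function names and the spin-down timeout parameter are hypothetical, and
none of this is the patch's code or a FreeBSD kernel interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-disk activity record; not the structure used in the patch. */
struct disk_activity {
	uint64_t stratcalls;	/* bumped on every strategy-routine call */
	time_t	 last_io;	/* time of the most recent strategy call */
};

/* Would be called from the strategy routine for every I/O request. */
static void
disk_note_io(struct disk_activity *da, time_t now)
{
	da->stratcalls++;
	da->last_io = now;
}

/*
 * Guess whether the disk has spun down: true once it has been idle for
 * at least spindown_secs (an assumed, externally configured timeout).
 */
static bool
disk_probably_spun_down(const struct disk_activity *da, time_t now,
    time_t spindown_secs)
{
	return (now - da->last_io >= spindown_secs);
}

/*
 * Upper-layer query: has any I/O been issued since the caller last
 * looked?  *seen holds the caller's last observed counter value and is
 * updated on every call.
 */
static bool
disk_io_since(const struct disk_activity *da, uint64_t *seen)
{
	bool changed = (da->stratcalls != *seen);

	*seen = da->stratcalls;
	return (changed);
}
```

In this model a layer above the block I/O code could call
disk_io_since() to notice that the disk has just been touched and push
its pending updates out then, and use disk_probably_spun_down() to decide
to hold them back otherwise. How such a hook would actually be wired to
soft updates is not something this sketch answers.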
I think, effectively, what you actually want to do is to stop the soft
updates clock, rather than trying to play stupid disk tricks with
timers, etc., above and beyond what you have to do.

I can see it being useful on SCSI disks, as well, particularly where
there are temperature issues. Though in that case, you are probably
more memory starved than anything, and it will end up doing you no good.

> I agree the ATA delayed writes is a great functionality that can help save
> battery power.

I don't; only if the write order is maintained is it "great".

> I just want to point out that it can suffer from the same
> consistency problems as the model of OS controlled delayed synching combined
> with null fsync() processing. However, if the OS controls the delaying of
> updates, you can turn on or off normal fsync() semantics as desired. With
> delaying writes in ATA firmware, you simply do not have the choice :)

I think people are confusing fsync() with syncd at this point. 8-(.

-- Terry