Date: Mon, 20 Jan 2003 13:30:14 -0500
From: Steve Byan <stephen_byan@maxtor.com>
To: freebsd-fs@FreeBSD.ORG
Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
Message-ID: <37CA8FF0-2CA5-11D7-962B-00306548867E@maxtor.com>
In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com>

On Friday, January 17, 2003, at 05:27 AM, Terry Lambert wrote:

> No, the worst case following a power failure is a screwed disk
> track.

I'm skeptical of this claim, unless you mean it in a way that strikes me as rather unusual.

> Modern disk drives read and write a track at a time; this is to
> avoid rotational latency that would happen if you waited for a
> hard "sector start" marker to come around, and it avoids the need
> for "low level formatting".

I'm familiar with drives which will re-order their queue of writes for a track (i.e. SCSI disks with write cache enabled, SCSI disks with command-queued writes without an "ordered task" tag, or ATA disks with caching enabled). But you seem to be implying by your mention of "avoiding rotational latency ... waiting for a ... sector start marker" and mention of "low level formatting" that there exists a modern SCSI or ATA disk which writes by simply blasting a whole new track whenever it writes, starting at the current rotational position. This would certainly open the possibility of making the remainder of the track unreadable. Perhaps one of Maxtor's competitors has such a disk, but I don't believe so, because the benchmark performance of such a disk would be abysmal due to the need to read/modify/write the entire track whenever a single sector changes.

Are you saying that this whole-track-write mode happens conditionally, only if the queued writes happen to cover the entire track? Perhaps such a disk could be marketable, but the rotational latency advantages in this case are very small compared to the alternative of simply waiting until one of the sectors to be written comes under the head.

I know that neither Maxtor's SCSI disks nor their ATA disks blast an entire track in one fell swoop.

> For a very small window of time in
> the late 1990's, two manufacturers, IBM and Quantum, created disk
> drives which were capable of using rotational energy as a power
> source (regenerative braking) to complete a write in progress,
> following a DC failure (this provided a small post-failure
> hold-up time).
>
> Modern disk drives no longer do this, because disk manufacturers
> are morons (or one was a moron, and the others had to compete on
> price, which amounts to the same thing).

See below - changes made for higher capacity and higher RPM have made it impossible to use the regenerative braking trick on modern drives.

> The net result is that a DC failure can result in an entire track
> getting trashed, if it happens at the right time.

I'll agree that it can result in partial completion of a queue of writes, with the order of completion being essentially unknowable, and with at most one sector being corrupted, and hence having an invalid ECC (and therefore returning a hard error if read). If that is your definition of "trashing an entire track", I'll accept it. But if you are implying that more than one sector could be unreadable, or that any sector would return data that had not been written to it without giving an error indication, I disagree. The remaining sectors of the track may have new data or old data, depending on the disk scheduling algorithm, but they would not be "corrupt" in the sense of being unreadable, or of returning bogus data without also returning an error indication.
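
To make that failure model concrete, here is a toy sketch in Python. It is only an illustration of the claim above, not a description of any drive's firmware; the function name, the state labels, and the random shuffle standing in for the drive's scheduler are all hypothetical. It assumes each queued write targets a distinct sector and takes the worst case, in which power is lost in the middle of a sector write.

```python
import random

# Hypothetical labels for what a later read of each sector would return.
OLD, NEW, HARD_ERROR = "old", "new", "hard-error"

def simulate_power_loss(track_sectors, queued_writes, fail_after, rng):
    """Return the post-failure state of every sector on one track.

    track_sectors: number of sectors on the track (all start as OLD).
    queued_writes: distinct sector numbers with writes pending in the queue.
    fail_after:    number of writes that complete before DC power is lost;
                   the next write, if any, is cut off mid-sector.
    rng:           random.Random standing in for the drive's scheduler,
                   whose completion order the host cannot know.
    """
    state = [OLD] * track_sectors
    order = list(queued_writes)
    rng.shuffle(order)                      # completion order is unknowable
    for sector in order[:fail_after]:
        state[sector] = NEW                 # completed writes are fully valid
    if fail_after < len(order):
        # Worst case: the write in progress is torn, so its ECC is invalid
        # and a later read reports a hard error instead of bogus data.
        state[order[fail_after]] = HARD_ERROR
    return state                            # untouched sectors remain OLD

if __name__ == "__main__":
    result = simulate_power_loss(16, [2, 5, 7, 11], 2, random.Random())
    print(result)
```

However the queue is shuffled, the result never contains more than one hard-error sector, and every other sector holds either its old contents or its complete new contents.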

If you wish to have writes complete to the media in the order in which you issued them, then you must either

a) disable write caching and not use SCSI command queuing for ordered writes, or

b) enable write caching but do not use SCSI command queuing, and either
   b1) set the FUA bit in the SCSI CDB and not use command queuing for ordered writes, or
   b2) follow the ATA write command with a "flush cache" command, or

c) enable write caching and SCSI command queuing, but
   c1) set the FUA bit in the SCSI CDB and ensure the command has the "ordered task" attribute in its task tag, so that the command will not be reordered.

Upon reflection, I suppose it is possible that if the DC voltage were to remain at the threshold for write-enable for an extended period of time and if the DC-low circuitry for the drive in question did not have hysteresis, then write-gate might toggle off and then back on a few times and as a result corrupt multiple sectors (all of which will show up as hard errors when read). But this would be the result of a design error, not a design intent, and would not apply to all makes and models of disk drives.

I agree that it is a shame that drive manufacturers do not offer an "atomic write" feature for a sector. Convince the system manufacturers to supply a "power-fail" warning signal a few milliseconds in advance of the loss of DC power, and I think the drive manufacturers would be happy to provide an atomic write feature. We can no longer use the rotational energy in the platters to keep up the power, because the platter count and media diameter have both steadily decreased - as a result, there is no longer enough rotational inertia to provide the hold-up times needed. Note that it is this reduced platter count and smaller disks which have enabled 10K and 15K RPM disks within the power envelope allotted to a 3.5 inch disk drive.

Regards,
-Steve
(not speaking officially for his employer)

--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414
