FreeBSD Mail Archives

Date:      Mon, 20 Jan 2003 13:30:14 -0500
From:      Steve Byan <stephen_byan@maxtor.com>
To:        freebsd-fs@FreeBSD.ORG
Subject:   Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling  filesystem support)
Message-ID:  <37CA8FF0-2CA5-11D7-962B-00306548867E@maxtor.com>
In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com>

On Friday, January 17, 2003, at 05:27  AM, Terry Lambert wrote:

> No, the worst case following a power failure is a screwed disk
> track.

I'm skeptical of this claim, unless you mean it in a way that strikes 
me as rather unusual.

> Modern disk drives read and write a track at a time; this is to
> avoid rotational latency that would happen if you waited for a
> hard "sector start" marker to come around, and it avoids the need
> for "low level formatting".

I'm familiar with drives which will re-order their queue of writes for 
a track (i.e. SCSI disks with write cache enabled, SCSI disks with 
command-queued writes without an "ordered task" tag, or ATA disks with 
caching enabled). But you seem to be implying by your mention of 
"avoiding rotational latency ... waiting for a ... sector start 
marker" and mention of "low level formatting" that there exists a 
modern SCSI or ATA disk which writes by simply blasting a whole new 
track whenever it writes, starting at the current rotational position. 
This would certainly open the possibility of making the remainder of 
the track unreadable. Perhaps one of Maxtor's competitors has such a 
disk, but I don't believe so, because the benchmark performance of such 
a disk would be abysmal due to the need to read/modify/write the entire 
track whenever a single sector changes.
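To put rough numbers on the read/modify/write penalty, here is a back-of-the-envelope sketch. It assumes a deliberately simplified, hypothetical model (not any real drive's firmware): an ordinary single-sector write waits on average half a revolution for its sector, while a whole-track rewrite must read the full track (one revolution) and then write it back (another revolution). Seek and transfer times are ignored, and the function names are illustrative only.

```python
def rev_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def single_sector_write_ms(rpm: float) -> float:
    """Average rotational latency for an in-place sector write: half a revolution."""
    return rev_time_ms(rpm) / 2


def whole_track_rmw_ms(rpm: float) -> float:
    """Hypothetical whole-track rewrite: read the track (1 rev), then write it (1 rev)."""
    return 2 * rev_time_ms(rpm)


if __name__ == "__main__":
    for rpm in (5400, 7200, 10_000, 15_000):
        single = single_sector_write_ms(rpm)
        rmw = whole_track_rmw_ms(rpm)
        print(f"{rpm:>6} RPM: sector write ~{single:.2f} ms, "
              f"track RMW ~{rmw:.2f} ms ({rmw / single:.0f}x slower)")
```

Under this model the whole-track scheme is a constant 4x slower per small write at any spindle speed, before even counting the extra media traffic.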

Are you saying that this whole-track-write mode happens conditionally, 
only if the queued writes happen to cover the entire track? Perhaps 
such a disk could be marketable, but the rotational latency advantages 
in this case are very small compared to the alternative of simply 
waiting until one of the sectors to be written comes under the head.
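The size of that latency advantage can be estimated with a small model. Assume (hypothetically) a track of equal-sized sectors, a head position uniformly distributed around the track, and a drive that starts with whichever queued sector reaches the head first. If the gaps between consecutive queued sector starts are g_i sectors, the mean wait is sum(g_i^2) / (2 * S) sector-times for a track of S sectors. A whole-track blast starting at the current position would wait zero; the sketch shows that waiting for the next queued sector costs only half a sector-time when the queue covers the whole track. The 500-sector track and 7200 RPM figures are made up for illustration.

```python
from typing import Iterable


def expected_wait_sectors(track_sectors: int, targets: Iterable[int]) -> float:
    """Mean rotational wait (in sector-times) until the start of the next queued
    sector passes under the head, for a head position uniform over the track."""
    starts = sorted(set(targets))
    if not starts or not all(0 <= s < track_sectors for s in starts):
        raise ValueError("targets must be non-empty sector indices on the track")
    # Distance from each queued sector start to the next one, wrapping around.
    # A single target has a gap of one full revolution.
    gaps = [((starts[(i + 1) % len(starts)] - s) % track_sectors) or track_sectors
            for i, s in enumerate(starts)]
    # Within a gap of length g the wait falls linearly from g to 0 (mean g/2),
    # and the head lands in that gap with probability g / track_sectors.
    return sum(g * g for g in gaps) / (2 * track_sectors)


if __name__ == "__main__":
    sectors_per_track = 500          # hypothetical zone geometry
    rev_ms = 60_000 / 7200           # one revolution at 7200 RPM
    sector_ms = rev_ms / sectors_per_track
    for label, targets in [("one sector", {0}),
                           ("every other sector", range(0, 500, 2)),
                           ("whole track", range(500))]:
        w = expected_wait_sectors(sectors_per_track, targets)
        print(f"{label:>18}: {w:7.2f} sector-times = {w * sector_ms:.4f} ms")
```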

I know that neither Maxtor's SCSI disks nor their ATA disks blast an 
entire track in one fell swoop.

> For a very small window of time in
> the late 1990's, two manufacturers, IBM and Quantum, created disk
> drives which were capable of using rotational energy as a power
> source (regenerative braking) to complete a write in progress,
> following a DC failure (this provided a small post-failure
> hold-up time).
>
> Modern disk drives no longer do this, because disk manufacturers
> are morons (or one was a moron, and the others had to compete on
> price, which amounts to the same thing).

See below - changes made for higher capacity and higher RPM have made 
it impossible to use the regenerative braking trick on modern drives.

>
> The net result is that a DC failure can result in an entire track
> getting trashed, if it happens at the right time.

I'll agree that it can result in partial completion of a queue of 
writes, with the order of completion being essentially unknowable, and 
with at most one sector being corrupted, and hence having an invalid 
ECC (and therefore returning a hard error if read).

If that is your definition of "trashing an entire track", I'll accept 
it. But if you are implying that more than one sector could be 
unreadable, or that any sector would return data that had not been 
written to it without giving an error indication, I disagree. The 
remaining sectors of the track may have new data or old data, depending 
on the disk scheduling algorithm, but they would not be "corrupt" in 
the sense of being unreadable, or of returning bogus data without also 
returning an error indication.
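The failure model described above can be simulated. This is a toy sketch under stated assumptions, not a description of any drive's internals: each sector is an 8-byte payload plus a check value, with CRC-32 standing in for the drive's real ECC; the drive completes its queued writes in an arbitrary order; write-gate drops exactly once, so the one write in flight is left half new, half old with a stale check value. All function names are hypothetical.

```python
import random
import zlib


class MediaError(Exception):
    """Raised when a sector's stored check value does not match its data."""


def make_sector(data: bytes):
    # A sector is modeled as (data, check value); CRC-32 stands in for the ECC.
    return (data, zlib.crc32(data))


def read_sector(sector) -> bytes:
    data, check = sector
    if zlib.crc32(data) != check:
        raise MediaError("uncorrectable read error")
    return data


def write_queue_with_power_loss(track, writes, rng: random.Random):
    """Apply queued (lba, data) writes in a drive-chosen order, losing power partway.

    Each LBA is assumed to appear at most once in `writes`.
    """
    track = list(track)
    order = list(writes)
    rng.shuffle(order)                    # the drive reorders its queued writes
    done = rng.randrange(len(order) + 1)  # writes that complete before power is lost
    for lba, data in order[:done]:
        track[lba] = make_sector(data)
    if done < len(order):
        # The write in flight when write-gate drops: the first half of the new
        # data reached the media, but the check value was never rewritten.
        lba, data = order[done]
        old_data, old_check = track[lba]
        half = len(data) // 2
        track[lba] = (data[:half] + old_data[half:], old_check)
    return track


if __name__ == "__main__":
    rng = random.Random(1)
    old = [make_sector(b"old%05d" % i) for i in range(16)]
    writes = [(i, b"new%05d" % i) for i in range(16)]
    for lba, sector in enumerate(write_queue_with_power_loss(old, writes, rng)):
        try:
            print(lba, read_sector(sector))
        except MediaError as err:
            print(lba, "ERROR:", err)
```

Every readable sector returns either its old or its new contents, and at most one sector reports a hard error, which is the behavior described in the paragraph above.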

If you wish to have writes complete to the media in the order in which 
you issued them, you must either
a) disable write caching and not use SCSI command queuing for ordered 
writes
or
b) enable write caching but do not use SCSI command queuing, and either
b1) set the FUA bit in the SCSI CDB and not use command queuing for 
ordered writes, or
b2) follow the ATA write command with a "flush cache" command
or
c) enable write caching and SCSI command queuing, but
c1) set the FUA bit in the SCSI CDB and ensure the command has the 
"ordered task" attribute in its task tag, so that the command will not 
be reordered.
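Option b2 can be illustrated with a toy model of a write-back cache. The sketch assumes a hypothetical drive that acknowledges writes as soon as they are cached, destages cached writes opportunistically in any order, and implements a flush that does not return until everything cached is on the media. The class and method names are invented for illustration; FUA and SCSI ordered task tags are not modeled.

```python
import random


class CachingDrive:
    """Toy model of a drive whose write-back cache destages in any order."""

    def __init__(self, rng: random.Random):
        self.rng = rng
        self.pending = []    # writes accepted into the cache, not yet on media
        self.media_log = []  # LBAs in the order they actually reached the media

    def write(self, lba: int) -> None:
        # With write caching enabled the command completes immediately;
        # the data is only guaranteed to be in the cache.
        self.pending.append(lba)
        self._destage_some()

    def flush_cache(self) -> None:
        # Stand-in for a cache flush: returns only after every cached write is on media.
        self.rng.shuffle(self.pending)
        self.media_log.extend(self.pending)
        self.pending.clear()

    def _destage_some(self) -> None:
        # The drive may write any subset of its cached entries, in any order.
        self.rng.shuffle(self.pending)
        n = self.rng.randrange(len(self.pending) + 1)
        self.media_log.extend(self.pending[:n])
        del self.pending[:n]


if __name__ == "__main__":
    unordered = CachingDrive(random.Random(7))
    for lba in range(10):
        unordered.write(lba)
    unordered.flush_cache()
    print("cached, one flush at end:", unordered.media_log)

    ordered = CachingDrive(random.Random(7))
    for lba in range(10):
        ordered.write(lba)
        ordered.flush_cache()  # barrier: the next write cannot overtake this one
    print("flush after every write: ", ordered.media_log)
```

With a flush after each write, at most one write is ever cached, so the media order always matches the issue order; the price is giving up the drive's freedom to reorder.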

Upon reflection, I suppose it is possible that if the DC voltage were 
to remain at the threshold for write-enable for an extended period of 
time and if the DC-low circuitry for the drive in question did not have 
hysteresis, then write-gate might toggle off and then back on a few 
times and as a result corrupt multiple sectors (all of which will show 
up as hard errors when read). But this would be the result of a design 
error, not a design intent, and would not apply to all makes and models 
of disk drives.
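The hysteresis point can be shown with a small comparator model. Everything here is hypothetical: the 4.5 V trip point, the 4.6 V re-enable point and the sagging, noisy voltage samples are made-up numbers for a nominal 5 V rail, and the function name is invented. Without hysteresis, noise around the threshold turns write-gate off several times; with hysteresis it turns off once.

```python
def count_write_gate_drops(samples, trip_low: float, trip_high: float) -> int:
    """Count how often write-gate is turned off as supply voltage samples arrive.

    Write-gate turns off when the voltage falls below trip_low and is only
    re-enabled once the voltage rises above trip_high. Passing
    trip_low == trip_high models a DC-low detector with no hysteresis.
    """
    enabled = True
    drops = 0
    for v in samples:
        if enabled and v < trip_low:
            enabled = False
            drops += 1
        elif not enabled and v > trip_high:
            enabled = True
    return drops


# A slowly collapsing supply with a little ripple around the trip point.
sagging = [4.52, 4.49, 4.51, 4.48, 4.505, 4.47, 4.3, 3.0]

if __name__ == "__main__":
    print("no hysteresis:  ", count_write_gate_drops(sagging, 4.5, 4.5))  # 3 drops
    print("with hysteresis:", count_write_gate_drops(sagging, 4.5, 4.6))  # 1 drop
```

Each extra drop is another chance to interrupt a write in progress, which is how a detector without hysteresis could corrupt several sectors instead of at most one.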

I agree that it is a shame that drive manufacturers do not offer an 
"atomic write" feature for a sector. Convince the system manufacturers 
to supply a "power-fail" warning signal a few milliseconds in advance 
of the loss of DC power, and I think the drive manufacturers would be 
happy to provide an atomic write feature. We can no longer use the 
rotational energy in the platters to keep up the power, because the 
platter count and media diameter have both steadily decreased - as a 
result, there is no longer enough rotational inertia to provide the 
hold-up times needed. Note that it is this reduced platter count and 
these smaller disks that have enabled 10K and 15K RPM disks within the 
power envelope allotted to a 3.5 inch disk drive.
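The dependence on platter size and count follows from textbook rotational mechanics. As an idealization, treat each of the N platters as a uniform solid disk of radius r, thickness t and density rho spinning at angular speed omega (real platters have a center hole, and the hub and spindle motor add inertia). The recovered fraction eta and the write-electronics power P below are illustrative symbols, not drive specifications.

```latex
I_{\text{platter}} = \tfrac{1}{2} m r^2
                   = \tfrac{1}{2}\,(\rho \pi r^2 t)\, r^2
                   = \tfrac{\pi}{2}\,\rho\, t\, r^4

E = \tfrac{1}{2}\, N\, I_{\text{platter}}\, \omega^2
  = \tfrac{\pi}{4}\, N\, \rho\, t\, r^4\, \omega^2

t_{\text{hold}} \approx \frac{\eta\, E}{P}
```

Because the stored energy scales with N r^4, a platter with 0.7 of the radius stores about 0.7^4, or roughly 0.24, of the energy at the same speed, and removing platters cuts it in proportion to N.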

Regards,
-Steve (not speaking officially for his employer)
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?37CA8FF0-2CA5-11D7-962B-00306548867E>
