Date: Thu, 31 Aug 1995 14:10:55 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: rgrimes@gndrsh.aac.dev.com (Rodney W. Grimes)
Cc: terry@lambert.org, pete@kesa26.kesa.com, jbryant@argus.iadfw.net, freebsd-hackers@FreeBSD.ORG, pete@rahul.net
Subject: Re: 4GB Drives
Message-ID: <199508312110.OAA23399@phaeton.artisoft.com>
In-Reply-To: <199508312010.NAA12388@gndrsh.aac.dev.com> from "Rodney W. Grimes" at Aug 31, 95 01:10:34 pm
> You see, in modern workstation disk drives you have something called
> spindle sync. Well, when you set up spindle sync you have 2 modeselect
> values you tweak. One bit says who is the sync master and who are
> the sync slaves. Then for each slave drive you tweak another value
> that is used to offset the spindles from perfect sync so that the I/O
> of block zero of a track on drive 0 of a stripe set has just finished
> the scsi bus transfer when block zero of a track on drive 1 is about to
> come under the heads.

One assumes, then, that stripes will not cross cylinder boundaries in this case, since doing so would perturb a particular stripe but not all stripes?

One also assumes that the head positioning on both drives is synchronized, so that any seek delays occur on both drives simultaneously?

> I was in no way talking about ``rotdelay'' in the file system since,
> I am still playing with raw devices at the block level, no file systems
> have been built since the slice code kinda screwed me up for getting
> labels on the things.

Rotational delay refers to the position of the head relative to the sector address within the track, and is thus independent of file system code unless the file system itself attempts to compensate.

Ideally, you'd want spindle-synced drives with identical geometries, plus knowledge of the sector offsets at which a seek will occur, so that the seek can be avoided on both drives simultaneously.

Finally, you'd want the rotation *advanced* by the stripe length (a function of block placement on writes), given that advancing the rotation will force the entire stripe into cache, as the drive begins reading before the end of the stripe with reverse-ordered sectors.

In effect, a file system wants to be a variable block store and have the driver worry about issues like this, media perfection, etc. Most file systems are not written this way, even "advanced" file systems like vxfs (Veritas), hpfs, and ntfs.
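The per-slave offset described in the quoted text above comes down to a little arithmetic: each slave should lag the previous drive by the time one stripe unit spends on the SCSI bus. The sketch below assumes that the mode page 4 Rotational Offset field counts that lag in 1/256ths of a revolution (my reading of SCSI-2; check the spec and your drive's manual), that bus transfer time is the only thing being hidden, and that the function name and the example numbers (5400 rpm, 32 KB stripe units, a 10 MB/s bus) are purely hypothetical.

```python
# Hypothetical helper, not from the thread. ASSUMPTION: the mode page 4
# Rotational Offset field expresses lag behind the sync master in units of
# 1/256 of a revolution.

def rotational_offset_256ths(index, rpm, unit_bytes, bus_bytes_per_sec):
    """Lag for drive `index` (0 = sync master) so that its block zero
    arrives under the heads just as the previous drives' stripe units
    have finished crossing the SCSI bus."""
    rev_sec = 60.0 / rpm                       # time for one revolution
    xfer_sec = unit_bytes / bus_bytes_per_sec  # bus time for one stripe unit
    skew_rev = index * xfer_sec / rev_sec      # cumulative lag, in revolutions
    frac = skew_rev % 1.0                      # whole revolutions don't matter
    return int(frac * 256 + 0.5) % 256         # round; 256/256 wraps to 0


if __name__ == "__main__":
    # Four-drive stripe set with the hypothetical parameters above.
    for i in range(4):
        print(i, rotational_offset_256ths(i, 5400, 32768, 10_000_000))
```

With those numbers one stripe unit takes about 29% of a revolution on the bus, so the offsets come out as 0, 75, 151 and 226. Real drives add command overhead and head-switch time, so a value like this is only a starting point for the kind of tuning Rodney describes.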
The net effect of this is that you cannot guarantee stripes to be consecutive, except across as many drives as you have in the set.

> Already looking at those factors. I am given the fact that my drives
> will be SCSI-II, will report the zone pages, etc. Without that stripe
> sets are pretty stupid and can never be made to go fast. I have been
> able to get to 85% of theoretical bandwidth, not bad, but want to squeeze
> that on up to 95% before I go looking at laying file systems on this
> thing.

I think that unless you do the logical equivalent of predictive branching (which it might be possible to precalculate at drive set initialization), you are going to be limited to an effective hash efficiency, with an exponential fall-off at about 85% (Knuth: Sorting and Searching).

The predictor you'd use to defeat this would be stripe prescheduling, for instance by precalculating values for skip lists rather than using a pure hash.

The 10% "reserve" in UFS is actually a soft hash-fill limit, there to keep it reasonably close to the 85% hash cost/benefit fall-off that Knuth calculated by Taylor series expansion.

Again, UFS is only an example, as it was with rotdelay: file systems shouldn't be doing this type of crap; it should be done at the driver level.

Another thing you might want to play with is turning *off* SCSI sector replacement. This may seem counter-intuitive, but you might in fact be better off handling your own media perfection issues to ensure that you don't get an unexpected seek in a stripe set. You'd be better off avoiding the bad block entirely than replacing it and taking the replacement lookup hit. 8-).

How do you deal with thermal variance? The "AV" drives don't try to compensate (recalibrate) while they are "busy", and so are quite fragile in this regard. I haven't looked into what would be required to precompensate in the driver for recalibration delays, or whether it's even possible at all.
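The message doesn't say which hashing model the "hash efficiency" fall-off refers to. As one concrete illustration of how cost climbs as a table fills, here is a sketch of Knuth's standard approximations for linear probing at load factor alpha (Sorting and Searching, Section 6.4): about (1 + 1/(1 - alpha)) / 2 probes for a successful search and (1 + 1/(1 - alpha)^2) / 2 for an unsuccessful one. The function names are hypothetical, and this does not derive the 85% figure; it only shows the curve steepening. For comparison, a UFS-style 10% reserve corresponds to a fill of 0.90.

```python
# Knuth's large-table approximations for linear probing (open addressing)
# at load factor alpha, 0 <= alpha < 1. Illustration only; the thread does
# not specify which hashing model it has in mind.

def probes_successful(alpha):
    """Expected probes to find a key that is present."""
    return 0.5 * (1 + 1 / (1 - alpha))


def probes_unsuccessful(alpha):
    """Expected probes to discover that a key is absent (or find a free slot)."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


if __name__ == "__main__":
    for alpha in (0.50, 0.70, 0.85, 0.90, 0.95):
        print(f"fill {alpha:.2f}: hit {probes_successful(alpha):6.2f}  "
              f"miss {probes_unsuccessful(alpha):7.2f}")
```

Going from half full to 85% full raises the unsuccessful-search cost from 2.5 probes to roughly 23, and at 90% it is about 50. That steep rise near a full table is the kind of fall-off the argument above relies on.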
It might be better in the long run to take the risk and avoid the delay if you really feel "the need for speed".

> > Without the physical seek locations, any benchmarking will be rather
> > arbitrary based on the layout you end up with for a particular test.
>
> To eliminate this very problem whilst I work on the technological ends
> of things I am simply doing raw disk I/O starting at the same logical
> drive on all spindles. Those do often end up in the same physical
> location, and when I want the best numbers simply start at logical sector
> 0 which will always be physically the same location on all spindles sans
> whatever value I put into scsi mode page 4:Rotational Offset:.

This would definitely ensure internal consistency; I was thinking more in terms of the results of particular stripe set builds, not necessarily the identical build on each run. Even with the identical build, the results you get will be drive/instance dependent, even after spindle sync, unless you do seek optimization of some kind.

I think going to the engineering lengths of implementing every possible optimization is probably not worth it, though it's damn fun to try, or at least talk about. 8-).

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.