Date: Mon, 5 Feb 2001 13:21:52 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>, Randell Jesup <rjesup@wgate.com>, Matt Dillon <dillon@earth.backplane.com>, Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>, Dan Nelson <dnelson@emsphone.com>, Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205132152.E26076@fw.wintelcom.net>
In-Reply-To: <28618.981406901@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:01:41PM +0100
References: <20010205124707.Y26076@fw.wintelcom.net> <28618.981406901@critter>
* Poul-Henning Kamp <phk@critter.freebsd.dk> [010205 13:01] wrote:
> In message <20010205124707.Y26076@fw.wintelcom.net>, Alfred Perlstein writes:
>
> >One of the suggestions that Poul-Henning made was to have the device
> >somehow specify an optimal clustering strategy, being able to specify
> >bounds and sizes.
> >
> >[...]
> >
> >Currently (I think) we only cluster based on logical file offsets;
> >it would be interesting to allow drivers to do callbacks into the
> >FS to ask for blocks physically adjacent to the blocks being written.
>
> I've been playing with various ideas in this area, and to be frank,
> totally failed to come up with a breakthrough.
>
> Given methods like striping and RAID-5, it becomes nontrivial to
> find a specification language for the driver to say "it would be
> quick to write the following blocks also" and it would be even
> slower to determine if this was indeed feasible.

You're right, it's non-trivial; however, the difference between memory
and disk speed is also non-trivial, so almost any reasonable algorithm
that reduces or optimizes disk traffic deserves consideration.

A simple call into the VFS should be able to accomplish this. AFAIK,
when a VFS has a disk/physical backing, it also hashes/sorts bufs based
on their physical backing location, although I may be remembering stuff
from 4.3BSD or 4.4BSD instead of the current code...

In fact, if the physical location is stored and hashed in the bufs, you
really don't need a callback into the VFS; you just need a generic
function that gathers blocks that are dirty, unlocked and actually
physically contiguous.

> "feasible" covers not only "do we have it in RAM", but also "is it
> already scheduled for writing", "is it dirty" and, not least,
> "would softupdates take a fit if we wrote it".

This is why callbacks into the VFS are probably a good idea, along with
a generic function that does what we currently do, except without the
VM remapping into the pbuf
(use a linked chain of bufs instead).

> The best I have been able to do so far is if the device-driver
> can specify the following quantities:
>
>	(M) maximum request size
>	(R) preferred request size
>	(B) preferred request sector boundary
>
> The clustering code would then try to increase request to:
>
>	N * R sectors starting X
>	where X mod B == 0
>	and N * R <= M
>
> Having found a cluster opportunity, the cluster code will
> issue the read/write request specifying:
>
>	(E) First possible sector in request
>	(S) First mandatory sector in request
>	(L) Last mandatory sector in request
>	(F) Last possible sector in request
>	(B) Sector address of (S) on media.
>
> The driver has to process the data from [S ... L],
> and can optionally process [E...S[ and ]L...F] if
> that seems convenient.

Well, there are some assertions and questions I have about this:

1) A device should not refuse to write a block unless there's an error,
   meaning that if 'S' can't be satisfied, it should at least write the
   single block out. I think S & L pretty much have to be equal to each
   other; otherwise we can have tricky issues to deal with where S
   through L never become clusterable (they are locked for long periods,
   or just clean).

2) The device should be able to allow a certain amount of fragmentation.
   Currently (AFAIK) the clustering code does not tolerate gaps, clean
   bufs or locked bufs within the request. This ought to be changed:
   there's no reason a request really needs to be completely contiguous,
   because the really painful part of disk I/O is the seek. Being able
   to cluster data with gaps on the same track/cylinder is much more
   important than having no breaks in it at all.

3) With #2, it would be important to specify a tolerance for such
   'holes' in the cluster operation, in case the device does have a
   penalty for gaps.
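To make the quoted (M), (R), (B) rule and the hole tolerance from points 2 and 3 concrete, here is a minimal C sketch under stated assumptions. Everything in it is hypothetical: `struct cluster_hints`, `struct sector_state`, `plan_cluster()` and the `max_gap` field are illustrative names, a flat per-sector array stands in for the buf cache, and "try to increase" is read as "take the largest N with N * R <= M". It is not FreeBSD's clustering code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical device hints, named after the quantities in the mail
 * (all in sectors): M = maximum request size, R = preferred request
 * size, B = preferred request sector boundary. max_gap is an assumed
 * per-device tolerance for holes (points 2 and 3).
 */
struct cluster_hints {
	size_t M;
	size_t R;
	size_t B;
	size_t max_gap;
};

/* Hypothetical stand-in for the buf cache: one entry per sector. */
struct sector_state {
	bool dirty;
	bool locked;
};

/* Request handed to the driver: [S, L] is mandatory, [E, F] possible. */
struct cluster_req {
	size_t E, S, L, F;
};

/* A sector is worth adding to the cluster if it is dirty and unlocked. */
static bool
clusterable(const struct sector_state *s, size_t i)
{
	return s[i].dirty && !s[i].locked;
}

/*
 * Walk backward from 'from' down to 'limit', returning the lowest
 * clusterable sector reached before more than max_gap consecutive
 * unusable (clean or locked) sectors are seen.
 */
static size_t
extend_backward(const struct sector_state *s, size_t from, size_t limit,
    size_t max_gap)
{
	size_t first = from, gap = 0;

	assert(limit <= from);
	for (size_t i = from; i-- > limit;) {
		if (clusterable(s, i)) {
			first = i;
			gap = 0;
		} else if (++gap > max_gap) {
			break;
		}
	}
	return first;
}

/* Mirror image of extend_backward(), walking up to 'limit'. */
static size_t
extend_forward(const struct sector_state *s, size_t from, size_t limit,
    size_t max_gap)
{
	size_t last = from, gap = 0;

	assert(from <= limit);
	for (size_t i = from + 1; i <= limit; i++) {
		if (clusterable(s, i)) {
			last = i;
			gap = 0;
		} else if (++gap > max_gap) {
			break;
		}
	}
	return last;
}

/*
 * Plan a request around the mandatory range [S, L] on a device of
 * nsect sectors. Returns false only for invalid arguments; otherwise
 * the request always covers [S, L] (point 1: never refuse the write).
 */
bool
plan_cluster(const struct cluster_hints *h, const struct sector_state *s,
    size_t nsect, size_t S, size_t L, struct cluster_req *req)
{
	if (S > L || L >= nsect || h->R == 0 || h->B == 0)
		return false;

	req->E = req->S = S;
	req->F = req->L = L;

	size_t n = h->M / h->R;		/* largest N with N * R <= M */
	size_t x = S - S % h->B;	/* window start X, X mod B == 0 */
	if (n == 0)
		return true;		/* M < R: no room to grow */
	size_t end = x + n * h->R - 1;	/* last sector of the N * R window */
	if (end < L)
		return true;		/* window can't hold [S, L]: write it alone */
	if (end >= nsect)
		end = nsect - 1;

	req->E = extend_backward(s, S, x, h->max_gap);
	req->F = extend_forward(s, L, end, h->max_gap);
	return true;
}
```

For example, with M = 16, R = 4, B = 8, a hole tolerance of one sector, and dirty sectors 8 to 13, 15, 16 and 20, planning S = 10, L = 11 yields E = 8 and F = 16: the single clean sector 14 is bridged, but the two clean sectors 17 and 18 end the cluster, so sector 20 is left for a later pass. Whether a bridged hole is skipped by scatter/gather or rewritten from clean data is left to the driver here, which is exactly the per-device penalty point 3 asks to be expressible.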
> If somebody is looking for a good project, benchmarking
> the performance of our current clustering and playing
> around with various changes would not be the worst
> way to spend some winter evenings. Playing with FFS/UFS
> options (block/fragment etc) at the same time may be
> worthwhile.

Actually, I'm not looking for a project, I'm looking for time. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message