From owner-freebsd-arch Mon Feb 5 13:24:29 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id B246C37B401; Mon, 5 Feb 2001 13:24:07 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15LLr011092;
	Mon, 5 Feb 2001 13:21:53 -0800 (PST)
Date: Mon, 5 Feb 2001 13:21:52 -0800
From: Alfred Perlstein
To: Poul-Henning Kamp
Cc: "Justin T. Gibbs", Randell Jesup, Matt Dillon, Matthew Jacob,
	Mike Smith, Dag-Erling Smorgrav, Dan Nelson, Seigo Tanimura,
	arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205132152.E26076@fw.wintelcom.net>
References: <20010205124707.Y26076@fw.wintelcom.net> <28618.981406901@critter>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <28618.981406901@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:01:41PM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Poul-Henning Kamp [010205 13:01] wrote:
> In message <20010205124707.Y26076@fw.wintelcom.net>, Alfred Perlstein writes:
>
> >One of the suggestions that Poul-Henning made was to have the device
> >somehow specify an optimal clustering strategy, being able to specify
> >bounds and sizes.
> >
> >[...]
> >
> >Currently (i think) we only cluster based on logical file offsets,
> >it would be interesting to allow drivers to do callbacks into the
> >FS to ask for blocks physically adjacent to the blocks being written.
>
> I've been playing with various ideas in this area, and to be frank,
> totally failed to come up with a breakthrough.
>
> Given methods like striping and RAID-5, it becomes nontrivial to
> find a specification language for the driver to say "it would be
> quick to write the following blocks also" and it would be even
> slower to determine if this was indeed feasible.

You're right, it's non-trivial; however, the difference between
memory and disk speed is also non-trivial, so almost every reasonable
algorithm should be considered to reduce/optimize disk traffic.

A simple call into the VFS should be able to accomplish this; afaik,
when a VFS has a disk/physical backing it also hashes/sorts bufs
based on physical backing location. I may be remembering stuff from
4.3BSD or 4.4BSD instead of the current code, though...

In fact, if it is stored and hashed in the bufs you really don't
need a callback into the VFS; you just need a generic function to
call that gathers blocks that are dirty, unlocked and actually
physically contiguous.

> "feasible" covers not only "do we have it in RAM", but also "is it
> already scheduled for writing", "is it dirty" and not the least
> "would softupdates take a fit if we wrote it".

This is why callbacks into the VFS are probably a good idea, along
with a generic function that accomplishes what we currently do,
except without the vm-remapping into the pbuf (use a linked chain
of bufs instead).

> The best I have been able to do so far is if the device-driver
> can specify the following quantities:
>
> 	(M) maximum request size
> 	(R) preferred request size
> 	(B) preferred request sector boundary
>
> The clustering code would then try to increase request to:
>
> 	N * R sectors starting X
> 	where X mod B == 0
> 	and N * R <= M
>
> Having found a cluster opportunity, the cluster code will
> issue the read/write request specifying:
>
> 	(E) First possible sector in request
> 	(S) First mandatory sector in request
> 	(L) Last mandatory sector in request
> 	(F) Last possible sector in request
> 	(B) Sector address of (S) on media.
>
> The driver has to process the data from [S ... L],
> and can optionally process [E...S[ and ]L...F] if
> that seems convenient.

Well, there are some assertions and questions I have about this:

1) A device should not refuse to write a block unless there's an
   error, meaning if 'S' can't be satisfied, it should at least
   write the single block out. I think S & L pretty much have to
   be equal to each other, otherwise we can have tricky issues to
   deal with where S through L never become clusterable (they are
   locked for long periods, or just clean).

2) The device should be able to allow a certain amount of
   fragmentation. Currently (afaik) the clustering code does not
   tolerate gaps, clean bufs or locked bufs within the request.
   This ought to be changed: there's no reason why a request really
   needs to be completely contiguous, as the really painful part of
   disk io is the seek. Being able to cluster data with gaps on the
   same track/cyl is much more important than not having any breaks
   in it at all.

3) With #2, it would be important to specify a tolerance for such
   'holes' in the cluster operation in case the device does have a
   penalty for gaps.

> If somebody is looking for a good project, benchmarking
> the performance of our current clustering and playing
> around with various changes would not be the worst
> way to spend some winter evenings. Playing with FFS/UFS
> options (block/fragment etc) at the same time may be
> worth while.

Actually, I'm not looking for a project, I'm looking for time. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
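
For reference, here is a minimal sketch in C of the sizing rule quoted
above (N * R sectors starting at X, with X mod B == 0 and N * R <= M).
The struct and function names are hypothetical and do not come from the
FreeBSD tree. Two choices are assumptions rather than anything specified
in the thread: picking the smallest N that covers the mandatory range
[S, L], and falling back to the bare mandatory range when no aligned
window fits within M.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver hints, named after (M), (R), (B) in the quoted text. */
struct cluster_hints {
	uint64_t max_sectors;	/* (M) maximum request size */
	uint64_t pref_sectors;	/* (R) preferred request size */
	uint64_t boundary;	/* (B) preferred request sector boundary */
};

/* Sectors the clustering code may offer to the driver: (E) through (F). */
struct cluster_window {
	uint64_t first;		/* (E) first possible sector */
	uint64_t last;		/* (F) last possible sector */
	int aligned;		/* 1 if an aligned N * R window was found */
};

/*
 * Given the mandatory range [s, l], compute an aligned window that
 * contains it. Assumption: use the smallest N such that N * R sectors
 * starting at X reach l. If that exceeds M, return [s, l] unchanged
 * and leave any splitting to the caller.
 */
static struct cluster_window
cluster_window_for(const struct cluster_hints *h, uint64_t s, uint64_t l)
{
	struct cluster_window w = { s, l, 0 };
	uint64_t x, need, n;

	assert(h->pref_sectors > 0 && h->boundary > 0 && s <= l);

	x = s - (s % h->boundary);	/* largest X <= s with X mod B == 0 */
	need = l - x + 1;		/* sectors from X through l */
	n = (need + h->pref_sectors - 1) / h->pref_sectors; /* round up */

	if (n * h->pref_sectors <= h->max_sectors) {
		w.first = x;
		w.last = x + n * h->pref_sectors - 1;
		w.aligned = 1;
	}
	return w;
}
```

With M = 128 and R = B = 16, a mandatory range of sectors 35 through 40
yields the window 32 through 47. The sketch says nothing about which
sectors in [E, S[ and ]L, F] are actually dirty and unlocked, nor about
how many holes the device would tolerate; those checks would sit on top
of it.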