Date: Mon, 5 Feb 2001 13:51:49 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>, Randell Jesup <rjesup@wgate.com>, Matt Dillon <dillon@earth.backplane.com>, Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>, Dan Nelson <dnelson@emsphone.com>, Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205135149.G26076@fw.wintelcom.net>
In-Reply-To: <28962.981408816@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:33:36PM +0100
References: <20010205132152.E26076@fw.wintelcom.net> <28962.981408816@critter>

* Poul-Henning Kamp <phk@critter.freebsd.dk> [010205 13:33] wrote:
>
> >You're right, it's non-trivial, however the difference between
> >memory and disk speed is also non-trivial, almost every reasonable
> >algorithm should be considered to reduce/optimize disk traffic.
> >
> >A simple call into the VFS should be able to accomplish, afaik when
> >a VFS has a disk/physical backing it also hashes/sorts bufs based
> >on physical backing location. Although I may be remembering stuff
> >from 4.3BSD or 4.4BSD instead of the current code...
>
> It's not "a simple call".
>
> By the time you can make the call, you have passed through the
> target FS, through specfs and the disklabel/slice code, possibly
> through a layer like vinum and ccd (which may have their own ideas
> about clustering) and only then do you arrive at a place where you
> know the actual sector address of the request.
>
> We can quickly dismiss the ccd/vinum case by saying that they
> have to cater for the needs of the lower devices, and they
> specify the clustering policy "like any other disk".
>
> But you still have to contend with the diskslice/label code, and
> specfs, so even if you do an "upcall" and find more stuff you can
> read/write, you need to pass this bit of the request down through
> the specfs (for softupdates rollback/forward) and diskslice/label
> code (because you want boundary checking).
>
> And having tried that, I can say with 100% conviction: that is not
> a sane option, and if you do it anyway you will certainly not
> gain any performance by the time you have resolved all the locking
> issues.

Well, my impression was that all locking operations (except mutexes)
should be resolved by doing try_lockfoo(), and if the try_lock fails
then don't cluster that object/buf/vnode (as the current code does).

You are right though, I guess we don't need callbacks into the VFS;
this can be resolved with just the buffer system via flags and locks.
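
The try-lock rule described above can be sketched in user-space C. This is only an illustration under stated assumptions: struct xbuf, xbuf_trylock(), cluster_collect() and cluster_release() are hypothetical names, not the real kernel struct buf or its locking API, and the candidates are assumed to be already sorted by physical block number.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel buf; NOT the real struct buf. */
struct xbuf {
    long        blkno;  /* first physical block */
    long        nblks;  /* length in blocks */
    atomic_flag busy;   /* set while somebody owns the buf */
};

void xbuf_init(struct xbuf *bp, long blkno, long nblks)
{
    bp->blkno = blkno;
    bp->nblks = nblks;
    atomic_flag_clear(&bp->busy);
}

/* Non-blocking lock attempt: true if the caller now owns the buf. */
bool xbuf_trylock(struct xbuf *bp)
{
    return !atomic_flag_test_and_set(&bp->busy);
}

void xbuf_unlock(struct xbuf *bp)
{
    atomic_flag_clear(&bp->busy);
}

/*
 * cand[] is sorted by blkno and cand[0] is already locked by the caller.
 * Extend the cluster over physically contiguous neighbours, but never
 * wait for a lock: a buf that is busy elsewhere simply ends the cluster.
 * Returns the number of bufs in the cluster, including cand[0].
 */
size_t cluster_collect(struct xbuf *cand, size_t ncand, size_t max)
{
    assert(ncand > 0 && max > 0);
    size_t n = 1;

    while (n < ncand && n < max) {
        struct xbuf *prev = &cand[n - 1];
        struct xbuf *next = &cand[n];

        if (next->blkno != prev->blkno + prev->nblks)
            break;      /* gap on disk: the cluster ends here */
        if (!xbuf_trylock(next))
            break;      /* busy elsewhere: do not cluster it */
        n++;
    }
    return n;
}

/* Drop the locks on the first n bufs once the I/O has been issued. */
void cluster_release(struct xbuf *cand, size_t n)
{
    for (size_t i = 0; i < n; i++)
        xbuf_unlock(&cand[i]);
}
```

Because a failed try-lock only shortens the cluster, the worst case is a single-buf transfer, which is no worse than not clustering at all.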

> Giving some kind of abstract hint from the driver/device and making
> the clustering optional for the driver is the only path which does
> not lead straight down to layering insanity.

I'm not sure I understand what you mean. My vision of the current
code is:

Kernel IO request triggered via FS/bufdaemon/etc
   |   1 buf
cluster_foo
   |   1-N bufs (in a pbuf)
device
   |
write

What I'd like to see (considering we don't need to really involve VFS)
is:

Kernel IO request triggered via FS/bufdaemon/etc
   |   1 buf
device ---------> cluster routine (A)
   |                    /
device <----------------/
   |   1-N bufs (linked list, no pbuf)
write

This way the device can call into any number of generic clustering
routines if it wants to support them.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
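
The second diagram in the message (device calls out to a generic cluster routine and gets back a linked list of bufs) might look roughly like the following user-space C sketch. All names here are assumptions made for illustration: struct dbuf, struct dev, cluster_gather() and dev_strategy() are hypothetical and do not correspond to actual FreeBSD interfaces. The pending queue is assumed to be sorted by block number, and the actual write is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel buf; NOT the real struct buf. */
struct dbuf {
    long         blkno;  /* first physical block */
    long         nblks;  /* length in blocks */
    struct dbuf *next;   /* queue link, reused as the cluster chain */
};

/* Hypothetical device state: clustering is opt-in per driver. */
struct dev {
    int          wants_cluster; /* non-zero if the driver opts in */
    long         max_blks;      /* largest transfer the device accepts */
    struct dbuf *pending;       /* queued bufs, sorted by blkno */
};

/*
 * Generic cluster routine, step (A) in the diagram. Moves bufs from the
 * front of the pending queue onto head's chain while they continue the
 * run of blocks and the total stays within max_blks. No pbuf is built;
 * the result is just a linked list. Returns the total block count.
 */
long cluster_gather(struct dbuf *head, struct dbuf **pending, long max_blks)
{
    assert(head != NULL && pending != NULL);
    struct dbuf *tail = head;
    long total = head->nblks;

    head->next = NULL;
    while (*pending != NULL) {
        struct dbuf *cand = *pending;

        if (cand->blkno != tail->blkno + tail->nblks)
            break;      /* not physically adjacent */
        if (total + cand->nblks > max_blks)
            break;      /* would exceed the device limit */
        *pending = cand->next;  /* unlink from the queue */
        cand->next = NULL;
        tail->next = cand;      /* append to the chain */
        tail = cand;
        total += cand->nblks;
    }
    return total;
}

/*
 * Driver entry point: receives one buf and decides for itself whether to
 * call the generic routine. Returns the number of blocks that would go
 * out in a single transfer.
 */
long dev_strategy(struct dev *d, struct dbuf *bp)
{
    if (!d->wants_cluster) {
        bp->next = NULL;
        return bp->nblks;
    }
    return cluster_gather(bp, &d->pending, d->max_blks);
}
```

A driver that does not opt in sees exactly one buf per request, so the generic routine stays a library the device may call rather than a layer every request must pass through.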
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20010205135149.G26076>