Date: Mon, 20 Mar 2000 10:46:15 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: current@FreeBSD.ORG
Subject: Re: patches for test / review
Message-ID: <200003201846.KAA70820@apollo.backplane.com>
References: <19790.953575942@critter.freebsd.dk>

:Thanks for the sketch.  It sounds really good.
:
:Is it your intention that drivers which cannot work from the b_pages[]
:array will call to map them into VM, or will a flag on the driver/dev_t/
:whatever tell the generic code that it should be mapped before calling
:the driver ?
:
:What about unaligned raw transfers, say a raw CD read of 2352 bytes
:from userland ?  I presume we will need an offset into the first
:page for that ?

Well, let me tell you what the fuzzy goal is first and then maybe we
can work backwards.

Eventually all physical I/O needs a physical address.  The quickest
way to get to a physical address is to be given an array of vm_page_t's
(which can be trivially translated to physical addresses).  The buffer
cache already has such an array, called b_pages[].  Any I/O that runs
through b_data or runs through a uio must eventually be cut up into
blocks of contiguous physical addresses.

What we want to do is to try to extend VMIO (aka the vm_page_t) all
the way through the I/O system - both VFS and DEV I/O - in order to
remove all the nasty back and forth translations.

In regard to raw devices, I originally envisioned having two BUF_*()
strategy calls - one that uses a page array, and one that uses b_data.
But your idea below - using bio_ops[] - is much better.

In regard to odd block sizes and offsets, the real question is whether
an attempt should be made to translate UIO ops into buffer cache
b_pages[] ops directly, maintaining offsets and odd sizes, or whether
we should back off to a copy scheme where we allocate b_pages[] for
oddly sized uio's and then copy the data to the uio buffer.

My personal preference is to not pollute the VMIO page-passing
mechanism with all sorts of fields to handle weird offsets and sizes.
Instead we ought to take the copy hit for the non-optimal cases, and
simply fix all the programs doing the accesses to pass optimally
aligned buffers.
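
As a rough illustration of the "take the copy hit for the non-optimal cases" policy, here is a minimal userland C sketch of the decision it implies. Everything named here (choose_io_path, IO_PATH_PAGES, IO_PATH_COPY, SKETCH_PAGE_SIZE) is hypothetical and not taken from the FreeBSD tree. The rule it applies (buffer address page-aligned and request length a whole number of pages) is one reading of the proposal, not code from any actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K pages, as on IA32. A real kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical labels for the two ways a raw request could be serviced. */
enum io_path {
    IO_PATH_PAGES, /* zero-copy: hand the user's pages to the driver */
    IO_PATH_COPY   /* bounce through kernel-allocated pages, then copy */
};

/*
 * Pick a path for a raw read()/write() on a user buffer.
 * Only a page-aligned buffer covering whole pages is passed through
 * as a page array; everything else falls back to the copy scheme, so
 * the page-passing interface needs no offset or odd-size fields.
 */
static enum io_path
choose_io_path(uintptr_t uaddr, size_t len)
{
    const uintptr_t mask = SKETCH_PAGE_SIZE - 1;

    /* The mask trick below requires a power-of-two page size. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

    if (len == 0)
        return IO_PATH_COPY;    /* nothing to map */
    if ((uaddr & mask) != 0)
        return IO_PATH_COPY;    /* buffer does not start on a page */
    if ((len & mask) != 0)
        return IO_PATH_COPY;    /* request is not a whole number of pages */
    return IO_PATH_PAGES;
}
```

Under these assumptions, choose_io_path(0x10000, 4096) selects the page-array path, while a 2352-byte request or a buffer at 0x10010 selects the copy path.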

For example, for a raw I/O on an audio CD track you would pass a
page-aligned buffer with a request size of at least a page (e.g. 4K on
IA32) in your read(), and the raw device would return '2352' as the
result and the returned data would be page-aligned.

This would allow the system call to use the b_pages[] strategy entry
point even for devices with odd sizes and still get optimal (zero-copy)
operation.  If the user passes a non-aligned buffer (or one whose size
is not a multiple of the page size), the system takes the copy hit in
order to keep the lower level I/O interface clean.

:One thing I would like to see is for the buffers to know how to
:write themselves.  There is nothing which mandates that a buffer
:be backed by a disk-like device, and there are uses for buffers
:which aren't.
:
:Being able to say bp->bop_write(bp) rather than bwrite(bp) would
:allow that flexibility.  Kirk already introduced a bio_ops[] but
:made it global for now, that should be per buffer and have all the
:bufferops in it, (except for the ones which instantiate the buffer).
:
:If we had this, pseudo filesystems like DEVFS could use UFS for
:much of their naming management.  This is currently impossible.
:
:--
:Poul-Henning Kamp             FreeBSD coreteam member
:phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
:FreeBSD -- It will take a long time before progress goes too far!

I like the idea of dynamicizing bio_ops[] and using that to issue
struct buf based I/O.  It fits very nicely into the general idea of
separating the VFS and DEV I/O interfaces (they are currently
hopelessly intertwined).

Actually, the more I think about it the more I'm willing to just say
to hell with it and start doing all the changes at once, in parallel,
including the two patches you wanted reviewed earlier (though I would
request that you not combine disparate patch functionalities into a
single patch set).  I agree with Julian on the point about IPSEC.

Dynamicizing bio_ops[] ought to be trivial.
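
For readers who want to picture what a per-buffer bio_ops[] might look like, here is a minimal userland C sketch of the bp->bop_write(bp) idea. It is not the kernel's struct buf and not the layout anyone in this thread proposed: sketch_buf, sketch_bufops, b_ops, b_last_backend and the two backends are hypothetical names invented for the illustration. Only the bop_write name comes from the quoted mail.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_buf;

/* Hypothetical per-buffer operations vector (only the write op shown). */
struct sketch_bufops {
    int (*bop_write)(struct sketch_buf *bp);
};

/* Hypothetical tags recording which backend last handled the buffer. */
enum {
    SKETCH_BACKEND_NONE = 0,
    SKETCH_BACKEND_DISK = 1,
    SKETCH_BACKEND_MEMORY = 2
};

/* Cut-down stand-in for struct buf: each buffer carries its own ops. */
struct sketch_buf {
    const struct sketch_bufops *b_ops;
    int b_last_backend;
};

/* Backend 1: stands in for a buffer backed by a disk-like device. */
static int
sketch_disk_write(struct sketch_buf *bp)
{
    bp->b_last_backend = SKETCH_BACKEND_DISK;
    return 0;
}

/* Backend 2: stands in for a buffer with no disk behind it. */
static int
sketch_mem_write(struct sketch_buf *bp)
{
    bp->b_last_backend = SKETCH_BACKEND_MEMORY;
    return 0;
}

static const struct sketch_bufops sketch_disk_ops = {
    .bop_write = sketch_disk_write
};
static const struct sketch_bufops sketch_mem_ops = {
    .bop_write = sketch_mem_write
};

/*
 * Generic entry point: instead of one global write routine, dispatch
 * through whatever ops vector this particular buffer was given.
 */
static int
sketch_bwrite(struct sketch_buf *bp)
{
    assert(bp != NULL && bp->b_ops != NULL && bp->b_ops->bop_write != NULL);
    return bp->b_ops->bop_write(bp);
}
```

The point of the sketch is only that the caller of sketch_bwrite() never needs to know what backs the buffer. Whether the real ops vector would be a pointer in struct buf, and which operations it would carry, is left open here.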

					-Matt
					Matthew Dillon
					<dillon@backplane.com>