Date: Thu, 23 Sep 1999 14:05:28 -0400 (EDT)
From: Christopher Sedore <cmsedore@mailbox.syr.edu>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: mbufs, external storage, and MFREE
Message-ID: <Pine.SOL.4.10.9909231350370.1589-100000@rodan.syr.edu>
In-Reply-To: <199909231731.KAA28739@apollo.backplane.com>

On Thu, 23 Sep 1999, Matthew Dillon wrote:

> :I have the following question: Let's say that I have a block of user
> :memory which I've mapped into the kernel, and would like to send on a
> :network socket.  I'd like to simply grab an mbuf, point to the memory as
> :external storage, and queue it up for transmission.  This would work fine,
> :except that when MFREE gets called, I have to write a deallocator that
> :maintains a table of all the different cases where I've done this, and do
> :a reverse mapping back to the original block, and then deal with sending
> :more, unmapping, etc.  In other words, having MFREE call a deallocator
> :with just the data pointer and the size is inconvenient (actually, it
> :would make my scenario quite inefficient given the number of mappings back
> :to the original block that would have to be done).
> :
> :Am I missing another mechanism to handle this?  Does it not come up enough
> :to matter?
> :
> :-Chris
>
> This is almost precisely the mechanism that the sendfile() system call
> uses.  In that case it maps VMIO-backed data rather than user memory,
> but it is a very similar problem.
>
> There has been talk of implementing this type of mechanism not only for
> sockets, but for file read()/write() as well.  In fact, John Dyson had
> delved into the issue with his vfs.ioopt stuff before he ran out of time.

This is good--it seems a shame to copy things around all the time, though
I'm not sure where the crossover is between copying and mapping into
kernel space.  (And, as a side note, what's up with struct buf?  The thing
is bloody huge if you only want to map user memory into kernel space :)

> The one problem with using direct VM page mappings is that currently there
> is no way for the socket to prevent the underlying data from being
> modified in the middle of a transmission.  And, in the same respect for
> vfs.ioopt, no way to prevent the data the user ostensibly read() into
> his 'private' buffer from changing out from under the user if the
> underlying file is modified.

Isn't this a case that the programmer has to handle?  That is, if you mess
with the data before it actually gets written, that's your problem.  I
take it that the vfs.ioopt stuff is something like a temporary mmap()
effect, since in the socket case once the data had been put in the buffer,
I'd remove the kernel mapping and thus not be able to tweak it.

> For user memory, the only way such a mechanism can currently be
> implemented is by obtaining the underlying pages and busy'ing them
> for the duration of their use by the system, causing anyone trying to
> access them while the system operation is in progress to block.  This
> can cause a potential problem with TCP in that the mbuf data you send
> to TCP sticks around until it gets pushed out the door *and* acknowledged
> by the other end.  i.e. the data is not disposed of when read() or
> write() returns but instead goes directly into TCP's outgoing queue.
> If the TCP connection hangs, the process may hang.

I had been thinking about this in the context of async I/O operations,
where it's OK to have the operation not complete until the data has
actually been ack'd by the remote end.  With synchronous write() calls,
this can be more problematic since it would significantly increase latency
in cases where the original coder might not expect it.

It might actually be nice to (optionally) have the same effect with async
writes to disk, where the operation wouldn't actually complete until the
data was known to be on the platter.

-Chris
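
For illustration only, here is a userland sketch of the alternative the
original question is asking about: a deallocator that receives an opaque
argument alongside the data pointer and size, so it can reach the owning
block directly instead of searching a table.  Everything in it (struct
xbuf, struct mapped_block, xbuf_attach, xbuf_free, block_release,
xbuf_demo) is a hypothetical name made up for this sketch.  It is not the
kernel's mbuf interface, and the "unmap" step is only a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block of user memory mapped into the kernel. */
struct mapped_block {
	char	*base;		/* start of the mapped region */
	size_t	 len;		/* length of the region */
	int	 refs;		/* buffers still pointing into the region */
	int	 unmapped;	/* set once the last reference is dropped */
};

/* Hypothetical mbuf-like buffer that points at external storage. */
struct xbuf {
	char	*data;
	size_t	 size;
	void	(*ext_free)(void *arg, char *data, size_t size);
	void	*ext_arg;	/* opaque context handed back to ext_free */
};

/* Deallocator: arg leads straight to the owning block, no reverse lookup. */
void
block_release(void *arg, char *data, size_t size)
{
	struct mapped_block *blk = arg;

	(void)data;
	(void)size;
	assert(blk->refs > 0);
	if (--blk->refs == 0)
		blk->unmapped = 1;	/* real code would unmap or send more */
}

/* Point a new buffer at a slice of the block and take a reference. */
struct xbuf *
xbuf_attach(struct mapped_block *blk, size_t off, size_t size)
{
	struct xbuf *xb;

	if (off > blk->len || size > blk->len - off)
		return (NULL);
	if ((xb = malloc(sizeof(*xb))) == NULL)
		return (NULL);
	xb->data = blk->base + off;
	xb->size = size;
	xb->ext_free = block_release;
	xb->ext_arg = blk;
	blk->refs++;
	return (xb);
}

/* Counterpart of the free path: run the deallocator, then drop the header. */
void
xbuf_free(struct xbuf *xb)
{
	if (xb->ext_free != NULL)
		xb->ext_free(xb->ext_arg, xb->data, xb->size);
	free(xb);
}

/* Returns 1 if the block is released only after both slices are freed. */
int
xbuf_demo(void)
{
	static char payload[8192];
	struct mapped_block blk = { payload, sizeof(payload), 0, 0 };
	struct xbuf *a = xbuf_attach(&blk, 0, 4096);
	struct xbuf *b = xbuf_attach(&blk, 4096, 4096);
	int early;

	if (a == NULL || b == NULL) {
		if (a != NULL)
			xbuf_free(a);
		if (b != NULL)
			xbuf_free(b);
		return (-1);
	}
	xbuf_free(a);
	early = blk.unmapped;	/* one slice is still outstanding here */
	xbuf_free(b);
	return (early == 0 && blk.unmapped == 1);
}
```

The reference count lives in the block itself, so each free costs one
pointer dereference however many slices of the block are in flight.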