Date: Fri, 25 Jan 2002 17:29:20 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Julian Elischer <julian@elischer.org>, Robert Watson <rwatson@FreeBSD.ORG>, Poul-Henning Kamp <phk@critter.freebsd.dk>, Kirk McKusick <mckusick@mckusick.com>, Rik van Riel <riel@conectiva.com.br>, freebsd-hackers@FreeBSD.ORG, David Xu <bsddiy@21cn.com>
Subject: Re: vm balance
Message-ID: <200201260129.g0Q1TKU18106@apollo.backplane.com>
References: <Pine.NEB.3.96L.1010418111539.2462G-100000@fledge.watson.org> <3ADDCE50.132B9F5D@elischer.org> <3C51DBF9.FE0B3A43@mindspring.com>

:
:Julian Elischer wrote:
:> Actually there have been times when I did want to mmap a datastream..
:> I think a datastream mapped into a user buffer-space is one of the
:> possible 0-copy methods people sometimes mention.
:
:This is ugly.  There are prettier ways of doing it.
:
:-- Terry

    Considering that a number of failed attempts have already been made
    to optimize standard read()/write() calls, and that mmap() isn't really
    all that well suited to a datastream, I would be inclined to develop
    a set of system calls to deal with 0-copy streams.

    I did something similar in one of my embedded OS's.  It could actually
    apply to normal files as easily as to pipes and both UDP and TCP data
    streams, and would not require any fancy splitting of network headers
    versus the data payload.  It gives the OS the ultimate flexibility in
    managing 0-copy buffer spaces.

    actual = readptr(int fd, void **ptr, int bytes);

    Attempt to read 'bytes' bytes of data from the descriptor.  The
    operating system will map the data read-only and supply a pointer
    to the base of the buffer (which may or may not be page-aligned).
    The actual number of bytes available is returned.

    actual < bytes does NOT signify EOF, because the OS may have other
    limitations such as having to return piecemeal mbufs, skip packet
    headers, and so forth.

    The data will remain valid until the next readptr(), read(), or
    lseek() call on the descriptor or until the descriptor is closed.
    You can inform the OS that you have read all the data by calling
    readptr(fd, NULL, 0) (i.e. if this is a TCP connection this would
    allow TCP to reclaim the related mbufs).

    The OS typically leaves the mapped space mapped for efficiency, but
    the only valid data exists within the specific portion represented
    by your last readptr() call.  The OS is free to reuse its own
    mappings at any time as long as it leaves the data it has guaranteed
    to be valid in place.

    avail = writeptr(int fd, void **ptr, int bytes);

    Request buffer space to write 'bytes' bytes of data.
    The OS will map appropriate buffer space and return a pointer to it.
    This procedure returns the actual number of bytes that may be written
    into the returned buffer.  The OS may limit the available buffer space
    to fit mbuf/MTU requirements on a TCP connection or for other reasons.

    You should fill the buffer with 'avail' bytes and call writeptr()
    again to commit your buffer.  Calling lseek() or write() will abort
    the buffer.  You can commit your last writeptr() by calling
    writeptr(fd, NULL, 0).  Close()ing the descriptor without committing
    the buffer will result in the loss of the buffer.

    note: readptr() and writeptr() do not interfere with each other when
    operating on streams, but one will abort the other when operating
    on files due to the seek position changing.

    IOCTL's

    ioctl(fd, IOPTR_WABORT, bytes);

        Abort <bytes> worth of a previously reserved write buffer.
        Passing -1 aborts the entire buffer.

    ioctl(fd, IOPTR_WCOMMIT, bytes);

        Commit <bytes> bytes worth of a previously reserved write buffer,
        aborting any remainder after that.  Passing -1 commits the entire
        'avail' space.

        This can be used to reserve a large write buffer and then commit
        a smaller data set.  For example, a web server can reserve a 4K
        response buffer but only commit the actual length of the response.

    ioctl(fd, IOPTR_WCLEAR, 0);

        Abort any previously reserved write buffer and force the OS to
        unmap any cached memory space associated with writeptr().

    ioctl(fd, IOPTR_RABORT, bytes);

        Abort any previously returned read buffer, allowing the OS to
        reclaim the buffer space if it wishes (especially useful for
        TCP connections which might have to hold onto mbufs).  <bytes>
        bytes are aborted.  Passing -1 aborts the entire buffer.

    ioctl(fd, IOPTR_RCLEAR, 0);

        Abort any previously returned read buffer and force the OS to
        unmap any cached memory space associated with readptr().
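
    To make the reserve/commit semantics above concrete, here is a rough
    user-space sketch in C.  It is only an emulation over an in-memory
    buffer, not the proposed kernel interface: struct memstream, the ms_*
    function names, and CHUNK_MAX are all hypothetical, invented for this
    illustration.  CHUNK_MAX stands in for whatever mbuf/MTU limit a real
    OS might impose, so callers see short counts that are not EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_MAX 8     /* assumed per-call cap, standing in for mbuf/MTU limits */

/* Hypothetical in-memory stand-in for a descriptor's buffer space. */
struct memstream {
    char   data[256];   /* backing store shared by writer and reader */
    size_t committed;   /* bytes made visible by a commit */
    size_t rpos;        /* next committed byte to hand to the reader */
    size_t reserved;    /* size of the outstanding write reservation */
};

/* Modeled on IOPTR_WCOMMIT: commit `bytes` of the reservation and
 * abort the remainder; -1 commits the whole reservation. */
static int ms_wcommit(struct memstream *ms, int bytes)
{
    size_t n = (bytes < 0 || (size_t)bytes > ms->reserved)
                   ? ms->reserved : (size_t)bytes;
    ms->committed += n;
    ms->reserved = 0;
    return (int)n;
}

/* Modeled on writeptr(): commit any previous reservation, then reserve
 * up to `bytes` and return how much may actually be written.
 * Passing ptr == NULL only commits. */
static int ms_writeptr(struct memstream *ms, void **ptr, int bytes)
{
    ms_wcommit(ms, -1);
    if (ptr == NULL || bytes <= 0)
        return 0;
    size_t room = sizeof ms->data - ms->committed;
    size_t n = (size_t)bytes;
    if (n > CHUNK_MAX) n = CHUNK_MAX;
    if (n > room) n = room;
    *ptr = ms->data + ms->committed;   /* caller fills the store in place */
    ms->reserved = n;
    return (int)n;
}

/* Modeled on readptr(): hand out a pointer to up to `bytes` committed
 * bytes.  A short count here is not EOF; this toy stream simply
 * returns 0 once nothing committed is left. */
static int ms_readptr(struct memstream *ms, const void **ptr, int bytes)
{
    assert(ms->rpos <= ms->committed);
    if (ptr == NULL || bytes <= 0)
        return 0;
    size_t left = ms->committed - ms->rpos;
    size_t n = (size_t)bytes;
    if (n > CHUNK_MAX) n = CHUNK_MAX;
    if (n > left) n = left;
    *ptr = ms->data + ms->rpos;
    ms->rpos += n;
    return (int)n;
}

/* Caller-side loop: keep reserving until all `len` bytes are queued,
 * since each call may grant less than was asked for. */
static size_t ms_send_all(struct memstream *ms, const char *src, size_t len)
{
    size_t done = 0;
    while (done < len) {
        void *p;
        int avail = ms_writeptr(ms, &p, (int)(len - done));
        if (avail <= 0)
            break;                      /* backing store is full */
        memcpy(p, src + done, (size_t)avail);
        done += (size_t)avail;
    }
    ms_wcommit(ms, -1);                 /* like writeptr(fd, NULL, 0) */
    return done;
}
```

    The point of the sketch is the calling pattern: the caller never
    hands its own buffer to the stream, it fills whatever space it is
    given and loops on short counts.  A partial commit, as in the web
    server example, would be ms_writeptr() followed by ms_wcommit()
    with the real response length.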

    -Matt
    Matthew Dillon
    <dillon@backplane.com>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message