Date: Fri, 25 Jan 2002 17:29:20 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Julian Elischer <julian@elischer.org>, Robert Watson <rwatson@FreeBSD.ORG>, Poul-Henning Kamp <phk@critter.freebsd.dk>, Kirk McKusick <mckusick@mckusick.com>, Rik van Riel <riel@conectiva.com.br>, freebsd-hackers@FreeBSD.ORG, David Xu <bsddiy@21cn.com>
Subject: Re: vm balance
Message-ID: <200201260129.g0Q1TKU18106@apollo.backplane.com>
References: <Pine.NEB.3.96L.1010418111539.2462G-100000@fledge.watson.org> <3ADDCE50.132B9F5D@elischer.org> <3C51DBF9.FE0B3A43@mindspring.com>
:
:Julian Elischer wrote:
:> Actually there have been times when I did want to mmap a datastream..
:> I think a datastream mapped into a user buffer-space is one of the
:> possible 0-copy methods people sometimes mention.
:
:This is ugly. There are prettier ways of doing it.
:
:-- Terry

Considering that a number of failed attempts have already been made to
optimize standard read()/write() calls, and that mmap() isn't really
all that well suited to a datastream, I would be inclined to develop
a set of system calls to deal with 0-copy streams.

I did something similar in one of my embedded OSes. It could actually
apply to normal files as easily as to pipes and both UDP and TCP
data streams, and would not require any fancy splitting of network
headers versus the data payload. It gives the OS the ultimate
flexibility in managing 0-copy buffer spaces.

actual = readptr(int fd, void **ptr, int bytes);

    Attempt to read 'bytes' bytes of data from the descriptor. The
    operating system will map the data read-only and supply a pointer
    to the base of the buffer (which may or may not be page-aligned).
    The actual number of bytes available is returned. actual < bytes
    does NOT signify EOF, because the OS may have other limitations
    such as having to return piecemeal mbufs, skip packet headers,
    and so forth.

    The data will remain valid until the next readptr(), read(), or
    lseek() call on the descriptor or until the descriptor is closed.
    You can inform the OS that you have read all the data by calling
    readptr(fd, NULL, 0) (e.g. on a TCP connection this would
    allow TCP to reclaim the related mbufs).

    The OS typically leaves the mapped space mapped for efficiency,
    but the only valid data exists within the specific portion
    represented by your last readptr() call. The OS is free to reuse
    its own mappings at any time as long as it leaves the data it has
    guaranteed to be valid in place.

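A minimal user-space sketch of the read loop described above. readptr() is only a proposal, so the code uses a hypothetical mock (mock_readptr, struct mock_stream, and the CHUNK limit are all invented for illustration). It also assumes that a return of 0 marks the end of the stream, which the description does not specify, and it declares the pointer as const because the mapping is read-only. The point is that the caller keeps looping on short returns and examines the data in place, without copying it.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 4  /* assumed per-call limit, standing in for one mbuf */

/* Hypothetical stand-in for a descriptor: a fixed in-memory stream. */
struct mock_stream {
    const char *data;   /* plays the role of kernel-owned buffers */
    size_t len;         /* total stream length */
    size_t pos;         /* current read offset */
};

/* Mock of the proposed readptr(); this is not a real system call. */
static int mock_readptr(struct mock_stream *s, const void **ptr, int bytes)
{
    size_t n, left = s->len - s->pos;

    if (ptr == NULL || bytes <= 0)
        return 0;               /* readptr(fd, NULL, 0): release only */
    n = (size_t)bytes;
    if (n > CHUNK)
        n = CHUNK;              /* short return that is NOT end of stream */
    if (n > left)
        n = left;
    *ptr = s->data + s->pos;    /* hand back a pointer, copy nothing */
    s->pos += n;
    return (int)n;
}

/* Count occurrences of ch by scanning each returned region in place. */
static int count_byte(struct mock_stream *s, char ch)
{
    const void *p;
    int actual, i, hits = 0;

    /* Assumption: a return of 0 means there is no more data. */
    while ((actual = mock_readptr(s, &p, 64)) > 0) {
        const char *c = p;

        assert(actual <= 64);
        for (i = 0; i < actual; i++)
            if (c[i] == ch)
                hits++;
    }
    mock_readptr(s, NULL, 0);   /* tell the "OS" we are done with the data */
    return hits;
}
```

Each pointer is used only until the next mock_readptr() call, matching the validity rule above.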
avail = writeptr(int fd, void **ptr, int bytes);

    Request buffer space to write 'bytes' bytes of data. The OS will
    map appropriate buffer space and return a pointer to it. This
    procedure returns the actual number of bytes that may be written
    into the returned buffer. The OS may limit the available buffer
    space to fit mbuf/MTU requirements on a TCP connection or for
    other reasons.

    You should fill the buffer with 'avail' bytes and call writeptr()
    again to commit your buffer. Calling lseek() or write() will abort
    the buffer. You can commit your last writeptr() by calling
    writeptr(fd, NULL, 0).

    Closing the descriptor without committing the buffer will result
    in the loss of the buffer.

    Note: readptr() and writeptr() do not interfere with each other
    when operating on streams, but one will abort the other when
    operating on files due to the seek position changing.

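A rough sketch of the reserve/fill/commit cycle described for writeptr(), again against a hypothetical user-space mock rather than a real system call. mock_writeptr, struct mock_sink, and the WCHUNK limit are assumptions made for illustration. The mock copies from a staging buffer when committing, purely so the result can be inspected; a real implementation would presumably hand the mapped pages to the device or protocol stack instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WCHUNK 8  /* assumed per-reservation limit, like an MTU */

/* Hypothetical stand-in for a writable descriptor. */
struct mock_sink {
    char committed[64];     /* data that has been "sent" */
    size_t ncommitted;
    char staging[WCHUNK];   /* the currently reserved buffer */
    size_t reserved;        /* bytes reserved by the last call, 0 if none */
};

/* Mock of the proposed writeptr(): each call first commits the previous
 * reservation, then reserves up to 'bytes' bytes of new space. */
static int mock_writeptr(struct mock_sink *k, void **ptr, int bytes)
{
    size_t n, room;

    if (k->reserved > 0) {
        memcpy(k->committed + k->ncommitted, k->staging, k->reserved);
        k->ncommitted += k->reserved;
        k->reserved = 0;
    }
    if (ptr == NULL || bytes <= 0)
        return 0;                       /* writeptr(fd, NULL, 0) */
    n = (size_t)bytes;
    if (n > WCHUNK)
        n = WCHUNK;                     /* the OS may grant less than asked */
    room = sizeof k->committed - k->ncommitted;
    if (n > room)
        n = room;
    k->reserved = n;
    *ptr = k->staging;
    return (int)n;
}

/* Write len bytes of msg, filling exactly 'avail' bytes per reservation. */
static size_t send_all(struct mock_sink *k, const char *msg, size_t len)
{
    size_t off = 0;

    while (off < len) {
        void *buf;
        int avail = mock_writeptr(k, &buf, (int)(len - off));

        if (avail <= 0)
            break;                      /* no space granted, give up */
        assert((size_t)avail <= len - off);
        memcpy(buf, msg + off, (size_t)avail);
        off += (size_t)avail;
    }
    mock_writeptr(k, NULL, 0);          /* commit the final reservation */
    return off;
}
```

Note that the loop never needs a separate commit step except for the last buffer: the next reservation commits the previous one.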
IOCTLs

ioctl(fd, IOPTR_WABORT, bytes);

    Abort <bytes> bytes worth of a previously reserved write buffer.
    Passing -1 aborts the entire buffer.

ioctl(fd, IOPTR_WCOMMIT, bytes);

    Commit <bytes> bytes worth of a previously reserved write buffer,
    aborting any remainder after that. Passing -1 commits the
    entire 'avail' space.

    This can be used to reserve a large write buffer and then commit
    a smaller data set. For example, a web server can reserve a
    4K response buffer but only commit the actual length of the
    response.

ioctl(fd, IOPTR_WCLEAR, 0);

    Abort any previously reserved write buffer and force the OS
    to unmap any cached memory space associated with writeptr().

ioctl(fd, IOPTR_RABORT, bytes);

    Abort any previously returned read buffer, allowing the OS
    to reclaim the buffer space if it wishes (especially useful
    for TCP connections which might have to hold onto mbufs).
    <bytes> bytes are aborted. Passing -1 aborts the entire buffer.

ioctl(fd, IOPTR_RCLEAR, 0);

    Abort any previously returned read buffer and force the OS
    to unmap any cached memory space associated with readptr().

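The web-server case mentioned under IOPTR_WCOMMIT could look roughly like the sketch below. Nothing here is a real interface: mock_reserve stands in for writeptr(), mock_wcommit for ioctl(fd, IOPTR_WCOMMIT, bytes), and struct mock_conn for the descriptor. Clamping an oversized commit to the reserved size is an assumption, since the description does not say what happens in that case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RESERVE 4096  /* the 4K response buffer from the example */

/* Hypothetical stand-in for a connection descriptor. */
struct mock_conn {
    char buf[RESERVE];  /* reserved write buffer */
    int avail;          /* bytes currently reserved, 0 if none */
    int sent;           /* total bytes committed so far */
};

/* Stand-in for writeptr(): reserve up to 'bytes' bytes. */
static int mock_reserve(struct mock_conn *c, void **ptr, int bytes)
{
    if (bytes > RESERVE)
        bytes = RESERVE;
    c->avail = bytes;
    *ptr = c->buf;
    return bytes;
}

/* Stand-in for ioctl(fd, IOPTR_WCOMMIT, bytes): commit 'bytes' bytes and
 * abort the remainder. -1 commits the whole reservation. */
static int mock_wcommit(struct mock_conn *c, int bytes)
{
    if (bytes < 0 || bytes > c->avail)
        bytes = c->avail;       /* clamping is an assumption */
    c->sent += bytes;
    c->avail = 0;               /* the rest of the reservation is dropped */
    return bytes;
}

/* Reserve 4K, format the response in place, commit only what was used. */
static int send_response(struct mock_conn *c, const char *body)
{
    void *p;
    int avail = mock_reserve(c, &p, RESERVE);
    int n;

    assert(avail > 0);
    n = snprintf(p, (size_t)avail, "HTTP/1.0 200 OK\r\n\r\n%s", body);
    if (n < 0 || n >= avail) {
        c->avail = 0;           /* response did not fit: abort everything */
        return -1;
    }
    return mock_wcommit(c, n);  /* commit the actual response length */
}
```

The response is built directly in the reserved buffer, so the only length the caller needs to track is the one returned by snprintf().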
-Matt
Matthew Dillon
<dillon@backplane.com>
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
