Date:      Fri, 25 Jan 2002 17:29:20 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Julian Elischer <julian@elischer.org>, Robert Watson <rwatson@FreeBSD.ORG>, Poul-Henning Kamp <phk@critter.freebsd.dk>, Kirk McKusick <mckusick@mckusick.com>, Rik van Riel <riel@conectiva.com.br>, freebsd-hackers@FreeBSD.ORG, David Xu <bsddiy@21cn.com>
Subject:   Re: vm balance
Message-ID:  <200201260129.g0Q1TKU18106@apollo.backplane.com>
References:  <Pine.NEB.3.96L.1010418111539.2462G-100000@fledge.watson.org> <3ADDCE50.132B9F5D@elischer.org> <3C51DBF9.FE0B3A43@mindspring.com>


:
:Julian Elischer wrote:
:> Actually there have been times when I did want to mmap a datastream..
:> I think a datastream mapped into a user buffer-space is one of the
:> possible 0-copy methods people sometimes mention.
:
:This is ugly.  There are prettier ways of doing it.
:
:-- Terry

    Considering that a number of failed attempts have already been made to
    optimize standard read()/write() calls, and that mmap() isn't really
    all that well suited to a datastream, I would be inclined to develop
    a set of system calls to deal with 0-copy streams.

    I did something similar in one of my embedded OSes.  It could actually
    apply to normal files as easily as to pipes and both UDP and TCP
    data streams, and would not require any fancy splitting of network
    headers versus the data payload.  It gives the OS the ultimate
    flexibility in managing 0-copy buffer spaces.

    actual = readptr(int fd, void **ptr, int bytes);

	Attempt to read 'bytes' bytes of data from the descriptor.  The
	operating system will map the data read-only and supply a pointer
	to the base of the buffer (which may or may not be page-aligned).

	The actual number of bytes available is returned.  actual < bytes
	does NOT signify EOF, because the OS may have other limitations
	such as having to return piecemeal mbufs, skip packet headers,
	and so forth.

	The data will remain valid until the next readptr(), read(), or
	lseek() call on the descriptor or until the descriptor is closed.
	You can inform the OS that you have read all the data by calling
	readptr(fd, NULL, 0) (e.g. on a TCP connection this allows TCP
	to reclaim the related mbufs).

	The OS typically leaves the mapped space mapped for efficiency,
	but the only valid data exists within the specific portion
	represented by your last readptr() call.  The OS is free to reuse
	its own mappings at any time as long as it leaves the data it has
	guaranteed to be valid in place.
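
A minimal user-space sketch of the calling pattern described above. readptr()
is only a proposal, so mock_readptr(), struct mock_stream and count_byte() are
hypothetical stand-ins, not a real API. Two assumptions are not in the text: a
return of 0 is taken to mean EOF, and the pointer is typed const void ** to
reflect the read-only mapping. The point is that the caller must loop, because
a short return does not mean EOF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a descriptor that supports readptr(). */
struct mock_stream {
    const char *data;   /* backing data, standing in for kernel buffers */
    size_t len;         /* total bytes in the stream */
    size_t pos;         /* bytes already handed out */
    size_t chunk;       /* largest piece returned per call (e.g. one mbuf) */
};

/*
 * Mock of readptr(): point *ptr at up to 'bytes' bytes and return how many
 * are valid there.  ASSUMPTION: 0 means EOF; the proposal does not say.
 */
static int mock_readptr(struct mock_stream *s, const void **ptr, int bytes)
{
    size_t n;

    assert(s != NULL && bytes >= 0);
    if (ptr == NULL)
        return 0;                 /* readptr(fd, NULL, 0): release only */
    n = s->len - s->pos;
    if (n > (size_t)bytes)
        n = (size_t)bytes;
    if (n > s->chunk)
        n = s->chunk;             /* the OS may return less than requested */
    *ptr = s->data + s->pos;
    s->pos += n;
    return (int)n;
}

/* Count occurrences of 'c' by scanning each returned piece in place. */
static size_t count_byte(struct mock_stream *s, char c, int want, int *calls)
{
    const void *p;
    size_t hits = 0;
    int actual, i;

    while ((actual = mock_readptr(s, &p, want)) > 0) {
        const char *cp = p;
        for (i = 0; i < actual; i++)
            if (cp[i] == c)
                hits++;
        (*calls)++;               /* one more non-empty piece consumed */
    }
    mock_readptr(s, NULL, 0);     /* tell the "OS" we are done with the data */
    return hits;
}
```

With a 9-byte stream and a 4-byte chunk limit, a request for 8 bytes is
satisfied in pieces of 4, 4 and 1 bytes before the mock reports EOF.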

    avail = writeptr(int fd, void **ptr, int bytes);

	Request buffer space to write 'bytes' bytes of data.  The OS will
	map appropriate buffer space and return a pointer to it.  This
	procedure returns the actual number of bytes that may be written 
	into the returned buffer.  The OS may limit the available buffer
	space to fit mbuf/MTU requirements on a TCP connection or for
	other reasons.

	You should fill the buffer with 'avail' bytes and call writeptr()
	again to commit your buffer.  Calling lseek() or write() will abort
	the buffer.  You can commit your last writeptr() by calling
	writeptr(fd, NULL, 0).

	Close()ing the descriptor without committing the buffer will result
	in the loss of the buffer.
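
A minimal user-space sketch of the reserve/fill/commit cycle described above.
writeptr() is only a proposal, so mock_writeptr(), struct mock_sink, SINK_CAP
and send_all() are hypothetical names invented for illustration. The mock
assumes that every call first commits the outstanding reservation in full,
which is how the text above reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SINK_CAP 64

/* Hypothetical stand-in for a descriptor that supports writeptr(). */
struct mock_sink {
    char out[SINK_CAP];   /* committed bytes, standing in for the wire/file */
    size_t committed;     /* number of committed bytes */
    char buf[SINK_CAP];   /* the currently reserved buffer */
    size_t reserved;      /* size of the outstanding reservation, 0 if none */
    size_t limit;         /* per-call cap, e.g. an MTU-sized buffer */
};

/* Mock of writeptr(): commit the previous buffer, then reserve a new one. */
static int mock_writeptr(struct mock_sink *s, void **ptr, int bytes)
{
    size_t n;

    assert(s != NULL && bytes >= 0);
    memcpy(s->out + s->committed, s->buf, s->reserved);
    s->committed += s->reserved;
    s->reserved = 0;
    if (ptr == NULL)
        return 0;                      /* writeptr(fd, NULL, 0): commit only */
    n = (size_t)bytes;
    if (n > s->limit)
        n = s->limit;                  /* the OS may offer less than requested */
    if (n > SINK_CAP - s->committed)
        n = SINK_CAP - s->committed;
    s->reserved = n;
    *ptr = s->buf;
    return (int)n;
}

/* Write 'len' bytes, filling exactly 'avail' bytes per reservation. */
static size_t send_all(struct mock_sink *s, const char *msg, size_t len)
{
    size_t sent = 0;
    void *p;

    while (sent < len) {
        int avail = mock_writeptr(s, &p, (int)(len - sent));
        if (avail <= 0)
            break;                     /* no buffer space left in the mock */
        memcpy(p, msg + sent, (size_t)avail);
        sent += (size_t)avail;
    }
    mock_writeptr(s, NULL, 0);         /* commit the final piece */
    return sent;
}
```

Each request never exceeds the bytes still to be sent, so every reservation is
filled completely before the next call commits it.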

    note: readptr() and writeptr() do not interfere with each other
    when operating on streams, but one will abort the other when
    operating on files due to the seek position changing.


				IOCTLs

    ioctl(fd, IOPTR_WABORT, bytes);

	Abort <bytes> worth of a previously reserved write buffer.
	Passing -1 aborts the entire buffer.

    ioctl(fd, IOPTR_WCOMMIT, bytes);

	Commit <bytes> bytes worth of a previously reserved write buffer,
	aborting any remainder after that.  Passing -1 commits the 
	entire 'avail' space.

	This can be used to reserve a large write buffer and then commit
	a smaller data set.  For example, a web server can reserve a
	4K response buffer but only commit the actual length of the
	response.
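
A minimal user-space sketch of the web server case just described: reserve a
4K buffer, format a response into it, and commit only the bytes used.
mock_reserve() and mock_wcommit() are hypothetical stand-ins for writeptr()
and ioctl(fd, IOPTR_WCOMMIT, bytes); struct mock_conn, RESP_CAP and
send_response() are likewise invented for illustration, and the response
line is a toy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define RESP_CAP 4096

/* Hypothetical stand-in for a connection with a reservable write buffer. */
struct mock_conn {
    char wire[RESP_CAP];   /* bytes actually committed */
    size_t sent;
    char buf[RESP_CAP];    /* reserved write buffer */
    size_t reserved;       /* size of the outstanding reservation */
};

/* Mock of the reservation half of writeptr(). */
static int mock_reserve(struct mock_conn *c, void **ptr, int bytes)
{
    size_t n;

    assert(c != NULL && ptr != NULL && bytes >= 0);
    n = (size_t)bytes;
    if (n > RESP_CAP - c->sent)
        n = RESP_CAP - c->sent;
    c->reserved = n;
    *ptr = c->buf;
    return (int)n;
}

/* Mock of IOPTR_WCOMMIT: commit 'bytes', abort the rest; -1 commits all. */
static int mock_wcommit(struct mock_conn *c, int bytes)
{
    size_t n = (bytes < 0) ? c->reserved : (size_t)bytes;

    if (n > c->reserved)
        return -1;                 /* cannot commit more than was reserved */
    memcpy(c->wire + c->sent, c->buf, n);
    c->sent += n;
    c->reserved = 0;               /* the uncommitted remainder is dropped */
    return 0;
}

/* Reserve the full buffer, then commit only the formatted length. */
static int send_response(struct mock_conn *c, int status)
{
    void *p;
    int avail, len;

    avail = mock_reserve(c, &p, RESP_CAP);
    if (avail <= 0)
        return -1;
    len = snprintf(p, (size_t)avail, "HTTP/1.0 %d OK\r\n\r\n", status);
    if (len < 0 || len >= avail) {
        mock_wcommit(c, 0);        /* commit nothing, i.e. abort */
        return -1;
    }
    return mock_wcommit(c, len) == 0 ? len : -1;
}
```

The caller never needs to know the response length before reserving; the
unused tail of the 4K reservation is simply given back on commit.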

    ioctl(fd, IOPTR_WCLEAR, 0);

	Abort any previously reserved write buffer and force the OS
	to unmap any cached memory space associated with writeptr().

    ioctl(fd, IOPTR_RABORT, bytes);

	Abort any previously returned read buffer, allowing the OS
	to reclaim the buffer space if it wishes (especially useful
	for TCP connections which might have to hold onto mbufs).
	<bytes> bytes are aborted.  Passing -1 aborts the entire buffer.

    ioctl(fd, IOPTR_RCLEAR, 0);

	Abort any previously returned read buffer and force the OS
	to unmap any cached memory space associated with readptr().

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
