Date:      Mon, 19 Jun 2000 13:43:29 -0400 (EDT)
From:      Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
To:        "Kenneth D. Merry" <ken@kdm.org>
Cc:        net@FreeBSD.ORG
Subject:   zero copy sockets and NFS code for FreeBSD
Message-ID:  <200006191743.NAA30993@khavrinen.lcs.mit.edu>
In-Reply-To: <20000616212545.A57840@panzer.kdm.org>
References:  <20000616212545.A57840@panzer.kdm.org>

<<On Fri, 16 Jun 2000 21:25:45 -0600, "Kenneth D. Merry" <ken@kdm.org> said:

> Another problem with the current send side approach is that it requires
> page sized and page aligned data in order to apply the COW mapping.  Not
> all data sets fit this requirement.

Not necessarily.  There is some threshold length above which it makes
sense to flip the page, even if you are not transmitting the entire
page.  This is true even if the MTU is much less than a page (provided
you implement reference counting for m_copym et al).
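
As a rough illustration of the threshold idea, here is a minimal sketch in
C.  The names worth_flipping() and FLIP_THRESHOLD are hypothetical, and the
2048-byte value is a placeholder rather than a measured break-even point;
a real kernel would tune it per architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: the minimum number of bytes that must fall
 * within one page before flipping it beats copying.  Illustrative only. */
#define FLIP_THRESHOLD 2048u

/*
 * Decide whether the first page touched by [addr, addr + len) carries
 * enough of the send to be worth flipping instead of copying.
 * page_size must be a power of two.
 */
static bool
worth_flipping(uintptr_t addr, size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    size_t off = (size_t)(addr & (page_size - 1)); /* offset into page */
    size_t in_page = page_size - off;              /* bytes left in page */

    if (len < in_page)
        in_page = len;          /* the send ends inside this page */

    return in_page >= FLIP_THRESHOLD;
}
```

Under these assumptions a 3000-byte send starting 1024 bytes into a 4 KB
page qualifies for a flip, while a 1500-byte send does not and would be
copied as usual.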

Another possible optimization, which I've suggested a few times, is to
provide a flag on send() which allows the sender to say ``I'm through
with this page; please unmap it / give me a new ZFOD page'', which may
make sense for some kinds of applications (e.g., ftpd).

> One way to address both of the above problems is to implement an alternate
> zero copy send scheme that uses async I/O.  With async I/O semantics, it
> will be clear to the userland program that the buffer in question is not to
> be used until it is returned from the kernel.

As with most networking optimizations, programs will have to be
restructured in order to take maximal advantage -- no surprise there.

> One way to get around the restriction is if it were possible to do
> operations similar to a page flip on buffers that are less than a page
> size.

It is.  You simply need a flag to recv() which says ``the page I'm
pointing to contains nothing of interest; you're free to trash it''.
(Obviously you still have to be careful about flipping pages which
contain data the user shouldn't see.)  You might also consider a
getsockopt() call which returns the preferred alignment and offset for
buffers on this particular connection.
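
To show how an application might act on such a hint, here is a minimal
sketch in C.  No such socket option exists, so the getsockopt() call itself
is not shown; the alignment and offset are assumed to have been obtained
already, and struct rx_buf and rx_buf_alloc() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical receive buffer: base is what free() expects, data is the
 * pointer the application would hand to recv(). */
struct rx_buf {
    void          *base;
    unsigned char *data;
};

/*
 * Allocate a buffer whose data pointer sits `offset` bytes past an
 * `align`-byte boundary, so that payload arriving at the same offset in a
 * kernel page could in principle be flipped in rather than copied.
 * Returns 0 on success or the posix_memalign() error number.
 */
static int
rx_buf_alloc(struct rx_buf *rb, size_t align, size_t offset, size_t len)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    assert(offset < align);

    rb->base = NULL;
    rb->data = NULL;

    int err = posix_memalign(&rb->base, align, offset + len);
    if (err != 0)
        return err;

    rb->data = (unsigned char *)rb->base + offset;
    return 0;
}
```

A caller would pass rb.data to recv() and later release the memory with
free(rb.base).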

> One drawback to this approach is that it requires support for RDMA on both
> ends of the connection.

Shades of trailers....

-GAWollman


