Date:      Mon, 19 Jun 2000 12:14:51 -0600
From:      "Kenneth D. Merry" <ken@kdm.org>
To:        Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc:        net@FreeBSD.ORG
Subject:   Re: zero copy sockets and NFS code for FreeBSD
Message-ID:  <20000619121450.A78756@panzer.kdm.org>
In-Reply-To: <200006191743.NAA30993@khavrinen.lcs.mit.edu>; from wollman@khavrinen.lcs.mit.edu on Mon, Jun 19, 2000 at 01:43:29PM -0400
References:  <20000616212545.A57840@panzer.kdm.org> <200006191743.NAA30993@khavrinen.lcs.mit.edu>

On Mon, Jun 19, 2000 at 13:43:29 -0400, Garrett Wollman wrote:
> <<On Fri, 16 Jun 2000 21:25:45 -0600, "Kenneth D. Merry" <ken@kdm.org> said:
> 
> > Another problem with the current send side approach is that it requires
> > page sized and page aligned data in order to apply the COW mapping.  Not
> > all data sets fit this requirement.
> 
> Not necessarily.  There is some threshold length above which it makes
> sense to flip the page, even if you are not transmitting the entire
> page.  This is true even if the MTU is much less than a page (provided
> you implement reference counting for m_copym et al).

This would require the recv() flag you mention below.  Otherwise, you would
have to do a copy and flip operation.  (i.e. copy part of the user's page
into the kernel, and then flip the kernel page in place of the user's
page.)
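
To make the alignment restriction concrete, here is a rough sketch of how a
send buffer splits into the part that is page sized and page aligned and the
leftovers on either side.  This is illustration only, not code from the
patches: plan_flip() and struct flip_plan are hypothetical names, and the
page size is passed in rather than taken from the VM system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how a user buffer would split for a page-flipping send. */
struct flip_plan {
	size_t head_copy;	/* bytes before the first page boundary */
	size_t flip_pages;	/* whole, aligned pages (COW candidates) */
	size_t tail_copy;	/* bytes after the last whole page */
};

/* page_size must be a nonzero power of two. */
static struct flip_plan
plan_flip(uintptr_t addr, size_t len, size_t page_size)
{
	struct flip_plan p = { 0, 0, 0 };
	size_t misalign, head;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	/* Distance from addr up to the next page boundary. */
	misalign = (size_t)(addr & (page_size - 1));
	head = misalign ? page_size - misalign : 0;
	if (head > len)
		head = len;

	p.head_copy = head;
	len -= head;
	p.flip_pages = len / page_size;
	p.tail_copy = len % page_size;
	return (p);
}
```

For a buffer at 0x1100 of length 8192 with 4K pages, this gives a 3840 byte
head, one flippable page, and a 256 byte tail.  The head and tail are the
pieces that would need either a plain copy or the copy and flip described
above.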

> Another possible optimization, which I've suggested a few times, is to
> provide a flag on send() which allows the sender to say ``I'm through
> with this page; please unmap it / give me a new ZFOD page'', which may
> make sense for some kinds of applications (e.g., ftpd).

Yeah, that's another possible way to do it.
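
No such send() flag exists, so the following is only a userland
approximation of the "give me a new ZFOD page" half of the idea: map fresh
anonymous memory over the same address with mmap(2) and MAP_FIXED.  The
function name replace_with_zfod() is made up, and note that here the old
contents are simply thrown away, whereas under the proposal the kernel would
keep the old page for the transmit.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANON on glibc; no effect on FreeBSD */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: replace [page, page + len) with fresh anonymous
 * memory.  Anonymous mappings are zero filled on first touch, so the
 * caller gets ZFOD pages at the same address.  page must be page
 * aligned, which MAP_FIXED requires.
 */
static int
replace_with_zfod(void *page, size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(page, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
	return (p == MAP_FAILED ? -1 : 0);
}
```

An application like ftpd would call this on a buffer after handing it to
send(), then refill the buffer from the file.  A real flag would do the
equivalent inside the kernel in one step.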

> > One way to address both of the above problems is to implement an alternate
> > zero copy send scheme that uses async I/O.  With async I/O semantics, it
> > will be clear to the userland program that the buffer in question is not to
> > be used until it is returned from the kernel.
> 
> As with most networking optimizations, programs will have to be
> restructured in order to take maximal advantage -- no surprise there.

Yeah, it's difficult to get things to work 100% without breaking the
standard read/write semantics.
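
For reference, this is roughly what the buffer ownership rule looks like with
the existing POSIX AIO calls.  It is a sketch of the semantics only, not the
zero copy send path: aio_write_and_wait() and aio_demo_devnull() are made-up
names, and writing to /dev/null stands in for a socket.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue an asynchronous write and wait for it to finish.  Between
 * aio_write() and completion the buffer belongs to the kernel and the
 * caller must not modify it.
 */
static ssize_t
aio_write_and_wait(int fd, void *buf, size_t len)
{
	struct aiocb cb;
	const struct aiocb *list[1];

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = len;
	cb.aio_offset = 0;

	if (aio_write(&cb) == -1)
		return (-1);

	/* buf is off limits from here until aio_error() stops saying EINPROGRESS. */
	list[0] = &cb;
	while (aio_error(&cb) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);

	/* Completed: the buffer is ours again; collect the result. */
	return (aio_return(&cb));
}

/* Demo: write five bytes to /dev/null and report the byte count. */
static ssize_t
aio_demo_devnull(void)
{
	char msg[] = "hello";
	ssize_t n;
	int fd;

	fd = open("/dev/null", O_WRONLY);
	if (fd == -1)
		return (-1);
	n = aio_write_and_wait(fd, msg, sizeof(msg) - 1);
	close(fd);
	return (n);
}
```

The point is that the program has to be written around the completion step,
which is exactly the restructuring mentioned above.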

> > One way to get around the restriction is if it were possible to do
> > operations similar to a page flip on buffers that are less than a page
> > size.
> 
> It is.  You simply need a flag to recv() which says ``the page I'm
> pointing to contains nothing of interest; you're free to trash it''.
> (Obviously you still have to be careful about flipping pages which
> contain data the user shouldn't see.)  You might also consider a
> getsockopt() call which returns the preferred alignment and offset for
> buffers on this particular connection.
> 
> > One drawback to this approach is that it requires support for RDMA on both
> > ends of the connection.
> 
> Shades of trailers....

Yep, but it is probably one of the cleaner ways of tackling the problem.

Ken
-- 
Kenneth Merry
ken@kdm.org