Date: Mon, 19 Jun 2000 12:14:51 -0600
From: "Kenneth D. Merry" <ken@kdm.org>
To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc: net@FreeBSD.ORG
Subject: Re: zero copy sockets and NFS code for FreeBSD
Message-ID: <20000619121450.A78756@panzer.kdm.org>
In-Reply-To: <200006191743.NAA30993@khavrinen.lcs.mit.edu>; from wollman@khavrinen.lcs.mit.edu on Mon, Jun 19, 2000 at 01:43:29PM -0400
References: <20000616212545.A57840@panzer.kdm.org> <200006191743.NAA30993@khavrinen.lcs.mit.edu>

On Mon, Jun 19, 2000 at 13:43:29 -0400, Garrett Wollman wrote:
> <<On Fri, 16 Jun 2000 21:25:45 -0600, "Kenneth D. Merry" <ken@kdm.org> said:
>
> > Another problem with the current send side approach is that it requires
> > page sized and page aligned data in order to apply the COW mapping.  Not
> > all data sets fit this requirement.
>
> Not necessarily.  There is some threshold length above which it makes
> sense to flip the page, even if you are not transmitting the entire
> page.  This is true even if the MTU is much less than a page (provided
> you implement reference counting for m_copym et al).

This would require the recv() flag you mention below.  Otherwise, you would
have to do a copy and flip operation.  (i.e. copy part of the user's page
into the kernel, and then flip the kernel page in place of the user's
page.)

> Another possible optimization, which I've suggested a few times, is to
> provide a flag on send() which allows the sender to say ``I'm through
> with this page; please unmap it / give me a new ZFOD page'', which may
> make sense for some kinds of applications (e.g., ftpd).

Yeah, that's another possible way to do it.

> > One way to address both of the above problems is to implement an alternate
> > zero copy send scheme that uses async I/O.  With async I/O semantics, it
> > will be clear to the userland program that the buffer in question is not to
> > be used until it is returned from the kernel.
>
> As with most networking optimizations, programs will have to be
> restructured in order to take maximal advantage -- no surprise there.

Yeah, it's difficult to get things to work 100% without breaking the
standard read/write semantics.

> > One way to get around the restriction is if it were possible to do
> > operations similar to a page flip on buffers that are less than a page
> > size.
>
> It is.  You simply need flag to recv() which says ``the page I'm
> pointing to contains nothing of interest; you're free to trash it''.
> (Obviously you still have to be careful about flipping pages which
> contain data the user shouldn't see.)  You might also consider a
> getsockopt() call which returns the preferred alignment and offset for
> buffers on this particular connection.
>
> > One drawback to this approach is that it requires support for RDMA on both
> > ends of the connection.
>
> Shades of trailers....

Yep, but it is probably one of the cleaner ways of tackling the problem.

Ken
-- 
Kenneth Merry
ken@kdm.org


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
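
To make the alignment constraint discussed above concrete, here is a rough
userland sketch in C.  It is not taken from the zero copy patches.  The
helper names (page_size, is_page_aligned, worth_flipping,
alloc_page_aligned) are made up for illustration, and the half-page cutoff
is purely an assumption, since the thread only says that "some threshold
length" exists.  It shows how an application might allocate page-aligned
buffers and check whether a given buffer would be a plausible candidate
for a page flip.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed cutoff: flip only if at least 1/2 of a page is in use.
 * The real threshold would have to be measured; this value is a guess. */
#define FLIP_THRESHOLD_DIVISOR 2

/* Return the system page size in bytes. */
static size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    return (size_t)ps;
}

/* Nonzero if buf starts exactly on a page boundary. */
static int is_page_aligned(const void *buf)
{
    return ((uintptr_t)buf % page_size()) == 0;
}

/* Nonzero if buf is page aligned and len reaches the assumed threshold,
 * i.e. the buffer looks like a reasonable candidate for a page flip
 * rather than a copy. */
static int worth_flipping(const void *buf, size_t len)
{
    return is_page_aligned(buf) &&
           len >= page_size() / FLIP_THRESHOLD_DIVISOR;
}

/* Allocate npages pages of page-aligned memory.  Returns NULL on
 * failure; the caller releases the memory with free(). */
static void *alloc_page_aligned(size_t npages)
{
    void *p = NULL;

    if (posix_memalign(&p, page_size(), npages * page_size()) != 0)
        return NULL;
    return p;
}
```

A program would allocate its I/O buffers with alloc_page_aligned() and
fall back to an ordinary copying send or receive whenever worth_flipping()
returns zero.  A getsockopt() that reports the preferred alignment and
offset, as suggested above, would replace the fixed page-boundary check
here.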
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20000619121450.A78756>