From owner-freebsd-net Mon Jun 19 11:15:15 2000
Delivered-To: freebsd-net@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
        by hub.freebsd.org (Postfix) with ESMTP id BCD3937BD76
        for ; Mon, 19 Jun 2000 11:15:07 -0700 (PDT)
        (envelope-from ken@panzer.kdm.org)
Received: (from ken@localhost)
        by panzer.kdm.org (8.9.3/8.9.1) id MAA78802;
        Mon, 19 Jun 2000 12:14:51 -0600 (MDT)
        (envelope-from ken)
Date: Mon, 19 Jun 2000 12:14:51 -0600
From: "Kenneth D. Merry"
To: Garrett Wollman
Cc: net@FreeBSD.ORG
Subject: Re: zero copy sockets and NFS code for FreeBSD
Message-ID: <20000619121450.A78756@panzer.kdm.org>
References: <20000616212545.A57840@panzer.kdm.org> <200006191743.NAA30993@khavrinen.lcs.mit.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <200006191743.NAA30993@khavrinen.lcs.mit.edu>; from wollman@khavrinen.lcs.mit.edu on Mon, Jun 19, 2000 at 01:43:29PM -0400
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Jun 19, 2000 at 13:43:29 -0400, Garrett Wollman wrote:
> < said:
>
> > Another problem with the current send side approach is that it requires
> > page sized and page aligned data in order to apply the COW mapping.  Not
> > all data sets fit this requirement.
>
> Not necessarily.  There is some threshold length above which it makes
> sense to flip the page, even if you are not transmitting the entire
> page.  This is true even if the MTU is much less than a page (provided
> you implement reference counting for m_copym et al).

This would require the recv() flag you mention below.  Otherwise, you would
have to do a copy and flip operation.  (i.e. copy part of the user's page
into the kernel, and then flip the kernel page in place of the user's
page.)
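To make the threshold argument concrete, here is a rough sketch of the
decision in Python.  It is a toy cost model, not anything from the actual
patches: the names (RxStrategy, choose_rx_strategy, flip_cost_bytes) are
hypothetical, the fixed cost of a page flip is expressed as an equivalent
number of copied bytes and its default value is a guess, and the data is
assumed to sit inside a single page at a suitable offset.

```python
from enum import Enum


class RxStrategy(Enum):
    PLAIN_COPY = "plain copy"        # copy the payload into the user's page
    COPY_AND_FLIP = "copy and flip"  # preserve the rest of the user's page, then flip
    FLIP = "flip"                    # flip the kernel page in, nothing to preserve


def choose_rx_strategy(length, page_size=4096, trash_ok=False, flip_cost_bytes=512):
    """Pick the cheapest way to deliver `length` bytes that share one page.

    trash_ok models the hypothetical recv() flag ("the page contains nothing
    of interest").  flip_cost_bytes is an assumed fixed cost of remapping a
    page, measured in bytes of copying it is worth.
    """
    if not 0 < length <= page_size:
        raise ValueError("length must be between 1 and page_size")

    # Bytes of the user's page that lie outside the buffer and would have to
    # be copied into the kernel page before it can replace the user's page.
    residue = 0 if trash_ok else page_size - length

    copy_cost = length
    flip_cost = flip_cost_bytes + residue

    if flip_cost >= copy_cost:
        return RxStrategy.PLAIN_COPY
    return RxStrategy.FLIP if residue == 0 else RxStrategy.COPY_AND_FLIP


if __name__ == "__main__":
    for length, trash_ok in [(4096, False), (3000, False), (3000, True), (1000, False)]:
        print(length, trash_ok, choose_rx_strategy(length, trash_ok=trash_ok).value)
```

Under this model, without the flag a partial page is only worth flipping
when the payload is well over half a page, since the remainder has to be
copied anyway.  With the flag the threshold drops to whatever the flip
itself costs.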
> Another possible optimization, which I've suggested a few times, is to
> provide a flag on send() which allows the sender to say ``I'm through
> with this page; please unmap it / give me a new ZFOD page'', which may
> make sense for some kinds of applications (e.g., ftpd).

Yeah, that's another possible way to do it.

> > One way to address both of the above problems is to implement an alternate
> > zero copy send scheme that uses async I/O.  With async I/O semantics, it
> > will be clear to the userland program that the buffer in question is not to
> > be used until it is returned from the kernel.
>
> As with most networking optimizations, programs will have to be
> restructured in order to take maximal advantage -- no surprise there.

Yeah, it's difficult to get things to work 100% without breaking the
standard read/write semantics.

> > One way to get around the restriction is if it were possible to do
> > operations similar to a page flip on buffers that are less than a page
> > size.
>
> It is.  You simply need a flag to recv() which says ``the page I'm
> pointing to contains nothing of interest; you're free to trash it''.
> (Obviously you still have to be careful about flipping pages which
> contain data the user shouldn't see.)  You might also consider a
> getsockopt() call which returns the preferred alignment and offset for
> buffers on this particular connection.
>
> > One drawback to this approach is that it requires support for RDMA on both
> > ends of the connection.
>
> Shades of trailers....

Yep, but it is probably one of the cleaner ways of tackling the problem.

Ken
-- 
Kenneth Merry
ken@kdm.org
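For what it's worth, here is a small sketch of how a program might use the
"preferred alignment and offset" idea from the getsockopt() suggestion
above.  No such socket option exists as far as this sketch is concerned:
the alignment and offset values are simply passed in as if some
hypothetical call had returned them, and place_buffer is an illustrative
name.  The helper finds the first address in an allocated region that
satisfies addr % alignment == offset.

```python
def place_buffer(base, alignment, offset):
    """Return the lowest address >= base with address % alignment == offset.

    base      -- start address of a region the program already allocated
    alignment -- preferred alignment in bytes (e.g. the page size)
    offset    -- preferred offset within that alignment unit
    Both alignment and offset stand in for values a hypothetical
    getsockopt() query would report; they are assumptions here.
    """
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    if not 0 <= offset < alignment:
        raise ValueError("offset must satisfy 0 <= offset < alignment")
    # Python's % always yields a non-negative result for a positive modulus,
    # so this is the forward distance to the next suitable address.
    return base + ((offset - base) % alignment)


if __name__ == "__main__":
    # Example: region starts at 0x1003, kernel prefers page alignment with
    # the payload starting 40 bytes into the page.
    addr = place_buffer(0x1003, 4096, 40)
    print(hex(addr), addr % 4096)
```

A program would over-allocate by up to one alignment unit and hand the
kernel the address returned here, so that a received page could be flipped
in without shifting the payload.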