From owner-freebsd-net Mon Jun 19 10:43:51 2000
Delivered-To: freebsd-net@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193])
    by hub.freebsd.org (Postfix) with ESMTP id 34DD837BD32
    for ; Mon, 19 Jun 2000 10:43:47 -0700 (PDT)
    (envelope-from wollman@khavrinen.lcs.mit.edu)
Received: (from wollman@localhost)
    by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id NAA30993;
    Mon, 19 Jun 2000 13:43:29 -0400 (EDT)
    (envelope-from wollman)
Date: Mon, 19 Jun 2000 13:43:29 -0400 (EDT)
From: Garrett Wollman
Message-Id: <200006191743.NAA30993@khavrinen.lcs.mit.edu>
To: "Kenneth D. Merry"
Cc: net@FreeBSD.ORG
Subject: zero copy sockets and NFS code for FreeBSD
In-Reply-To: <20000616212545.A57840@panzer.kdm.org>
References: <20000616212545.A57840@panzer.kdm.org>
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

< said:

> Another problem with the current send side approach is that it requires
> page sized and page aligned data in order to apply the COW mapping. Not
> all data sets fit this requirement.

Not necessarily. There is some threshold length above which it makes
sense to flip the page, even if you are not transmitting the entire
page. This is true even if the MTU is much less than a page (provided
you implement reference counting for m_copym et al.).

Another possible optimization, which I've suggested a few times, is to
provide a flag on send() which allows the sender to say ``I'm through
with this page; please unmap it / give me a new ZFOD page'', which may
make sense for some kinds of applications (e.g., ftpd).

> One way to address both of the above problems is to implement an alternate
> zero copy send scheme that uses async I/O. With async I/O semantics, it
> will be clear to the userland program that the buffer in question is not to
> be used until it is returned from the kernel.

As with most networking optimizations, programs will have to be
restructured in order to take maximal advantage -- no surprise there.
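
The flip-versus-copy threshold and the m_copym-style reference counting mentioned above can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not FreeBSD code. FLIP_THRESHOLD and its value, struct shared_page, should_flip(), page_ref(), page_unref() and segments_for() are all hypothetical names chosen for illustration. The idea it shows is that several MTU-sized segments can point into one flipped page, each holding a reference, so the page is released only after the last segment is done with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold: below this many bytes we assume a copy is
 * cheaper than remapping the page. Illustrative value, not a tuned one. */
#define FLIP_THRESHOLD 2048u

/* A page handed to the network stack and shared by every segment that
 * points into it (what m_copym-style reference counting would provide). */
struct shared_page {
    char *data;
    unsigned refs;
};

/* Flip only when the payload fits inside one page and is large enough;
 * the whole page need not be transmitted. */
static bool should_flip(size_t off, size_t len, size_t page_size)
{
    if (off > page_size || len > page_size - off)
        return false;            /* crosses a page boundary */
    return len >= FLIP_THRESHOLD;
}

/* Each segment that references the page takes one reference. */
static void page_ref(struct shared_page *pg)
{
    pg->refs++;
}

/* Drop one reference and free the page when the last segment is done.
 * Returns true if this call freed the page. */
static bool page_unref(struct shared_page *pg)
{
    assert(pg->refs > 0);
    if (--pg->refs == 0) {
        free(pg->data);
        pg->data = NULL;
        return true;
    }
    return false;
}

/* Number of MTU-sized segments needed for `len` bytes; each one would
 * reference the shared page instead of getting its own copy. */
static size_t segments_for(size_t len, size_t mtu)
{
    return (len + mtu - 1) / mtu;
}
```

For example, a 3000-byte send from the start of a 4096-byte page with a 1500-byte MTU clears the (assumed) threshold and needs two segments, so the page would carry two references instead of being copied twice.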

> One way to get around the restriction is if it were possible to do
> operations similar to a page flip on buffers that are less than a page
> size.

It is. You simply need a flag to recv() which says ``the page I'm
pointing to contains nothing of interest; you're free to trash it''.
(Obviously you still have to be careful about flipping pages which
contain data the user shouldn't see.) You might also consider a
getsockopt() call which returns the preferred alignment and offset for
buffers on this particular connection.

> One drawback to this approach is that it requires support for RDMA on both
> ends of the connection.

Shades of trailers....

-GAWollman
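
The recv() flag and the getsockopt() idea can be sketched from the application side. The snippet assumes the connection's preferred alignment and offset are already known. The getsockopt() option that would report them is hypothetical, so they are passed in as plain parameters. The ``free to trash it'' recv() flag is modelled as a boolean. alloc_at_offset() and can_flip_into() are illustrative names, not an existing API; posix_memalign() is the only real library call used.

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate `len` usable bytes whose first byte lies `offset` bytes past an
 * `align`-byte boundary, e.g. the alignment/offset a (hypothetical)
 * getsockopt() option might report for a connection. `align` must be a
 * power of two and a multiple of sizeof(void *), as posix_memalign()
 * requires. *base receives the pointer to pass to free().
 */
static char *alloc_at_offset(size_t align, size_t offset, size_t len,
                             void **base)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (offset >= align)
        return NULL;
    if (posix_memalign(base, align, offset + len) != 0)
        return NULL;
    return (char *)*base + offset;
}

/*
 * Could a received page be flipped into `ubuf` instead of copied? Only if
 * the application said the rest of its page is disposable (the proposed
 * recv() flag, modelled as a bool), the payload fits in one page and in
 * the user buffer, and the payload sits at the same offset within its page
 * as the user buffer does. A real kernel would still have to clear any
 * bytes outside the payload that the user should not see.
 */
static bool can_flip_into(const char *ubuf, size_t ubuf_len,
                          size_t data_off, size_t data_len,
                          size_t page_size, bool page_disposable)
{
    if (!page_disposable || data_len > ubuf_len)
        return false;
    if (data_off >= page_size || data_len > page_size - data_off)
        return false;
    return (uintptr_t)ubuf % page_size == data_off;
}
```

With a 4096-byte page and a reported offset of 64, a buffer from alloc_at_offset(4096, 64, ...) passes the offset check for a payload that starts 64 bytes into the kernel's page. A payload at any other offset would still have to be copied.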