From owner-freebsd-arch Sun Dec 3 19: 9: 3 2000
From owner-freebsd-arch@FreeBSD.ORG Sun Dec 3 19:09:01 2000
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mobile.wemm.org (adsl-64-163-195-99.dsl.snfc21.pacbell.net [64.163.195.99])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9A06A37B400; Sun, 3 Dec 2000 19:09:00 -0800 (PST)
Received: from netplex.com.au (localhost [127.0.0.1])
	by mobile.wemm.org (8.11.1/8.11.1) with ESMTP id eB438tD52326;
	Sun, 3 Dec 2000 19:08:55 -0800 (PST)
	(envelope-from peter@netplex.com.au)
Message-Id: <200012040308.eB438tD52326@mobile.wemm.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: "Kenneth D. Merry"
Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG, dillon@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <20001129231653.A1503@panzer.kdm.org>
Date: Sun, 03 Dec 2000 19:08:55 -0800
From: Peter Wemm
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

"Kenneth D. Merry" wrote:
> [ -net and -current BCCed for wider coverage, this is probably best
> handled on -arch ]
>
> I would like to request reviews of the zero copy sockets and NFS code I've
> been posting about for months:
>
> http://people.FreeBSD.org/~ken/zero_copy

Hmm.. I see one danger item:

"
5.Configuration and performance tuning.

There are a number of options that need to be turned on for various things
to work:

options ZERO_COPY_SOCKETS        # Turn on zero copy send code
options ENABLE_VFS_IOOPT         # Turn on zero copy receive
options NMBCLUSTERS=(512+512*32) # lots of mbuf clusters
options TI_JUMBO_HDRSPLIT        # Turn on Tigon header splitting
[..]
Turn on vfs.ioopt to enable zero copy receive:

sysctl -w vfs.ioopt=1
"

I know Matt Dillon was intending to remove the ENABLE_VFS_IOOPT code and
vfs.ioopt because it is presently fundamentally broken and causes
devastating userland semantics impact.
For example, as it exists in the tree *right now*, if one does this:

  buf = malloc(PAGE_SIZE);   /* malloc does page alignment here */
  read(fd, buf, PAGE_SIZE);

.. it would be eligible for ioopt treatment (page lending). Normally, you
would have a *private* copy of the page of data. If somebody modifies the
backing file, your private copy does not change.

However, turning on ioopt causes it to be mmapped in with MAP_PRIVATE..
But this does **NOT** give the same semantics. Sure, if you modify the
buffer yourself, you get a copy-on-write fault and your own private page to
mess with. But if somebody else modifies the file before you dirty the
page, then your supposedly static private copy silently changes out from
underneath you, because you have been loaned a mapping from the vm/buffer
cache.

The infrastructure to track "loaned out" pages in the vm page cache isn't
present. The pages must be read-only to the kernel and DMA engines, and a
write to them must take a fault, giving the kernel a chance to fully donate
the original page to the mapping processes and generate its own writable
version.

I have not read the patch extensively, but I am not sure that this is
handled completely. There are a few patches to vm_fault(), but I am not
sure whether these are there to handle the problem I described above or
something else. In particular, if they are intended to handle the problem,
then the fix seems to depend on being able to make pages unwritable by the
kernel. This isn't possible on i386 CPUs (only on the 486 and later). I did
not see any busmaster DMA checking either, but I could have missed it..
What about drivers that DMA to pages mapped into KVM without checking
writability (and hence COW)?

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
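
The semantics difference described in the message can be sketched from
userland, assuming a POSIX system. This is only an illustration: the
explicit mmap(MAP_PRIVATE) call stands in for the page lending that
vfs.ioopt performs inside read(), and the helper name compare_snapshots()
and the struct are hypothetical, not taken from the patch or the kernel.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* First byte of the page as seen through each kind of "private copy". */
struct snapshot_result {
    char copied;  /* buffer filled by an ordinary copying read */
    char mapped;  /* untouched MAP_PRIVATE mapping of the same page */
};

/*
 * Hypothetical helper: fill one page of a temp file with 'A', take both
 * kinds of snapshot, let "somebody else" overwrite the first byte of the
 * file with 'B', then report what each snapshot shows.
 * Returns 0 on success, -1 if any system call fails.
 */
int compare_snapshots(struct snapshot_result *out)
{
    char path[] = "/tmp/ioopt_demo_XXXXXX";
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t len;
    char *copy = NULL;
    char *map = MAP_FAILED;
    int fd, rc = -1;

    assert(out != NULL);
    if (pagesize <= 0)
        return -1;
    len = (size_t)pagesize;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file lives on until fd is closed */

    copy = malloc(len);
    if (copy == NULL)
        goto done;
    memset(copy, 'A', len);
    if (pwrite(fd, copy, len, 0) != (ssize_t)len)
        goto done;

    /* Snapshot 1: a copying read; the data now lives in our own memory. */
    memset(copy, 0, len);
    if (pread(fd, copy, len, 0) != (ssize_t)len)
        goto done;

    /* Snapshot 2: a private mapping that we never write to. */
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto done;

    /* The file is modified before the mapping has been dirtied. */
    if (pwrite(fd, "B", 1, 0) != 1)
        goto done;

    out->copied = copy[0];
    out->mapped = map[0];
    rc = 0;
done:
    if (map != MAP_FAILED)
        munmap(map, len);
    free(copy);
    close(fd);
    return rc;
}
```

The copied field is always 'A', which is the behaviour a caller of read()
expects. POSIX leaves it unspecified whether later file modifications show
through an undirtied MAP_PRIVATE mapping, so the mapped field may come back
as 'B'; that is the silent change the message warns about when read() is
quietly turned into a loaned mapping.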