From owner-freebsd-hackers Mon Aug 17 13:48:35 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id NAA16118 for freebsd-hackers-outgoing; Mon, 17 Aug 1998 13:48:35 -0700 (PDT) (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id NAA16107 for ; Mon, 17 Aug 1998 13:48:29 -0700 (PDT) (envelope-from tlambert@usr06.primenet.com)
Received: (from daemon@localhost) by smtp04.primenet.com (8.8.8/8.8.8) id NAA13037; Mon, 17 Aug 1998 13:47:51 -0700 (MST)
Received: from usr06.primenet.com(206.165.6.206) via SMTP by smtp04.primenet.com, id smtpd012948; Mon Aug 17 13:47:40 1998
Received: (from tlambert@localhost) by usr06.primenet.com (8.8.5/8.8.5) id NAA06478; Mon, 17 Aug 1998 13:47:37 -0700 (MST)
From: Terry Lambert
Message-Id: <199808172047.NAA06478@usr06.primenet.com>
Subject: Re: sendfile() API?
To: oppermann@pipeline.ch (Andre Oppermann)
Date: Mon, 17 Aug 1998 20:47:37 +0000 (GMT)
Cc: shocking@prth.pgs.com, hackers@FreeBSD.ORG
In-Reply-To: <35D85D39.8ED6BD8E@pipeline.ch> from "Andre Oppermann" at Aug 17, 98 06:41:29 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> The benefit of something like sendfile() is (according to Marc and my
> understanding) to save the time something needs to get read from an
> FS and write to the network (very common in webserver and fileserver
> applications) via userland (actually the process that does the handling
> of the serving).

OK, here we go again...

On a non-unified VM and buffer cache OS, a write requires a bmap
operation to map the user's buffer into the kernel virtual address
space.  On a unified VM and buffer cache, this is not necessary.
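
For reference, the userland path that the quoted text describes (read
from the file into a process buffer, then write that buffer to the
network) might look like the sketch below.  The helper names write_all
and copy_via_userland and the 8192-byte buffer size are illustrative
assumptions, not anything taken from the thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: copy everything from file_fd to sock_fd through
 * a buffer in the process address space.  Returns the number of bytes
 * copied, or -1 on error.
 */
static off_t copy_via_userland(int file_fd, int sock_fd)
{
    char buf[8192];              /* buffer size is an arbitrary choice */
    off_t total = 0;

    for (;;) {
        /* Kernel copies file data out to the user buffer. */
        ssize_t n = read(file_fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)              /* end of file */
            break;
        /* Kernel copies the user buffer back in for the socket. */
        if (write_all(sock_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Every block of the file crosses the user/kernel boundary twice in this
loop, which is the cost the quoted text wants sendfile() to avoid.
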
To establish an mmap mapping of a file into a process address space on
a non-unified VM and buffer cache machine, separate VM pages are
needed, and these shadow the buffer cache contents, requiring a copy of
the buffer cache contents into VM to instantiate the mapping and make
it visible to the process.  On a unified VM and buffer cache system, a
buffer is a VM mapping, and no copy is necessary.

So if you mmap the file on FreeBSD, and then write a memory range in
the file to a socket, the only triggered copy is from a kernel space
VM buffer to a kernel space anonymous VM mapping (an mbuf).

There are two unavoidable copies here: (1) the copy from the disk
controller to the VM buffer for the page demand, and (2) the copy from
the mbuf to the ethernet controller.

Technically, you could argue that you should be able to give a VM
object to the networking stack, and save the triggered copy.

The problem with this is the page size on the system.  True, you could
do the first page of a file this way, by putting the TCP header at the
end of an anonymous page, and then butting it up against the start of
the data; but unless your MTU is 4k, you will *have* to fragment pages.

This requires a complicated automaton to get right, and while this is
worthwhile on a CPU-poor machine, like a VAX, it's less of an issue for
FreeBSD.

FreeBSD, by its architecture, already saves 2 of the 5 copies needed on
most other systems, and the adulteration of the network architecture to
save the 3rd is probably not worth it.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
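
The mmap-then-write approach described in the message above can be
sketched roughly as follows.  This is a minimal illustration, not code
from the thread: the function name send_mapped_file is hypothetical,
the whole file is mapped in one call for simplicity, and the file is
assumed to fit in the process address space.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map file_fd read-only and write the mapped
 * range straight to sock_fd.  No read() into a process buffer is
 * involved.  Returns the number of bytes sent, or -1 on error.
 */
static off_t send_mapped_file(int file_fd, int sock_fd)
{
    struct stat st;

    if (fstat(file_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)         /* mmap() rejects a zero length */
        return 0;

    size_t len = (size_t)st.st_size;
    char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, file_fd, 0);
    if (base == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        /* The kernel copies from the mapped pages into socket buffers. */
        ssize_t n = write(sock_fd, base + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            munmap(base, len);
            return -1;
        }
        off += (size_t)n;        /* handle short writes */
    }

    munmap(base, len);
    return (off_t)len;
}
```

Per the argument in the message, on a unified VM and buffer cache the
copy made inside write() is the only triggered copy on this path; the
sketch itself cannot demonstrate that, it only shows the calling
pattern.
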