Date: Mon, 17 Aug 1998 20:47:37 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: oppermann@pipeline.ch (Andre Oppermann)
Cc: shocking@prth.pgs.com, hackers@FreeBSD.ORG
Subject: Re: sendfile() API?
Message-ID: <199808172047.NAA06478@usr06.primenet.com>
In-Reply-To: <35D85D39.8ED6BD8E@pipeline.ch> from "Andre Oppermann" at Aug 17, 98 06:41:29 pm

> The benefit of something like sendfile() is (according to Marc and my
> understanding) to save the time something needs to get read from an
> FS and write to the network (very common in webserver and fileserver
> applications) via userland (actually the process that does the handling
> of the serving).

OK, here we go again...

On a non-unified VM and buffer cache OS, a write requires a bmap
operation to map the user's buffer into the kernel virtual address
space.

On a unified VM and buffer cache, this is not necessary.

To establish an mmap mapping of a file into a process address space
on a non-unified VM and buffer cache machine, separate VM pages are
needed, and these shadow the buffer cache contents, requiring a copy
of the buffer cache contents into VM to instantiate the mapping and
make it visible to the process.

On a unified VM and buffer cache system, a buffer is a VM mapping,
and no copy is necessary.

So if you mmap the file on FreeBSD, and then write a memory range in
the file to a socket, then the only triggered copy is from a kernel
space VM buffer to a kernel space anonymous VM mapping (an mbuf).

There are two unavoidable copies here: (1) the copy from the disk
controller to the VM buffer for the page demand, and (2) the copy
from the mbuf to the ethernet controller.

Technically, you could argue that you should be able to give a VM
object to the networking stack, and save the triggered copy.

The problem with this is the page size on the system. True, you
could do the first page of a file this way, by putting the TCP header
at the end of an anonymous page, and then butting it up against the
start of the data; but unless your MTU is 4k, you will *have* to
fragment pages. This requires a complicated automaton to get right,
and while this is worthwhile on a CPU-poor machine, like a VAX, it's
less of an issue for FreeBSD.

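For illustration, the mmap-then-write path described above might look
roughly like the sketch below. The function name send_file_via_mmap()
is hypothetical and does not come from this thread; the code uses only
standard POSIX calls and assumes the output descriptor is a blocking
socket (any writable descriptor works). It shows the userland side
only; which copies the kernel actually performs depends on the VM and
buffer cache design discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): map the file at
 * `path` read-only and write its contents to `out_fd`, e.g. a
 * connected socket.  Returns the number of bytes written, or -1 on
 * error.
 */
ssize_t send_file_via_mmap(const char *path, int out_fd)
{
    struct stat st;
    int in_fd;

    assert(path != NULL && out_fd >= 0);

    in_fd = open(path, O_RDONLY);
    if (in_fd == -1)
        return -1;
    if (fstat(in_fd, &st) == -1) {
        close(in_fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length maps */
        close(in_fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, in_fd, 0);
    close(in_fd);                   /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return -1;

    /* write() may accept fewer bytes than requested, so loop. */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(out_fd, base + off, len - off);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            munmap(base, len);
            return -1;
        }
        off += (size_t)n;
    }

    munmap(base, len);
    return (ssize_t)off;
}
```

A server would call this once per request with the accepted socket as
out_fd. No user-space buffer is filled by read(); the process hands
write() an address range backed directly by the file's pages.
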
FreeBSD, by its architecture, already saves 2 of the 5 copies needed
on most other systems, and the adulteration of the network
architecture to save the 3rd is probably not worth it.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.