From owner-freebsd-hackers  Mon Aug 17 13:48:35 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199808172047.NAA06478@usr06.primenet.com>
Subject: Re: sendfile() API?
To: oppermann@pipeline.ch (Andre Oppermann)
Date: Mon, 17 Aug 1998 20:47:37 +0000 (GMT)
Cc: shocking@prth.pgs.com, hackers@FreeBSD.ORG
In-Reply-To: <35D85D39.8ED6BD8E@pipeline.ch> from "Andre Oppermann" at Aug 17, 98 06:41:29 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> The benefit of something like sendfile() is (according to Marc and my
> understanding) to save the time something needs to get read from an
> FS and write to the network (very common in webserver and fileserver
> applications) via userland (actually the process that does the handling
> of the serving).


OK, here we go again...

On a non-unified VM and buffer cache OS, a write requires a bmap operation
to map the user's buffer into the kernel virtual address space.

On a unified VM and buffer cache OS, this is not necessary.

To establish an mmap mapping of a file into a process address space
on a non-unified VM and buffer cache machine, separate VM pages are
needed, and these shadow buffer cache contents, requiring a copy of
the buffer cache contents into VM to instantiate the mapping and make
it visible to the process.

On a unified VM and buffer cache system, a buffer is a VM mapping,
and no copy is necessary.

So if you mmap the file on FreeBSD, and then write a memory range in
the file to a socket, then the only triggered copy is from a kernel
space VM buffer to a kernel space anonymous VM mapping (an mbuf).
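
As a rough sketch of the mmap-then-write approach described above
(not code from any existing server): the helper name send_file_mmap
is hypothetical, and the descriptor passed in is assumed to be a
connected socket, although write() will accept any writable
descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `path` read-only and write the whole
 * mapping to `out_fd` (normally a connected socket).
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t
send_file_mmap(int out_fd, const char *path)
{
	struct stat st;
	ssize_t total = 0;
	size_t len;
	char *base;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	len = (size_t)st.st_size;
	if (len == 0) {		/* mmap() rejects a zero-length mapping */
		close(fd);
		return 0;
	}

	/* No read() into a user buffer: the pages are mapped, not copied. */
	base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (base == MAP_FAILED)
		return -1;

	/* write() may be partial on a socket, so loop until done. */
	while ((size_t)total < len) {
		ssize_t n = write(out_fd, base + total, len - (size_t)total);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			munmap(base, len);
			return -1;
		}
		total += n;
	}

	munmap(base, len);
	return total;
}
```

A server would call this once per request with the accepted socket,
e.g. send_file_mmap(client_fd, "/some/file"); the copy into mbufs
happens inside write().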

There are two unavoidable copies here: (1) the copy from the disk
controller to the VM buffer to satisfy the page demand, and (2) the
copy from the mbuf to the ethernet controller.

Technically, you could argue that you should be able to give a VM
object to the networking stack, and save the triggered copy.  The
problem with this is the page size on the system.  True, you could
do the first page of a file this way, by putting the TCP header at
the end of an anonymous page, and then butting it up against the start
of the data; but unless your MTU is 4k, you will *have* to fragment
pages.  This requires a complicated automaton to get right, and while
this is worthwhile on a CPU-poor machine, like a VAX, it's less of
an issue for FreeBSD.
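
To put numbers on the page fragmentation problem, here is a small
sketch that counts how many TCP segments would have payload
straddling a page boundary.  The function name is hypothetical, and
the figures in the note after it assume 4096-byte pages and a
1460-byte segment payload (1500-byte ethernet MTU less 40 bytes of
IP and TCP header, no options).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the segments whose payload spans more
 * than one page when `file_len` bytes are cut into `mss`-sized pieces.
 */
size_t
segments_crossing_pages(size_t file_len, size_t page_size, size_t mss)
{
	size_t off, crossing = 0;

	assert(page_size > 0 && mss > 0);

	for (off = 0; off < file_len; off += mss) {
		/* The last segment may be shorter than mss. */
		size_t len = (file_len - off < mss) ? file_len - off : mss;
		size_t first_page = off / page_size;
		size_t last_page = (off + len - 1) / page_size;

		if (first_page != last_page)
			crossing++;
	}
	return crossing;
}
```

With those assumed sizes, a two-page (8192 byte) file has one
segment that straddles the boundary between the pages, and no
segment other than the first starts on a page boundary, so the
header-butted-against-the-data trick only works for the first
segment.  If the payload size equals the page size, nothing
straddles.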

FreeBSD, by its architecture, already saves 2 of the 5 copies needed on
most other systems, and the adulteration of the network architecture
to save the 3rd is probably not worth it.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
