From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov  4 09:02:39 2003
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Date: Tue, 4 Nov 2003 11:02:35 -0600 (CST)
From: Mike Silbersack <silby@silby.com>
To: Vivek Pai <vivek@CS.Princeton.EDU>
In-Reply-To: <3FA2C43E.3030204@cs.princeton.edu>
Message-ID: <20031104104729.S1684@odysseus.silby.com>
References: <1066789354.21430.39.camel@boxster.onthenet.com.au>   
	<20031022082953.GA69506@rot13.obsecurity.org>   
	<1066816287.25609.34.camel@boxster.onthenet.com.au>   
	<20031022095754.GA70026@rot13.obsecurity.org>   
	<xzpk76sc425.fsf@dwp.des.no>
	<1067183332.3f9bece4c0cf4@webmail.cs.princeton.edu>   
	<3FA2C43E.3030204@cs.princeton.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-hackers@freebsd.org
cc: Alan Cox <alc@cs.rice.edu>
Subject: Update: Debox sendfile modifications
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>


Ok, I've reread the debox paper, looked over the patch, and talked to Alan
Cox about his present and upcoming work on the vm system.

The debox patch does three basic things (if I'm understanding everything
correctly):

1.  It ensures that the header is sent in the same packet as the first
part of the data, fixing performance with small files.

- This part of the patch needs a little cleanup, but that's easy enough.
I will try to integrate it next week.
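
As a rough userland analogy only (this is not the patch, and not kernel
code): the idea of handing the header and the first part of the data to
the stack in a single operation resembles a gathered write with writev.
The helper name send_header_and_body below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: submit the header and the first chunk of the body
 * with one writev() call instead of two separate write() calls, so the
 * kernel sees both pieces at once.  Returns what writev() returns.
 */
ssize_t send_header_and_body(int fd, const char *hdr, size_t hdr_len,
                             const char *body, size_t body_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);

    iov[0].iov_base = (void *)hdr;   /* header goes first */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)body;  /* followed directly by the data */
    iov[1].iov_len  = body_len;

    return writev(fd, iov, 2);
}
```

Whether both pieces actually leave in one packet still depends on the
socket and the TCP stack; the sketch only shows the "one submission" half
of the idea.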

2.  The patch merges sendfile buffers so that when the same page is sent
to multiple connections, kernel address space is not wasted.

- While this is the part of the patch with the widest benefit, it will be
the most difficult to integrate.  In order to support 64-bit architectures
better, Alan has refactored the sendfile code, meaning that the patch
would have to be rewritten to fit this new layout.
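
To make the buffer merging concrete, here is a minimal sketch of the
general technique (a reference-counted lookup keyed by page), assuming a
small fixed table.  All names (sfbuf, sfbuf_acquire, sfbuf_release,
SFBUF_SLOTS) are made up for illustration; this is neither the debox
patch nor Alan's refactored code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SFBUF_SLOTS 8            /* hypothetical table size */

struct sfbuf {
    uintptr_t page;              /* identity of the page being sent */
    int       refs;              /* users of this mapping; 0 = slot free */
};

static struct sfbuf sfbuf_table[SFBUF_SLOTS];

/*
 * Return a mapping for `page`.  If some connection is already sending the
 * same page, share its mapping and bump the reference count rather than
 * consuming another slot of address space.
 */
struct sfbuf *sfbuf_acquire(uintptr_t page)
{
    struct sfbuf *free_slot = NULL;

    for (size_t i = 0; i < SFBUF_SLOTS; i++) {
        struct sfbuf *sf = &sfbuf_table[i];

        if (sf->refs > 0 && sf->page == page) {
            sf->refs++;          /* same page: share the mapping */
            return sf;
        }
        if (sf->refs == 0 && free_slot == NULL)
            free_slot = sf;      /* remember the first free slot */
    }
    if (free_slot == NULL)
        return NULL;             /* table full; a real caller would wait */

    free_slot->page = page;
    free_slot->refs = 1;
    return free_slot;
}

/* Drop one reference; the slot becomes reusable when the count hits 0. */
void sfbuf_release(struct sfbuf *sf)
{
    assert(sf->refs > 0);
    sf->refs--;
}
```

A real implementation would need locking and a hash rather than a linear
scan; the point is only that N connections sending one page cost one
mapping, not N.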

3.  The patch returns a new error when sendfile realizes that it will have
to block on disk I/O, thereby allowing Flash to have a helper do the
blocking call.

- While this change could be made easily enough, I'm not sure that it
would benefit anything other than Flash, so I'm not certain if we should
do it.  However, based on what you learned with Flash, I have an alternate
idea:

---

Suppose that sendfile is called to send to a non-blocking socket, and that
it detects that the page(s) required are not in memory, and that disk I/O
will be necessary.  Instead of blocking, sendfile would call a sendfile
helper kernel thread (either by calling kthread_create or by drawing on a
preexisting pool).  After dispatching this thread, sendfile would return
EWOULDBLOCK to the caller.  Note that only a limited number of threads
would exist (perhaps 8?), so, if all threads were busy, sendfile would
have to block like it does at present.

Once the I/O was complete, the thread would call sowakeup (or whatever is
typically called when a socket becomes ready for writing) for the socket in
question.  The application would then call sendfile again, as normal, and
this time everything would succeed because the page would be in memory.

---
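
The decision logic described above can be modeled in a few lines of plain
C.  This is only a sketch of the control flow under my stated assumptions
(a pool capped at 8 helpers, fallback to blocking when the pool is
exhausted); sf_pool, sf_decide and sf_helper_done are hypothetical names,
and no real kernel interfaces are used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SF_HELPERS_MAX 8     /* the "perhaps 8?" limit; an assumption */

enum sf_action {
    SF_SEND_NOW,             /* page resident: send right away */
    SF_DISPATCHED,           /* helper started; caller sees EWOULDBLOCK */
    SF_BLOCK                 /* block on disk I/O, as sendfile does today */
};

struct sf_pool {
    int busy;                /* helpers currently doing disk I/O */
};

/* Decide what sendfile would do for one request; *error is 0 or an errno. */
enum sf_action sf_decide(struct sf_pool *pool, bool page_resident,
                         bool nonblocking, int *error)
{
    *error = 0;
    if (page_resident)
        return SF_SEND_NOW;
    /* Blocking sockets, or an exhausted pool, keep the current behavior. */
    if (!nonblocking || pool->busy >= SF_HELPERS_MAX)
        return SF_BLOCK;
    pool->busy++;            /* hand the page-in to a helper thread */
    *error = EWOULDBLOCK;
    return SF_DISPATCHED;
}

/*
 * Called when a helper's I/O completes.  Real code would also wake up
 * writers on the socket here so the application retries sendfile.
 */
void sf_helper_done(struct sf_pool *pool)
{
    assert(pool->busy > 0);
    pool->busy--;
}
```

From the application's side nothing changes: it sees EWOULDBLOCK, waits
for the socket to become writable, and calls sendfile again.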

If such a feature were implemented, it might yield the same performance
gain as your new return value, except that a non-blocking sendfile-based
application would need no modification to take advantage of it.

Alan, would this be possible from the VM system's perspective?  Is it safe
to assume that once the page in question is in the page cache, it will
stay there long enough for the second sendfile call to access it before it
is paged out again?

Thanks,

Mike "Silby" Silbersack