From: Mike Silbersack
To: Vivek Pai
Cc: freebsd-hackers@freebsd.org, Alan Cox
Date: Tue, 4 Nov 2003 11:02:35 -0600 (CST)
Subject: Update: Debox sendfile modifications
Message-ID: <20031104104729.S1684@odysseus.silby.com>
In-Reply-To: <3FA2C43E.3030204@cs.princeton.edu>

Ok, I've reread the debox paper, looked over the patch, and talked to
Alan Cox about his present and upcoming work on the vm system.

The debox patch does three basic things (if I'm understanding everything
correctly):

1. It ensures that the header is sent in the same packet as the first
   part of the data, fixing performance with small files.
   - This part of the patch needs a little cleanup, but that's easy
     enough.  I will try to integrate it next week.

2. The patch merges sendfile buffers so that when the same page is sent
   to multiple connections, kernel address space is not wasted.

   - While this is the part of the patch with the widest benefit, it
     will be the most difficult to integrate.  In order to support
     64-bit architectures better, Alan has refactored the sendfile code,
     meaning that the patch would have to be rewritten to fit this new
     layout.

3. The patch returns a new error when sendfile realizes that it will
   have to block on disk I/O, thereby allowing Flash to have a helper do
   the blocking call.

   - While this change could be made easily enough, I'm not sure that it
     would benefit anything other than Flash, so I'm not certain if we
     should do it.  However, based on what you learned with Flash, I
     have an alternate idea:

---

Suppose that sendfile is called to send to a non-blocking socket, and
that it detects that the page(s) required are not in memory, and that
disk I/O will be necessary.  Instead of blocking, sendfile would hand the
I/O to a sendfile helper kernel thread (either by calling kthread_create,
or by having a preexisting pool).  After dispatching this thread,
sendfile would return EWOULDBLOCK to the caller.  Note that only a
limited number of threads would exist (perhaps 8?), so, if all threads
were busy, sendfile would have to block like it does at present.

Once the I/O was complete, the thread would call sowakeup (or whatever is
typically called when a socket is now ready for writing) for the socket
in question.  The application would then call sendfile again, like
normal, and this time everything would succeed because the page would be
in memory.

---

If such a feature were implemented, it might have the same increased
performance effect that your new return value does, except that it would
require no modification for a non-blocking sendfile based application to
take advantage of it.
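To make the control flow of the idea above easier to discuss, here is a
rough user-space toy model of the decision sendfile would make.  It is
only a sketch and not kernel code: every name in it (sf_model,
model_sendfile, model_helper_done, SF_NPAGES) is made up for
illustration, the pool size of 8 is just the guess from above, and the
in_flight flag (so that a retry for a page already being read does not
take a second helper) is an extra assumption the proposal does not spell
out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SF_MAX_HELPERS 8   /* "perhaps 8?" from the proposal */
#define SF_NPAGES      16  /* size of the toy file in pages (assumption) */

enum sf_outcome {
    SF_SENT,      /* page was resident; data handed to the socket */
    SF_DEFERRED,  /* helper is doing the read; caller gets EWOULDBLOCK */
    SF_BLOCKED    /* no helper free; caller blocked on disk I/O as today */
};

struct sf_model {
    bool resident[SF_NPAGES];   /* page is in memory */
    bool in_flight[SF_NPAGES];  /* a helper is currently reading the page */
    int  busy_helpers;          /* helpers in use, 0..SF_MAX_HELPERS */
};

/* Model of one sendfile call for a single page on a non-blocking socket. */
static enum sf_outcome
model_sendfile(struct sf_model *m, size_t page, int *err)
{
    assert(page < SF_NPAGES);
    *err = 0;

    if (m->resident[page])
        return SF_SENT;               /* fast path: no disk I/O needed */

    if (m->in_flight[page]) {         /* a helper is already on it */
        *err = EWOULDBLOCK;
        return SF_DEFERRED;
    }

    if (m->busy_helpers < SF_MAX_HELPERS) {
        m->busy_helpers++;            /* dispatch a helper for the read */
        m->in_flight[page] = true;
        *err = EWOULDBLOCK;
        return SF_DEFERRED;
    }

    /* Pool exhausted: fall back to today's behaviour and block. */
    m->resident[page] = true;
    return SF_BLOCKED;
}

/* Model of a helper finishing its read. */
static void
model_helper_done(struct sf_model *m, size_t page)
{
    assert(page < SF_NPAGES && m->in_flight[page]);
    m->in_flight[page] = false;
    m->resident[page] = true;
    m->busy_helpers--;
    /* The real thing would wake the socket here (sowakeup or similar). */
}
```

From the application's side nothing changes: a call that comes back
with EWOULDBLOCK is retried once the socket is reported writable, and in
this model the retry after model_helper_done() returns SF_SENT.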
Alan, would this be possible from the VM system's perspective?  Is it
safe to assume that once the page in question was in the page cache, it
would hang around long enough for the second sendfile call to access it
before it is paged back out again?

Thanks,

Mike "Silby" Silbersack