To: Mikhail Teterin
In-reply-to: Your message of "Fri, 24 Mar 2006 15:18:00 EST." <200603241518.01027.mi+mx@aldan.algebra.com>
Date: Fri, 24 Mar 2006 13:38:31 -0800
From: Bakul Shah
Message-Id: <20060324213831.5D48A294E8@mail.bitblocks.com>
Cc: alc@freebsd.org, Peter Jeremy, stable@freebsd.org
Subject: Re: Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)

> > Maybe the OS needs "reclaim-behind" for the sequential case?
> > This way you can mmap many many pages and use a much smaller
> > pool of physical pages to back them. The idea is for the VM
> > to reclaim pages N-k..N-1 when page N is accessed and allow
> > the same process to reuse this page.
>
> Although it may be hard for the kernel to guess which pages it can
> reclaim efficiently in the general case, my issuing of madvise with
> MADV_SEQUENTIAL should've given it a strong hint.

Yes, that is what I was saying. If mmap read can be made as
efficient as the use of read() for this most common case,
there are benefits.
In effect we set up a FIFO that rolls along the mapped
address range, and the kernel processing and the user
processing are somewhat decoupled.

> Reading via mmap should never be slower than via read
> -- it should be just a notch faster, in fact...

Depends on the cost of mostly redundant processing of N
read() syscalls versus the cost of setting up and tearing
down multiple virtual-to-physical (v2p) mappings -- presumably
page faults can be avoided if the kernel fills in pages ahead
of when they are first accessed. The cost of a TLB miss is
likely minor. Probably the breakeven point is just a few
read() calls.

> I'm also quite certain that fulfilling my "demands" would add quite a
> bit of complexity to the mmap support in the kernel, but hey, that's
> what the kernel is there for :-)

An interesting thought experiment is to assume the system has
*no* read and write calls and see how far you can get with
the present mmap scheme and what extensions are needed to get
back the same functionality. Yes, assume mmap & friends even
for serial IO! I am betting that mmap can be simplified.

[Proof by handwaving elided; this screen is too small to fit
my hands :-)]
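
For illustration only, here is a rough user-space approximation of the
"reclaim-behind" idea discussed above. It is a sketch under stated
assumptions, not a description of what any kernel does: the function
name sum_with_reclaim_behind and the parameter batch_pages (the "k" in
"pages N-k..N-1") are hypothetical, and MADV_DONTNEED is only advice
whose exact effect differs between systems. The process maps the file
read-only, hints sequential access with MADV_SEQUENTIAL, and tells the
VM that the k pages behind the read position are no longer needed.

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum every byte of a file through a read-only
 * mapping, releasing pages behind the read position in batches of
 * batch_pages pages. Returns 0 on success, -1 on error.
 */
int
sum_with_reclaim_behind(const char *path, size_t batch_pages, uint64_t *sum)
{
    assert(sum != NULL);
    *sum = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {             /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (map == MAP_FAILED)
        return -1;
    unsigned char *base = map;

    /* Advice only: the VM may read ahead more aggressively. */
    (void)madvise(base, len, MADV_SEQUENTIAL);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t batch = batch_pages * page;
    size_t released = 0;        /* bytes already handed back, page-aligned */

    for (size_t off = 0; off < len; off += page) {
        /* About to touch page N: give back pages N-k..N-1. */
        if (batch > 0 && off - released >= batch) {
            (void)madvise(base + released, off - released, MADV_DONTNEED);
            released = off;
        }
        size_t end = (len - off < page) ? len : off + page;
        for (size_t i = off; i < end; i++)
            *sum += base[i];
    }

    munmap(base, len);
    return 0;
}
```

Because the mapping is read-only and file-backed, a page released this
way can always be faulted back in from the file, so the advice cannot
change the result; whether it actually shrinks the resident set is up to
the VM system.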