From owner-freebsd-current@FreeBSD.ORG  Tue Mar 21 21:24:02 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: current@freebsd.org
Delivered-To: freebsd-current@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A1AB916A41F;
	Tue, 21 Mar 2006 21:24:02 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 64ABF43DB5;
	Tue, 21 Mar 2006 21:23:48 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.13.4/8.13.4) with ESMTP id k2LLNM4S006345;
	Tue, 21 Mar 2006 13:23:22 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.13.4/8.13.4/Submit) id k2LLNMhO006344;
	Tue, 21 Mar 2006 13:23:22 -0800 (PST)
Date: Tue, 21 Mar 2006 13:23:22 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200603212123.k2LLNMhO006344@apollo.backplane.com>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
References: <200603211607.30372.mi+mx@aldan.algebra.com>
Cc: alc@freebsd.org, current@freebsd.org
Subject: Re: weird bugs with mmap-ing via NFS


:Hello!
:
:I have a program that writes a file via mmap. Normally the target is on a 
:local filesystem, so there are no issues.
:
:Today, however, I tried running it on another machine writing via NFS.
:
:If the output share is mounted with default parameters, the writing succeeds, 
:but involves very high READ bandwidth (the client is not reading anything). 
:For example, here is the output of `netstat -1' on the client:
:
:            input        (Total)           output
:   packets  errs      bytes    packets  errs      bytes colls
:         2     0        152          0     0          0     0 
:      3081     0    4369834        519     0      82006     0 
:...

    You might be doing just writes to the mmap()'d memory, but the system
    doesn't know that.  The moment you touch any mmap()'d page, reading or
    writing, the system has to fault it in, which means it has to read the
    page's current contents from the file (over NFS, in your case) so that
    the page holds valid data.  Hence the READ traffic you are seeing.
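    A minimal C sketch of the pattern being discussed, assuming a POSIX
    system.  write_via_mmap() is a hypothetical helper, not the original
    program (which was not posted).  It writes a buffer through a
    MAP_SHARED mapping, and the comments mark where the page faults, and
    therefore the reads, come from:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `data` to `path` through a
 * shared mapping instead of write().  Returns 0 on success, -1 on error.
 */
int write_via_mmap(const char *path, const void *data, size_t len)
{
    assert(path != NULL && (data != NULL || len == 0));

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    /* Size the file first; storing past EOF through a mapping raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    if (len == 0)               /* mmap() rejects a zero-length mapping */
        return close(fd);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    /*
     * The memcpy only stores, but the first touch of each page still takes
     * a page fault.  The kernel cannot tell that the whole page is about to
     * be overwritten, so it must first give the page valid contents from
     * the file; on an NFS mount that can mean a READ from the server.
     */
    memcpy(p, data, len);

    /* Push the dirtied pages back to the file before unmapping. */
    int rc = msync(p, len, MS_SYNC);
    if (munmap(p, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```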

:When I mount with large read and write sizes:
:
:	mount_nfs -r 65536 -w 65536 -U -ointr pandora:/backup /backup
:
:it changes -- for the worse. A short time into it, the file stops growing 
:according to the `ls -sl' run on the NFS server (pandora) at exactly 3200 FS 
:blocks (the FS was created with `-b 65536 -f 8129').
:
:At the same time, according to `systat -if' on both client and server, the  
:client continues to send (and the server continues to receive) about 30Mb of 
:some (?) data per second.
:
:The client is the freshly rebuilt FreeBSD-6.1/i386 -- with alc's recent big 
:MFC included. The server is an older 6.1/amd64 from Feb 7.
:
:Please, advise. Thanks!
:
:	-mi

    It kinda sounds like the buffer cache is getting blown out, but not
    having seen the program I can't really analyze it.

    It will always be more efficient to write to a file using write() than
    using mmap(), and it will always be far, far more efficient to write
    to an NFS file in NFS block-sized chunks rather than in smaller chunks
    due to the way the buffer cache works.  The only sub-block-sized write
    that is optimized is the file-append case.  All other writes smaller
    than the NFS block size have to perform a read-before-write to validate
    the buffer cache buffer.  Writes that are multiples of the NFS block
    size (and aligned to the NFS block size) should be optimized and will
    not have to perform a read-before-write.
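    As a sketch of the write() alternative, assuming the program knows the
    NFS block size (passed in here as bsize, e.g. 65536 to match the
    `-w 65536' mount option quoted above): the hypothetical
    write_block_aligned() uses pwrite(2) and splits the transfer at block
    boundaries, so that interior chunks are whole, aligned blocks.  This
    illustrates the advice above; it is not FreeBSD's NFS client code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` at file offset `off`,
 * splitting the transfer at multiples of `bsize` (assumed to be the NFS
 * block size).  When pwrite() writes everything it is asked to, every
 * chunk except possibly the first and last is a whole, aligned block.
 * Returns 0 on success, -1 on error.
 */
int write_block_aligned(int fd, const char *buf, size_t len, off_t off,
                        size_t bsize)
{
    assert(bsize > 0 && off >= 0);
    while (len > 0) {
        /* Bytes remaining until the next block boundary. */
        size_t room = bsize - (size_t)(off % (off_t)bsize);
        size_t chunk = len < room ? len : room;
        ssize_t n = pwrite(fd, buf, chunk, off);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        /* A short write leaves the rest for the next pass, which realigns. */
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

    When a new (or truncated) file is written in one call starting at
    offset 0, only the final chunk can be partial, and it starts at the
    current end of file, i.e. the append case described above.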

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>