From owner-freebsd-arch@FreeBSD.ORG  Thu Apr 15 02:34:33 2010
Date: Wed, 14 Apr 2010 22:48:23 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20100414135230.U12587@delplex.bde.org>
Message-ID: <Pine.GSO.4.63.1004142234200.28565@muncher.cs.uoguelph.ca>
References: <4BBEE2DD.3090409@freebsd.org>
	<Pine.GSO.4.63.1004090941200.14439@muncher.cs.uoguelph.ca>
	<4BBF3C5A.7040009@freebsd.org> <20100411114405.L10562@delplex.bde.org>
	<Pine.GSO.4.63.1004110946400.27203@muncher.cs.uoguelph.ca>
	<20100414135230.U12587@delplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@FreeBSD.org, Andriy Gapon <avg@FreeBSD.org>
Subject: Re: (in)appropriate uses for MAXBSIZE


On Wed, 14 Apr 2010, Bruce Evans wrote:

> On Sun, 11 Apr 2010, Rick Macklem wrote:
>
>> On Sun, 11 Apr 2010, Bruce Evans wrote:
>> 
>>> Er, the maximum size of buffers in the buffer cache is especially
>>> irrelevant for nfs.  It is almost irrelevant for physical disks because
>>> clustering normally increases the bulk transfer size to MAXPHYS.
>>> Clustering takes a lot of CPU but doesn't affect the transfer rate much
>>> unless there is not enough CPU.  It is even less relevant for network
>>> i/o since there is a sort of reverse-clustering -- the buffers get split
>>> up into tiny packets (normally 1500 bytes less some header bytes) at
>>> the hardware level.  ...
>> 
[stuff snipped]
>
> Indeed, I was only caring about a LAN environment.  Especially with
> LANs optimized for latency (50-100 us), nfs performance is poor for
> small files, at least for the old nfs client, mainly due to close to
> open consistency defeating caching, but not a problem for bulk transfers.
>

And I'll admit I was thinking that for a low-latency LAN, a large
read/write RPC wouldn't have a negative impact, but it sounds like
you've found 16KB to be optimal for this case.
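
To put rough numbers on the "reverse-clustering" mentioned above, here
is a minimal sketch in C of how many wire packets a single read/write
RPC payload gets split into. The name packets_needed() and the 40 bytes
of per-packet header (IPv4 plus TCP, no options) are assumptions made
for illustration, not anything taken from the nfs code, and the RPC and
NFS headers are ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of packets needed to carry `xfer` bytes
 * when each packet holds at most (mtu - hdr) bytes of payload.
 */
static size_t
packets_needed(size_t xfer, size_t mtu, size_t hdr)
{
	size_t payload;

	assert(mtu > hdr);	/* otherwise there is no room for data */
	payload = mtu - hdr;
	/* Round up so that a partial last packet is counted. */
	return ((xfer + payload - 1) / payload);
}
```

With a 1500 byte MTU and the assumed 40 bytes of headers,
packets_needed(16 * 1024, 1500, 40) comes to 12 and
packets_needed(64 * 1024, 1500, 40) comes to 45.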

For NFSv4, if the client has a delegation for the file, it doesn't
have to worry about close/open consistency, so there is some hope
w.r.t. small files for this case.

>
> Clustering is currently only for the local file system, at least for
> the old nfs server.  nfs just does a VOP_READ() into its own buffer,
> with ioflag set to indicate nfs's idea of sequentialness.  (User reads
> are similar except their uio destination is UIO_USERSPACE instead of
> UIO_SYSSPACE and their sequentialness is set generically and thus not
> so well (but the nfs setting isn't very good either).)  The local file
> system then normally does a clustered read into a larger buffer, with
> the sequentialness affecting mainly startup (per-file), and virtually
> copies the results to the local file system's smaller buffers.  VOP_READ()
> completes by physically copying the results to nfs's buffer (using
> bcopy() for UIO_SYSSPACE and copyout() for UIO_USERSPACE).  nfs can't
> easily get at the larger clustering buffers or even the local file
> system's buffers.  It can more easily benefit from larger MAXBSIZE.
> There is still the bcopy() to take a lot of CPU and memory bus resources,
> but that is insignificant compared with WAN latency.  But as I said in
> a related thread, even the current MAXBSIZE is too large to use
> routinely, due to buffer cache fragmentation causing significant latency
> problems, so any increase in MAXBSIZE and/or routine use of buffers
> of that size needs to be accompanied by avoiding the fragmentation.
> Note that the fragmentation is avoided for the larger clustering buffers
> by allocating them from a different pool.
>
Ah, now I know what you were referring to w.r.t. clustering. I haven't
looked at the mechanism used to allocate buffer space in the buffer
cache, so I'll just take your word for it w.r.t. fragmentation. It
sounds like the allocation mechanism needs to be thought about if/when
MAXBSIZE gets increased.

Thanks for your input and I hope I didn't upset you when I jumped on
the "I care about WANs" bandwagon, while basically ignoring the LAN case.

rick