From owner-freebsd-net@FreeBSD.ORG Sat Mar 9 00:48:22 2013
Date: Fri, 8 Mar 2013 19:47:13 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman
Cc: jfv@freebsd.org, freebsd-net@freebsd.org, Andre Oppermann, Garrett Wollman
Subject: Re: Limits on jumbo mbuf cluster allocation
Message-ID: <2050712270.3721724.1362790033662.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20794.7012.265887.99878@hergotha.csail.mit.edu>
List-Id: Networking and TCP/IP with FreeBSD

Garrett Wollman wrote:
> < said:
>
> > [stuff I wrote deleted]
>
> > You have an amd64 kernel
> > running HEAD or 9.x?
>
> Yes, these are 9.1 with some patches to reduce mutex contention on the
> NFS server's replay "cache".
>
The cached replies are copies of the mbuf list done via m_copym(). As
such, the clusters in these replies won't be free'd (ref cnt -> 0) until
the cache is trimmed (nfsrv_trimcache() gets called after the TCP layer
has received an ACK for receipt of the reply from the client).

If reducing the size to 4K doesn't fix the problem, you might want to
consider shrinking the tunable vfs.nfsd.tcphighwater and suffering the
increased CPU overhead (and some increased mutex contention) of calling
nfsrv_trimcache() more frequently.

(I'm assuming that you are using drc2.patch + drc3.patch. If you are
using one of ivoras@'s variants of the patch, I'm not sure if the
tunable is called the same thing, although it should have basically the
same effect.)

Good luck with it and thanks for running on the "bleeding edge" so
these issues get identified, rick

> > Jumbo pages come directly from the kernel_map which on amd64 is
> > 512GB. So KVA shouldn't be a problem. Your problem indeed appears
> > to come from physical memory fragmentation in pmap.
>
> I hadn't realized that they were physically contiguous, but that
> makes perfect sense.
>
> > pages. Also since you're doing NFS serving almost all memory will
> > be in use for file caching.
>
> I actually had the ZFS ARC tuned down to 64 GB (out of 96 GB physmem)
> when I experienced this, but there are plenty of data structures in
> the kernel that aren't subject to this limit and I could easily
> imagine them checkerboarding physical memory to the point where no
> contiguous three-page allocations were possible.
>
> -GAWollman
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"