From owner-freebsd-stable@FreeBSD.ORG Fri Mar 30 08:38:34 2012
Date: Fri, 30 Mar 2012 10:38:30 +0200
From: Oliver Brandmueller
To: freebsd-stable@freebsd.org
Message-ID: <20120330083830.GB65313@e-Gitt.NET>
Subject: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak

Hi all,

Setup: I'm running 2 machines (amd64, 16 GB RAM) with FreeBSD 9-STABLE
(from Mar 14 sources so far) acting as NFS servers. Each serves 3
zpools (each holding a single zfs, with hourly snapshots). Each zpool
is a 3-way mirror of 2 TB ggate devices, so 2 TB per zpool. Compression
is "on" (to save bandwidth to the backend; compressratio is around 1.05
to 1.15), atime is off. There is no special tuning in loader.conf
(except that I recently tried to limit the ZFS ARC to 8 GB, which
didn't change much). sysctl.conf has:

kern.ipc.maxsockbuf=33554432
net.inet.tcp.sendspace=8388608
net.inet.tcp.recvspace=8388608
kern.maxfiles=64000
vfs.nfsd.maxthreads=254

Without the first three, zfs+ggate goes bad after a short time
(checksum errors, stalls); the last two are mainly for NFS and some
regular local cleanup jobs.

The machines have 4 em and 2 igb network interfaces. Three of them are
dedicated links (with no switches) to the ggate servers, one is a
dedicated link to a third machine which is fed incremental snapshots
via zfs send (as backup and fallback of last resort), one interface is
for management tasks, and one goes to an internal network with the NFS
clients.

The NFS clients are mostly FreeBSD 6, 7 and 9 STABLE machines
(migration to 9 is in progress), no NFSv4 (for now), all NFS over TCP,
practically no locking. The data consists of a lot of files, mainly
mailboxes (IMAP: dovecot, incoming mail with exim, some simple web
stuff with apache), so lots of small files and only a few bigger ones.
Directory structures go to a reasonable depth.

The system is on an SSD (UFS, TRIM); additionally there are 3 (4
actually, 1 unused for now) 120 GB SSDs serving as cache devices for
the zpools. I first started using the whole devices, but in the hope of
changing something I limited the cache to 32 GB partitions, without any
change in behaviour.

Problem: After about a week of uptime under normal workload the systems
start to swap (with about 300 to 500 MB of RAM free). There is lots of
paging in and out, but only very little swap space is actually used (30
to 50 MB).
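For reference, the numbers below come from checks along these lines (a
rough sketch from memory, not a complete list; I can post full output
the next time it happens):

  # userland memory use - nothing big shows up
  top -b -o res | head -20
  # wired vs. free pages
  sysctl vm.stats.vm.v_wire_count vm.stats.vm.v_free_count
  # ARC size and L2ARC header size (the same counters zfs-stats reports)
  sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.l2_hdr_size
  # kernel malloc and UMA zone usage, to see where the wired memory sits
  vmstat -m
  vmstat -z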
The ZFS ARC at that point has shrunk to its minimum (after having used
up to its configured maximum before). Most of the RAM is wired.
According to zfs-stats the L2ARC headers eat about 1 GB, and the ARC is
at 1.8 GB at that time. No userland processes are using large amounts
of RAM. After some time the system becomes unresponsive; the only fix I
have found so far is to reboot the machine (which of course means a
service interruption).

From the start of swapping to unresponsiveness I have about 2 to 4
hours to check several things (if I only knew what!). The workload is
not evenly distributed over the day, and from my munin graphs I can see
that wired memory grows during times of higher workload. At night, with
lower workload (far from nothing - say about 1/3 to 1/2 of the weekday
writes, but probably less than 1/4 of the reads), I can barely see any
growth in the wired graph.

So where is my memory going, and does anyone have ideas what to change?

The kernel is stripped down from GENERIC, with everything I need loaded
as modules.

Kernel config: http://sysadm.in/zprob/ZSTOR
loader.conf  : http://sysadm.in/zprob/loader.conf
dmesg.boot   : http://sysadm.in/zprob/dmesg.boot

-- 
| Oliver Brandmueller  http://sysadm.in/  ob@sysadm.in |
| I am the Internet. As surely as I help God.          |