From: Ben Kelly
To: Andriy Gapon
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
Date: Tue, 28 Sep 2010 14:40:00 -0400
In-Reply-To: <4CA22337.2010900@icyb.net.ua>

On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote:

> on 28/09/2010 19:46 Ben Kelly said the following:
>> Hmm.
>> My server is currently idle with no I/O happening:
>>
>> kstat.zfs.misc.arcstats.c: 25165824
>> kstat.zfs.misc.arcstats.c_max: 46137344
>> kstat.zfs.misc.arcstats.size: 91863156
>>
>> If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though.
>
> Well, your system is a bit old indeed.
> And the branch is unknown, so I can't really see what sources you have.
> And I am not sure if I'll be able to say anything about those sources.

Quite old. I've been intending to update, but haven't found the time lately. I'll try to do the upgrade this weekend and see if it changes anything.

> As to the numbers - yes, with current code I'd expect arcstats.size to go down to
> arcstats.c when there is no I/O. arc_reclaim_thread should do that.

That's what I thought as well, but when I debugged it a year or two ago I found that the buffers were still referenced and thus could not be reclaimed. As far as I can remember, they needed a vfs/vnops call such as zfs_vnops_inactive or zfs_vnops_reclaim to be executed in order to free the reference. What is responsible for making those calls?

>
>> At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did.
>>
>> In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated.
>
> ARC is a ZFS private cache.
> ZFS doesn't use unified buffer/page cache.
> So ARC is not directly affected by pagedaemon.
> But this is not exactly VFS layer thing.

Can you explain the difference in how the vfs/vnode operations are called or used for those two situations?
I thought that the buffer cache was used by filesystems to implement these operations, so that the buffer cache sat below the vfs/vnops layer. So while zfs implemented its operations in terms of the arc, filesystems like UFS implemented vfs/vnops in terms of the buffer cache. I thought the layers further up the chain, like the page daemon, did not distinguish much between these two implementations because of the VFS interface layer. (Although there seems to be a layering violation in that the buffer cache signals directly to the upper page daemon layer to trigger page reclamation.)

The old (ancient) patch I tried previously to help reduce the arc working set and allow it to shrink is here:

  http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff

Unfortunately, there are a couple of ideas on fighting fragmentation mixed into that patch. See the part about arc_reclaim_pages(). This patch did seem to allow my arc to stay under the target maximum even under load that previously caused the system to exceed the maximum. When I update this weekend I'll try a stripped-down version of the patch to see whether it helps with the latest zfs.

Thanks for your help in understanding this stuff!

- Ben
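For reference, the comparison implied by the arcstats figures quoted above can be sketched as a small script. This is only an illustration and not something from the thread: parse_arcstats and arc_overshoot are hypothetical helper names, and the input is assumed to be plain "name: value" lines as printed by sysctl.

```python
# Hypothetical helpers (not part of any FreeBSD or ZFS tool): parse
# "name: value" lines and report how far arcstats.size sits above the
# ARC target (c) and the configured limit (c_max).

def parse_arcstats(text):
    """Return {short_name: int} for lines like 'kstat.zfs.misc.arcstats.c: 123'."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line
        try:
            number = int(value.strip())
        except ValueError:
            continue  # skip non-integer values
        # Keep only the last dotted component, e.g. 'c_max'.
        stats[name.strip().rsplit(".", 1)[-1]] = number
    return stats


def arc_overshoot(stats):
    """Bytes by which size exceeds c and c_max (negative if below)."""
    return {
        "over_c": stats["size"] - stats["c"],
        "over_c_max": stats["size"] - stats["c_max"],
    }


# The three values quoted in the message.
SAMPLE = """\
kstat.zfs.misc.arcstats.c: 25165824
kstat.zfs.misc.arcstats.c_max: 46137344
kstat.zfs.misc.arcstats.size: 91863156
"""

stats = parse_arcstats(SAMPLE)
over = arc_overshoot(stats)
MIB = 1024 * 1024
print(f"size is {over['over_c'] / MIB:.1f} MiB above c")
print(f"size is {over['over_c_max'] / MIB:.1f} MiB above c_max")
```

With the three values quoted in the message, size comes out roughly 63.6 MiB above c and 43.6 MiB above c_max, which is the gap the discussion is about.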