From owner-freebsd-stable@FreeBSD.ORG Sun Nov 22 16:51:27 2009
Date: Sun, 22 Nov 2009 11:51:25 -0500
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: "Svein Skogen (listmail account)"
Cc: freebsd-stable@freebsd.org
Subject: Re: 7.2 dies in zfs
Message-ID: <20091122165125.GN1213@egr.msu.edu>
In-Reply-To: <4B08FD93.4070409@stillbilde.net>
References: <4B066B13.1070006@freebsd.org>
 <4b07ac59.A2Afaf4X0IZlrgGU%perryh@pluto.rain.com>
 <57200BF94E69E54880C9BB1AF714BBCBA5722E@w2003s01.double-l.local>
 <20091121193643.GA14122@icarus.home.lan>
 <20091122052030.GL1213@egr.msu.edu>
 <4B08FD93.4070409@stillbilde.net>
User-Agent: Mutt/1.5.20 (2009-06-14)

On Sun, Nov 22, 2009 at 10:00:03AM +0100, Svein Skogen (listmail account) wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Adam McDougall wrote:
> On Sat, Nov 21, 2009 at 11:36:43AM -0800, Jeremy Chadwick wrote:
>
> > On Sat, Nov 21, 2009 at 08:07:40PM +0100, Johan Hendriks wrote:
> > Randy Bush wrote:
> > > imiho, zfs can not be called production ready if it crashes if you
> > > do not stand on your left leg, put your right hand in the air, and
> > > burn some eye of newt.
> >
> > This is not a rant, but where do you read that on FreeBSD 7.2 ZFS has
> > been marked as production ready?
> > As far as I know, on FreeBSD 8.0 ZFS is called production ready.
> >
> > If you boot your system it probably tells you it is still experimental.
> >
> > Try running FreeBSD 7-STABLE to get the latest ZFS version, which on
> > FreeBSD is 13.
> > On 7.2 it is still at 6 (if I remember it right).
>
> RELENG_7 uses ZFS v13, RELENG_8 uses ZFS v18.
>
> RELENG_7 and RELENG_8 both, more or less, behave the same way with
> regards to ZFS. Both panic on kmem exhaustion. No one has answered my
> question as far as what's needed to stabilise ZFS on either 7.x or 8.x.
>
> I have a stable public ftp/http/rsync/cvsupd mirror that runs ZFS v13.
> It has been stable since mid-May. I have not had a kmem panic on any
> of my ZFS systems for a long time; it's a matter of making sure there
> is enough kmem at boot (not depending on kmem_size_max) and that it is
> big enough that fragmentation does not cause a premature allocation
> failure due to the lack of a large-enough contiguous chunk. This
> requires the platform to support a kmem size that is "big enough"...
> i386 can barely muster 1.6G, and sometimes that might not be enough.
> I'm pretty sure all of my currently existing ZFS systems are amd64,
> where the kmem can now be huge. On the busy fileserver with 20 gigs of
> RAM running FreeBSD 8.0-RC2 #21: Tue Oct 27 21:45:41 EDT 2009, I
> currently have:
>
> vfs.zfs.arc_max=16384M
> vfs.zfs.arc_min=4096M
> vm.kmem_size=18G
>
> The ARC settings here are to try to encourage it to favor the ARC
> cache instead of whatever else Inactive memory in 'top' contains.

Very interesting. For my iSCSI backend (running istgt from ports), I had
to lower arc_max below 128M to stop the iSCSI initiators from generating
timeouts when the cache flushed. (This is on a system with a MegaRAID
8308ELP handling the disk back end, with the disks in two RAID5 arrays
of four disks each, zpooled as one big pool.) With arc_max above 128M,
ZFS would at regular intervals eat all available resources flushing to
disk, leaving istgt waiting, and the iSCSI initiators timed out and had
to reconnect. The iSCSI initiators are the built-in software initiator
in VMWare ESX 4i.

//Svein

I could understand that happening. I've seen situations in the past
where my kmem was smaller than I wanted it to be, and within a few days
the overall ZFS disk IO would become incredibly slow because it was
trying to flush out the ARC way too often under intense external memory
pressure on the ARC.
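
If you want to watch for that, the ARC counters are exported as sysctls.
A rough sketch (sysctl names as they exist in the 7.x/8.x ZFS port;
check them on your build before relying on them):

  # compare the ARC's current size and target against the configured limits
  sysctl vm.kmem_size vfs.zfs.arc_min vfs.zfs.arc_max
  sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c

  # crude poll; if size keeps sagging far below arc_max under load,
  # something is forcing the ARC into constant reclamation
  while :; do sysctl -n kstat.zfs.misc.arcstats.size; sleep 5; done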
Assuming you have a large amount of RAM, I wonder if setting kmem_size,
arc_min and arc_max sufficiently large and using modern code would help,
as long as you made sure other processes on the machine don't squeeze
down Wired memory in top too much. In such a situation I would expect it
to operate fine while the ARC has enough kmem to expand as much as it
wants to; it might either hit a wall later, or perhaps given enough ARC
the reclamation might be tolerable. Or, if 128M ARC is good enough for
you, leave it :)
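
For anyone following along: all three knobs are boot-time tunables, so
they belong in /boot/loader.conf. A minimal sketch using the values from
my box above (sized for 20 gigs of RAM on amd64, as an example rather
than a recommendation; scale them to your own hardware):

  # /boot/loader.conf -- example ZFS memory tuning for a 20 GB amd64 box
  vm.kmem_size="18G"        # give the kernel map room for the ARC plus slack
  vfs.zfs.arc_min="4096M"   # keep the ARC from being squeezed too small
  vfs.zfs.arc_max="16384M"  # cap the ARC so everything else still fits

Reboot after changing them; vm.kmem_size in particular is only read at
boot.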