From owner-freebsd-fs@FreeBSD.ORG  Sun Apr 12 03:52:07 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 17D94106566C
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 03:52:07 +0000 (UTC)
	(envelope-from james-freebsd-fs2@jrv.org)
Received: from mail.jrv.org (rrcs-24-73-246-106.sw.biz.rr.com [24.73.246.106])
	by mx1.freebsd.org (Postfix) with ESMTP id BA1458FC12
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 03:52:06 +0000 (UTC)
	(envelope-from james-freebsd-fs2@jrv.org)
Received: from kremvax.housenet.jrv (kremvax.housenet.jrv [192.168.3.124])
	by mail.jrv.org (8.14.3/8.14.3) with ESMTP id n3C3TbbM001217
	for <freebsd-fs@freebsd.org>; Sat, 11 Apr 2009 22:29:37 -0500 (CDT)
	(envelope-from james-freebsd-fs2@jrv.org)
Authentication-Results: mail.jrv.org; domainkeys=pass (testing)
	header.from=james-freebsd-fs2@jrv.org
DomainKey-Signature: a=rsa-sha1; s=enigma; d=jrv.org; c=nofws; q=dns;
	h=message-id:date:from:user-agent:mime-version:to:subject:
	content-type:content-transfer-encoding;
	b=RZT1d3Jtiaaqs+lcx7ZRlEJiCDm6J47s1UYDP7P3b56PELUayAhKcfGQXF+Xt+adH
	1BVOUW9/cIvI3QLxRYp6iVsWEfJyszcS2ySiTKMz47pTtAxOj3TCi1OdZsnggLtx5cB
	XBL6pGxVnDpozfisXkrwXj81lujylvKcbxRUu9Y=
Message-ID: <49E16021.6040900@jrv.org>
Date: Sat, 11 Apr 2009 22:29:37 -0500
From: "James R. Van Artsdalen" <james-freebsd-fs2@jrv.org>
User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302)
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: turning off ZFS mountpoint property behavior?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 12 Apr 2009 03:52:07 -0000

Is there a knob to turn off ZFS's mounting of filesystems based on the
mountpoint property?  It is most unhelpful when receiving replicas of
filesystems to have a received snapshot suddenly mounted over /usr.

I have two systems "prime" and "subprime", both of which have a large
ZFS pool and a small UFS partition for maintenance.  They are
essentially the same except that /boot/loader.conf boots one into ZFS
and the other into UFS.

"prime" is the operational server using ZFS.  "subprime" is essentially
a hot spare booting UFS whose ZFS pool is to be kept in sync with the
pool on "prime" using zfs send/recv replication.  Should the pool on
"prime" fail, /boot/loader.conf on "subprime" can be changed to boot its
ZFS pool and the server is quickly available again, at the last snapshot
replicated.

Unfortunately, when zfs recv receives a filesystem with property
mountpoint=/usr, it mounts that filesystem there.  That's not
desirable in my situation nor, I suspect, in many others.

Is there a sysctl or some other way to disable the automatic mount behavior?

From owner-freebsd-fs@FreeBSD.ORG  Sun Apr 12 20:06:18 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BBC32106566C
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 20:06:18 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from acadia.cs.uoguelph.ca (acadia.cs.uoguelph.ca [131.104.94.221])
	by mx1.freebsd.org (Postfix) with ESMTP id 7DB618FC0A
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 20:06:18 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
	by acadia.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id n3CK6Hd3002721
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 16:06:17 -0400
Received: from localhost (rmacklem@localhost)
	by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id
	n3CKCed06307
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 16:12:40 -0400 (EDT)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing
	-bs
Date: Sun, 12 Apr 2009 16:12:40 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: freebsd-fs@freebsd.org
Message-ID: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.63 on 131.104.94.221
Subject: changing semantics of the va_filerev (code review)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 12 Apr 2009 20:06:19 -0000

In summary, the nfsv4 server needs 3 changes to the FreeBSD kernel:
1 - Sharing of nfssvc(). (This was just checked into FreeBSD-CURRENT.)
2 - Some calls that recall delegations must be done before local
     VOP_RENAME() and VOP_ADVLOCK(). I am waiting for comments to a
     vague post on this before I mail my first stab at coding this.
3 - Support for the Change attribute, which is what this post is about.

Once the above 3 things are resolved, the code should drop in without
further changes outside of its subtree.

As background, I believe va_filerev/i_modrev was added for nqnfs
long long ago. Since it is not exposed to userland via the stat structure,
I don't believe anything outside of the kernel uses it. Inside the kernel,
the only thing that currently uses it is the nfs server, which uses it as
the cookie verifier. (It really doesn't use it, since a client 
regurgitates it back to the server as opaque bits in the next readdir rpc
and the server then ignores those bits. This is correct, since va_filerev 
is a bogus cookie verifier.) As such, I don't believe changing the 
semantics of va_filerev will break anything in FreeBSD.

I'd like to change the semantics of va_filerev so that it can be used
by the nfsv4 server as the Change attribute. To do this, it needs to
change in 2 ways:
- must change upon metadata changes as well as data changes
- must persist across server reboots (i.e. be moved to spare space in
 	the on-disk i-node instead of the in-memory i-node)
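
As a rough illustration of the two requirements above (not part of the
patch), here is a minimal user-space C sketch. It assumes a counter stored
in a stand-in for the on-disk i-node and bumped for both data and metadata
changes. All sk_ names and SK_ flag values are invented for this sketch;
they are not the kernel's IN_UPDATE/IN_CHANGE definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; not the kernel's IN_* values. */
#define SK_UPDATE 0x1u /* file data was modified */
#define SK_CHANGE 0x2u /* i-node metadata was modified */

/* Stand-in for the on-disk i-node: the counter lives here, so it would
 * survive a reboot once the i-node is written back. */
struct sk_dinode {
	uint64_t di_modrev;
};

/* Stand-in for the in-memory i-node, which only points at the disk copy. */
struct sk_inode {
	struct sk_dinode *i_din;
	uint32_t i_flag;
};

/* Bump the counter once per pending kind of change, in the spirit of the
 * patched ufs_vnops.c hunk, then clear the pending flags (sketch choice). */
static void
sk_itimes(struct sk_inode *ip)
{
	assert(ip != NULL && ip->i_din != NULL);
	if (ip->i_flag & SK_UPDATE)
		ip->i_din->di_modrev++;
	if (ip->i_flag & SK_CHANGE)
		ip->i_din->di_modrev++;
	ip->i_flag &= ~(SK_UPDATE | SK_CHANGE);
}
```

A caller would set SK_CHANGE for something like a chmod and SK_UPDATE for
a write; a fresh sk_inode pointing at the same sk_dinode models re-reading
the i-node after a reboot, where the counter continues from its old value.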

Here is the patch to ufs for the above, that I have been using for some
time. Please review and comment.

Thanks, rick
--- ufs patch to change va_filerev semantics ---
--- ufs/ufs/inode.h.sav	2009-04-12 02:29:05.000000000 -0400
+++ ufs/ufs/inode.h	2009-03-20 12:18:20.000000000 -0400
@@ -74,7 +74,6 @@

  	struct	 fs *i_fs;	/* Associated filesystem superblock. */
  	struct	 dquot *i_dquot[MAXQUOTAS]; /* Dquot structures. */
-	u_quad_t i_modrev;	/* Revision level for NFS lease. */
  	/*
  	 * Side effects; used during directory lookup.
  	 */
--- ufs/ufs/dinode.h.sav	2009-04-12 02:29:40.000000000 -0400
+++ ufs/ufs/dinode.h	2008-08-25 17:31:55.000000000 -0400
@@ -145,7 +145,8 @@
  	ufs2_daddr_t	di_extb[NXADDR];/*  96: External attributes block. */
  	ufs2_daddr_t	di_db[NDADDR];	/* 112: Direct disk blocks. */
  	ufs2_daddr_t	di_ib[NIADDR];	/* 208: Indirect disk blocks. */
-	int64_t		di_spare[3];	/* 232: Reserved; currently unused */
+	u_int64_t	di_modrev;	/* 232: i_modrev for NFSv4 */
+	int64_t		di_spare[2];	/* 240: Reserved; currently unused */
  };

  /*
@@ -183,7 +184,7 @@
  	int32_t		di_gen;		/* 108: Generation number. */
  	u_int32_t	di_uid;		/* 112: File owner. */
  	u_int32_t	di_gid;		/* 116: File group. */
-	int32_t		di_spare[2];	/* 120: Reserved; currently unused */
+	u_int64_t	di_modrev;	/* 120: i_modrev for NFSv4 */
  };
  #define	di_ogid		di_u.oldids[1]
  #define	di_ouid		di_u.oldids[0]
--- ufs/ufs/ufs_vnops.c.sav	2009-04-12 02:28:41.000000000 -0400
+++ ufs/ufs/ufs_vnops.c	2009-03-10 16:47:11.000000000 -0400
@@ -157,11 +157,12 @@
  	if (ip->i_flag & IN_UPDATE) {
  		DIP_SET(ip, i_mtime, ts.tv_sec);
  		DIP_SET(ip, i_mtimensec, ts.tv_nsec);
-		ip->i_modrev++;
+		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
  	}
  	if (ip->i_flag & IN_CHANGE) {
  		DIP_SET(ip, i_ctime, ts.tv_sec);
  		DIP_SET(ip, i_ctimensec, ts.tv_nsec);
+		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
  	}

   out:
@@ -446,6 +447,7 @@
  		vap->va_ctime.tv_sec = ip->i_din1->di_ctime;
  		vap->va_ctime.tv_nsec = ip->i_din1->di_ctimensec;
  		vap->va_bytes = dbtob((u_quad_t)ip->i_din1->di_blocks);
+		vap->va_filerev = ip->i_din1->di_modrev;
  	} else {
  		vap->va_rdev = ip->i_din2->di_rdev;
  		vap->va_size = ip->i_din2->di_size;
@@ -456,12 +458,12 @@
  		vap->va_birthtime.tv_sec = ip->i_din2->di_birthtime;
  		vap->va_birthtime.tv_nsec = ip->i_din2->di_birthnsec;
  		vap->va_bytes = dbtob((u_quad_t)ip->i_din2->di_blocks);
+		vap->va_filerev = ip->i_din2->di_modrev;
  	}
  	vap->va_flags = ip->i_flags;
  	vap->va_gen = ip->i_gen;
  	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
  	vap->va_type = IFTOVT(ip->i_mode);
-	vap->va_filerev = ip->i_modrev;
  	return (0);
  }

@@ -2223,7 +2225,6 @@
  	ASSERT_VOP_LOCKED(vp, "ufs_vinit");
  	if (ip->i_number == ROOTINO)
  		vp->v_vflag |= VV_ROOT;
-	ip->i_modrev = init_va_filerev();
  	*vpp = vp;
  	return (0);
  }
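
The offset comments in the dinode.h hunks imply that neither on-disk
i-node layout changes size. Below is a small user-space check of that
arithmetic. It assumes the pad[] arrays stand in for every field before
the spare area; the offsets 232 and 120 are taken from the patch comments,
and the sk_ names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of the UFS2 on-disk i-node; pad[] stands in for bytes 0..231. */
struct sk_ufs2_old { uint8_t pad[232]; int64_t di_spare[3]; };
struct sk_ufs2_new { uint8_t pad[232]; uint64_t di_modrev; int64_t di_spare[2]; };

/* Tail of the UFS1 on-disk i-node; pad[] stands in for bytes 0..119. */
struct sk_ufs1_old { uint8_t pad[120]; int32_t di_spare[2]; };
struct sk_ufs1_new { uint8_t pad[120]; uint64_t di_modrev; };

/* Compile-time checks: carving di_modrev out of the spare words must not
 * grow or shrink either structure. */
static_assert(sizeof(struct sk_ufs2_new) == sizeof(struct sk_ufs2_old),
    "UFS2 i-node size changed");
static_assert(sizeof(struct sk_ufs1_new) == sizeof(struct sk_ufs1_old),
    "UFS1 i-node size changed");

/* Run-time view of the same facts; returns 1 when the new field sits at
 * the offsets named in the patch comments. */
static int
sk_layout_preserved(void)
{
	return (offsetof(struct sk_ufs2_new, di_modrev) == 232 &&
	    offsetof(struct sk_ufs2_new, di_spare) == 240 &&
	    offsetof(struct sk_ufs1_new, di_modrev) == 120);
}
```

If either static_assert fired, existing filesystems would no longer match
the structure, which is why the new field has to fit in the spare words.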

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 01:28:09 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9090C106572A
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 01:28:09 +0000 (UTC)
	(envelope-from tcberner@gmail.com)
Received: from mail-fx0-f167.google.com (mail-fx0-f167.google.com
	[209.85.220.167])
	by mx1.freebsd.org (Postfix) with ESMTP id 15C0A8FC17
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 01:28:08 +0000 (UTC)
	(envelope-from tcberner@gmail.com)
Received: by fxm11 with SMTP id 11so1852547fxm.43
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 18:28:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:date:to:subject:from
	:organization:content-type:mime-version:content-transfer-encoding
	:message-id:user-agent;
	bh=11U0ZIBBiEtJvEyKwNts20VuXKZyt72X6I147uzTzkQ=;
	b=X/nlgiAaVZMqawTRwRsiq2Y2g83p3BTS9ViFzCz/k5igyX/sxr0TO84J6Ll2VOwsWN
	WFTeakswE9D7T6CNjVWZ7I56HuNRbuleERagRu/Mfm9GKHEhORDr57zz5ed6aiKSHTYv
	9M0ZP4MNYG0BaUxCJUwhC8yfLejwFrU8VHyPM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=date:to:subject:from:organization:content-type:mime-version
	:content-transfer-encoding:message-id:user-agent;
	b=DHRBVv1U8015QFrv3SDzGlDoPqBKdnaiHgdM3I0I9EYlLtS/a5OuV15yFC/KgHfGRY
	qyVnkvmAmH0LIzQTZkzifjEkDg/JXwPbC7uu0MPLLzKUBiq+hxGfd8f3qEjnxDTmKsa8
	LbHShNYt0LtNAbvsnaGfhO9/S0k9lPg+4D/4o=
Received: by 10.86.61.13 with SMTP id j13mr4321661fga.65.1239584625706;
	Sun, 12 Apr 2009 18:03:45 -0700 (PDT)
Received: from sam.firefly (132-37.105-92.cust.bluewin.ch [92.105.37.132])
	by mx.google.com with ESMTPS id 12sm6360561fgg.22.2009.04.12.18.03.45
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sun, 12 Apr 2009 18:03:45 -0700 (PDT)
Date: Mon, 13 Apr 2009 03:03:44 +0200
To: freebsd-fs@freebsd.org
From: "Tobias C. Berner" <tcberner@gmail.com>
Organization: -
Content-Type: text/plain; charset=utf-8
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Message-ID: <op.usavwhqoa4qlja@sam.firefly>
User-Agent: Opera Mail/10.00 (FreeBSD)
Subject: zfs and moving devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 01:28:12 -0000

Hi

I have a zfs pool 

	NAME        STATE     READ WRITE CKSUM
	multimedia  ONLINE       0     0     0
	  ad8       ONLINE       0     0     0
	  ad10      ONLINE       0     0     0
	  ad12      ONLINE       0     0     0
	  ad14      ONLINE       0     0     0

Now, I need more SATA connectors. If I activate
another onboard controller, the device names
move:

   ad8  -> ad14
   ad10 -> ad16
   ad12 -> ad18
   ad14 -> ad20


What is the proper way to handle this in zfs?


thanks, Tobias 


From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 04:23:30 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DD7B31065673
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 04:23:30 +0000 (UTC)
	(envelope-from dimitar.vassilev@gmail.com)
Received: from mail-ew0-f171.google.com (mail-ew0-f171.google.com
	[209.85.219.171])
	by mx1.freebsd.org (Postfix) with ESMTP id 4865E8FC16
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 04:23:30 +0000 (UTC)
	(envelope-from dimitar.vassilev@gmail.com)
Received: by ewy19 with SMTP id 19so1820821ewy.43
	for <freebsd-fs@freebsd.org>; Sun, 12 Apr 2009 21:23:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=f0Vx2jMSOf0JY9YL1UdU4EfcnWHbDrMmI1XgUwR/Uus=;
	b=FTQoVkT5SJ8eoou89l+ZNNx0PaNx1Xem0WQmgBkbgq7oxCGJXWKa/7bmpzMhB2pHvp
	293p91OCBOghYo1Ng7jg6Bmp8wXsXejvr2WZNnuzZE7a7r9NBqBsgeIdZI3p9veWtd+/
	P1JIUWb41mOTn3pYwdZYNy4A7bY/o9p6JCAQQ=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=PN/PenCmDa1DWlvmk9uSzUbRFEZN2GmQcw6UeImfy64FtaT34EEhFxwRzkg5rrx+L0
	3TVeQu3rSJK2ePXtd8zr0qgjNfSgM0vISjGEBx7tDLTNxFjXuFbZxjw6h1k0o/IcgE4B
	KWyPMjw1FengEv1YZt/Gshg142plxbTrr1+uU=
MIME-Version: 1.0
Received: by 10.216.72.209 with SMTP id t59mr1429829wed.27.1239594852050; Sun, 
	12 Apr 2009 20:54:12 -0700 (PDT)
In-Reply-To: <op.usavwhqoa4qlja@sam.firefly>
References: <op.usavwhqoa4qlja@sam.firefly>
Date: Mon, 13 Apr 2009 06:54:12 +0300
Message-ID: <59adc1a0904122054j52cf9c60h6b3909379e04463@mail.gmail.com>
From: Dimitar Vasilev <dimitar.vassilev@gmail.com>
To: "Tobias C. Berner" <tcberner@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs and moving devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 04:23:31 -0000

2009/4/13 Tobias C. Berner <tcberner@gmail.com>:
> Hi
>
> I have a zfs pool
>
> 	NAME        STATE     READ WRITE CKSUM
> 	multimedia  ONLINE       0     0     0
> 	  ad8       ONLINE       0     0     0
> 	  ad10      ONLINE       0     0     0
> 	  ad12      ONLINE       0     0     0
> 	  ad14      ONLINE       0     0     0
>
> Now, I need more sata-connecters. If I activate
> an other onboard-controller, the device names
> move:
>
>   ad8  -> ad14
>   ad10 -> ad16
>   ad12 -> ad18
>   ad14 -> ad20
>
>
> What is the proper way to handle this in zfs?
>
>
> thanks, Tobias
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

There is a kernel option for static ATA device IDs in $KERNCONF; you
need to enable it to keep the SATA device names from shifting. I don't
remember the exact name; it should be somewhere in LINT/hint/GENERIC.
It should resemble something like ATA_STATIC_ID.
Cheers,
Dimitar

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 05:05:51 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2F18E1065674
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 05:05:51 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 0678A8FC0A
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 05:05:51 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from gjp by noop.in-addr.com with local (Exim 4.54 (FreeBSD))
	id 1LtEMg-0009pT-C4
	for freebsd-fs@freebsd.org; Mon, 13 Apr 2009 01:05:50 -0400
Date: Mon, 13 Apr 2009 01:05:50 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: freebsd-fs@freebsd.org
Message-ID: <20090413050550.GA44022@in-addr.com>
References: <op.usavwhqoa4qlja@sam.firefly>
	<59adc1a0904122054j52cf9c60h6b3909379e04463@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <59adc1a0904122054j52cf9c60h6b3909379e04463@mail.gmail.com>
Subject: Re: zfs and moving devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 05:05:51 -0000

On Mon, Apr 13, 2009 at 06:54:12AM +0300, Dimitar Vasilev wrote:
> 2009/4/13 Tobias C. Berner <tcberner@gmail.com>:
> > Hi
> >
> > I have a zfs pool
> >
> > 	NAME        STATE     READ WRITE CKSUM
> > 	multimedia  ONLINE       0     0     0
> > 	  ad8       ONLINE       0     0     0
> > 	  ad10      ONLINE       0     0     0
> > 	  ad12      ONLINE       0     0     0
> > 	  ad14      ONLINE       0     0     0
> >
> > Now, I need more sata-connecters. If I activate
> > an other onboard-controller, the device names
> > move:
> >
> >   ad8  -> ad14
> >   ad10 -> ad16
> >   ad12 -> ad18
> >   ad14 -> ad20
> >
> >
> > What is the proper way to handle this in zfs?
> >
> >
> > thanks, Tobias
> >
> 
> There was an option for ata_static_id's in $KERNCONF  - you need to
> enable this to keep the sata from shifting.Don't remember the exact
> magic instance - should be somewhere in LINT/hint/GENERIC.
> Should resemble something like ATA_STATIC_ID.

% grep STATIC /sys/i386/conf/GENERIC
options         ATA_STATIC_ID   # Static device numbering

Regards,

Gary

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 05:22:01 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 279511065672
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 05:22:01 +0000 (UTC)
	(envelope-from chris@young-alumni.com)
Received: from mail.oldschoolpunx.net (cpe-66-68-98-234.austin.res.rr.com
	[66.68.98.234]) by mx1.freebsd.org (Postfix) with ESMTP id F381E8FC15
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 05:22:00 +0000 (UTC)
	(envelope-from chris@young-alumni.com)
Received: by mail.oldschoolpunx.net (Postfix, from userid 58)
	id 1A70E8D0E5; Mon, 13 Apr 2009 00:22:00 -0500 (CDT)
Received: from [192.168.8.100] (unknown [192.168.8.100])
	by mail.oldschoolpunx.net (Postfix) with ESMTPSA id 6C0A98D0CB
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 00:18:06 -0500 (CDT)
Resent-To: freebsd-fs@freebsd.org
From: Chris Ruiz <chris@young-alumni.com>
To: Tobias C. Berner <tcberner@gmail.com>
In-Reply-To: <op.usavwhqoa4qlja@sam.firefly>
Resent-From: Chris Ruiz <chris@young-alumni.com>
References: <op.usavwhqoa4qlja@sam.firefly>
Message-Id: <C1745031-6EDD-46B5-A4C4-18BFB8DE9201@young-alumni.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Resent-Date: Mon, 13 Apr 2009 00:18:06 -0500
Mime-Version: 1.0 (Apple Message framework v930.3)
Date: Mon, 13 Apr 2009 00:17:43 -0500
X-Mailer: Apple Mail (2.930.3)
Resent-Message-Id: <20090413051806.6C0A98D0CB@mail.oldschoolpunx.net>
Cc: 
Subject: Re: zfs and moving devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 05:22:01 -0000


On Apr 12, 2009, at 8:03 PM, Tobias C. Berner wrote:

> Hi
>
> I have a zfs pool
>
> 	NAME        STATE     READ WRITE CKSUM
> 	multimedia  ONLINE       0     0     0
> 	  ad8       ONLINE       0     0     0
> 	  ad10      ONLINE       0     0     0
> 	  ad12      ONLINE       0     0     0
> 	  ad14      ONLINE       0     0     0
>
> Now, I need more sata-connecters. If I activate
> an other onboard-controller, the device names
> move:
>
>  ad8  -> ad14
>  ad10 -> ad16
>  ad12 -> ad18
>  ad14 -> ad20
>
>
> What is the proper way to handle this in zfs?

ZFS should just find the pool even though the device names have changed.

Chris

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 07:34:48 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3F0341065670
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 07:34:48 +0000 (UTC)
	(envelope-from morganw@chemikals.org)
Received: from warped.bluecherry.net (unknown [IPv6:2001:440:eeee:fffb::2])
	by mx1.freebsd.org (Postfix) with ESMTP id CFBA48FC0C
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 07:34:47 +0000 (UTC)
	(envelope-from morganw@chemikals.org)
Received: from volatile.chemikals.org (adsl-67-127-25.shv.bellsouth.net
	[98.67.127.25])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by warped.bluecherry.net (Postfix) with ESMTPSA id C95C58057196;
	Mon, 13 Apr 2009 02:34:44 -0500 (CDT)
Received: from localhost (morganw@localhost [127.0.0.1])
	by volatile.chemikals.org (8.14.3/8.14.3) with ESMTP id n3D7YL6T078269; 
	Mon, 13 Apr 2009 02:34:37 -0500 (CDT)
	(envelope-from morganw@chemikals.org)
Date: Mon, 13 Apr 2009 02:34:21 -0500 (CDT)
From: Wes Morgan <morganw@chemikals.org>
To: "Tobias C. Berner" <tcberner@gmail.com>
In-Reply-To: <op.usavwhqoa4qlja@sam.firefly>
Message-ID: <alpine.BSF.2.00.0904130230280.1902@ibyngvyr.purzvxnyf.bet>
References: <op.usavwhqoa4qlja@sam.firefly>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs and moving devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 07:34:48 -0000

On Mon, 13 Apr 2009, Tobias C. Berner wrote:

> I have a zfs pool
>
> 	NAME        STATE     READ WRITE CKSUM
> 	multimedia  ONLINE       0     0     0
> 	  ad8       ONLINE       0     0     0
> 	  ad10      ONLINE       0     0     0
> 	  ad12      ONLINE       0     0     0
> 	  ad14      ONLINE       0     0     0
>
> Now, I need more sata-connecters. If I activate
> an other onboard-controller, the device names
> move:
>
>   ad8  -> ad14
>   ad10 -> ad16
>   ad12 -> ad18
>   ad14 -> ad20
>
>
> What is the proper way to handle this in zfs?

Export the pool before you make the change and it should work no problem. 
You may want to enable ATA_STATIC_ID as well so you won't have to worry 
about it either.

On another note, that's a 4 device pool with no redundancy. Make sure you 
have frequent backups! I lost my "multimedia" pool once during a migration 
and was very sad. Now I use raidz2.

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 10:18:10 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1396B10656C0
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 10:18:10 +0000 (UTC)
	(envelope-from tcberner@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.159])
	by mx1.freebsd.org (Postfix) with ESMTP id 8F5BE8FC1A
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 10:18:09 +0000 (UTC)
	(envelope-from tcberner@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 13so538918fge.12
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 03:18:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:date:to:subject:from
	:organization:content-type:mime-version:references
	:content-transfer-encoding:message-id:in-reply-to:user-agent;
	bh=q5fTn9NP5SuVpPp9E+UZqwlOZ+Pd7L5bZxSiMay1viU=;
	b=aHLZEOxbrEJtmC/brYYfym6rwdqimPcI8bRtPCgYliNFnvVhU8PzScn9KoyWNSgdMS
	EzKGMQig+tLXhlvr7hKIou0cM+mGcO8oRG0fFEyTveVyeX6X2I2n9KG9zWik9asXkCHL
	gJ9aal/iT46Zcqlbs09RUFGV/EMh2P3SNOt4k=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=date:to:subject:from:organization:content-type:mime-version
	:references:content-transfer-encoding:message-id:in-reply-to
	:user-agent;
	b=wzITWiZ+vNjrU29CzWe9UvhEnR9SDMAGWHE7VdlPu6sjDAqP8gnlGWV7Jvmx3J3fR/
	DYUGk5Oz7D4TQblMGuduDbgv0JHxJqV2s8kSLIFXDyd7beuhIk4eFU5YMuX8VY0jA0Z5
	l/dZp8pSyrY7cJhSEdrLCAuzzndwX0jvT6m6Q=
Received: by 10.86.1.1 with SMTP id 1mr4671555fga.0.1239617888548;
	Mon, 13 Apr 2009 03:18:08 -0700 (PDT)
Received: from sam.firefly (132-37.105-92.cust.bluewin.ch [92.105.37.132])
	by mx.google.com with ESMTPS id d4sm6456283fga.28.2009.04.13.03.18.07
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Mon, 13 Apr 2009 03:18:08 -0700 (PDT)
Date: Mon, 13 Apr 2009 12:18:07 +0200
To: freebsd-fs@freebsd.org
From: "Tobias C. Berner" <tcberner@gmail.com>
Organization: -
Content-Type: text/plain; charset=utf-8
MIME-Version: 1.0
References: <op.usavwhqoa4qlja@sam.firefly>
	<alpine.BSF.2.00.0904130230280.1902@ibyngvyr.purzvxnyf.bet>
Content-Transfer-Encoding: 8bit
Message-ID: <op.usblkhnka4qlja@sam.firefly>
In-Reply-To: <alpine.BSF.2.00.0904130230280.1902@ibyngvyr.purzvxnyf.bet>
User-Agent: Opera Mail/10.00 (FreeBSD)
Subject: Re: zfs and moving devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 10:18:11 -0000

On 13.04.2009 at 09:34, Wes Morgan <morganw@chemikals.org> wrote:

> On Mon, 13 Apr 2009, Tobias C. Berner wrote:
>
>> I have a zfs pool
>>
>> 	NAME        STATE     READ WRITE CKSUM
>> 	multimedia  ONLINE       0     0     0
>> 	  ad8       ONLINE       0     0     0
>> 	  ad10      ONLINE       0     0     0
>> 	  ad12      ONLINE       0     0     0
>> 	  ad14      ONLINE       0     0     0
>>
>> Now, I need more sata-connecters. If I activate
>> an other onboard-controller, the device names
>> move:
>>
>>   ad8  -> ad14
>>   ad10 -> ad16
>>   ad12 -> ad18
>>   ad14 -> ad20
>>
>>
>> What is the proper way to handle this in zfs?
>
> Export the pool before you make the change and it should work no problem.
OK, I will try that.

> You may want to enable ATA_STATIC_ID as well so you won't have to worry
> about it either.
ATA_STATIC_ID is enabled:
   options 	ATA_STATIC_ID	# Static device numbering


thanks, Tobias

>
> On another note, that's a 4 device pool with no redundancy. Make sure you
> have frequent backups! I lost my "multimedia" pool once during a migration
> and was very sad. Now I use raidz2.




-- 
Created with Opera's revolutionary e-mail module: http://www.opera.com/mail/


From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 10:52:40 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AFECD106566B
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 10:52:40 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au
	[211.29.132.190])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E4B38FC13
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 10:52:40 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c122-107-120-227.carlnfd1.nsw.optusnet.com.au
	(c122-107-120-227.carlnfd1.nsw.optusnet.com.au [122.107.120.227])
	by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	n3DAqbsI015286
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 13 Apr 2009 20:52:38 +1000
Date: Mon, 13 Apr 2009 20:52:37 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
Message-ID: <20090413193936.A52183@delplex.bde.org>
References: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: changing semantics of the va_filerev (code review)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 10:52:41 -0000

On Sun, 12 Apr 2009, Rick Macklem wrote:

> In summary, the nfsv4 server needs 3 changes to the FreeBSD kernel:
> ...
> 3 - Support for the Change attribute, which is what this post is about.
> ...
> As background, I believe va_filerev/i_modrev was added for nqnfs
> long long ago. Since it is not exposed to userland via the stat structure,

va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this
purpose, but it isn't actually a file generation number like its
comments say (it is normally set to a random value on file creation
then never changed) and it is exposed to userland (st_gen).

> I don't believe anything outside of the kernel uses it. Inside the kernel,
> the only thing that currently uses it is the nfs server, which uses it as
> the cookie verifier. (It really doesn't use it, since a client regurgitates 
> it back to the server as opaque bits in the next readdir rpc
> and the server then ignores those bits. This is correct, since va_filerev is 
> a bogus cookie verifier.) As such, I don't believe changing the semantics of 
> va_filerev will break anything in FreeBSD.

va_gen isn't used much either.  In ext2fs, i_gen is a copy of the
on-disk field i_generation which is documented to be /* for NFS */ but
nfs doesn't use va_gen at all.  nfs3 (getattr, loadattrcache) doesn't
even initialize va_gen.  nfsv2 initializes it to a non-random value
based on a timestamp.  I'm not sure if it does this only on creation
or on every cache miss or on every call.  I think the uninitialized
va_gen gives stack garbage in st_gen, but in tests I get 0 for both
nfsv3 and nfsv2 (as root -- st_gen is always 0 for non-root).  I don't
understand the security issues for *_gen, but remember its being changed
for security.  cvs history shows that it used to actually be a generation
number in at least ffs, but for ffs files and not for individual file
changes (or for individual ffs file systems or all file systems).

> I'd like to change the semantics of va_filerev so that it can be used
> by the nfsv4 server as the Change attribute. To do this, it needs to
> change in 2 ways:
> - must change upon metadata changes as well as data changes
> - must persist across server reboots (ie. be moved to spare space in
> 	the on-disk i-node instead of in memory i-node)

Many nonstandard file systems, e.g., msdosfs, have no space to spare.

Read-only file systems like cd9660 and udf probably don't need a
variable generation count (since they never change), but their current
handling of va_filerev and va_gen is wrong if these fields have any
other semantics.  These file systems just initialize va_gen to 1 for
all files and va_filerev to 0 (with an XXX in udf only) for all files.

va_ctime should give what you want for all file systems, since it
should be increased whenever anything changes.  However, most file
systems always set the nsec part to 0, so va_ctime doesn't track
all file changes.  This is a problem for things like make(1) too,
so if nsec timestamps aren't available or take too long or are
not fine-grained enough, the nsec part should be abused as a generation
counter so that any change gives a strictly larger timestamp.  The
case where someone sets the clock backwards is broken but won't
happen often.  Many nonstandard file systems, e.g., msdosfs, have
no space for an on-disk ctime, so they fake va_ctime using an on-disk
mtime.  Since such file systems don't have many attributes, only a
few more cases are broken.
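
A minimal sketch of using the nsec part as a generation counter, so
that every change yields a strictly larger timestamp.  bump_ctime() is
a hypothetical helper written for illustration, not an existing kernel
function, and this version also steps forward when the clock reads the
same or earlier than the stored stamp:

```c
#include <time.h>

/*
 * Advance *ctime to "now" if the clock moved forward; otherwise use the
 * nsec field as a counter so the stamp still strictly increases.
 * Hypothetical helper, not an existing kernel function.
 */
static void
bump_ctime(struct timespec *ctime, const struct timespec *now)
{
	if (now->tv_sec > ctime->tv_sec ||
	    (now->tv_sec == ctime->tv_sec && now->tv_nsec > ctime->tv_nsec)) {
		*ctime = *now;
		return;
	}
	/* Same or earlier clock reading: step forward by one nanosecond. */
	if (++ctime->tv_nsec >= 1000000000L) {
		ctime->tv_nsec = 0;
		ctime->tv_sec++;
	}
}
```

With a clock that only ticks once per second, two changes in the same
second then get stamps that differ in the nsec field.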

> Here is the patch to ufs for the above, that I have been using for some
> time. Please review and comment.

> ...
> --- ufs/ufs/ufs_vnops.c.sav	2009-04-12 02:28:41.000000000 -0400
> +++ ufs/ufs/ufs_vnops.c	2009-03-10 16:47:11.000000000 -0400
> @@ -157,11 +157,12 @@
> 	if (ip->i_flag & IN_UPDATE) {
> 		DIP_SET(ip, i_mtime, ts.tv_sec);
> 		DIP_SET(ip, i_mtimensec, ts.tv_nsec);
> -		ip->i_modrev++;
> +		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
> 	}
> 	if (ip->i_flag & IN_CHANGE) {
> 		DIP_SET(ip, i_ctime, ts.tv_sec);
> 		DIP_SET(ip, i_ctimensec, ts.tv_nsec);
> +		DIP_SET(ip, i_modrev, DIP(ip, i_modrev) + 1);
> 	}

IN_UPDATE implies IN_CHANGE (unless there is a bug).  Thus the above
gives an extra increment.
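
A minimal sketch of a single increment per update, assuming
hypothetical stand-ins: struct fake_inode, the flag values and
update_modrev() are illustrative only and are not the ufs code:

```c
#include <stdint.h>

/* Hypothetical flag values; the real ones live in ufs/ufs/inode.h. */
#define IN_CHANGE 0x0002
#define IN_UPDATE 0x0004

/* Hypothetical stand-in for the fields touched by the patch. */
struct fake_inode {
	int      i_flag;
	uint64_t i_modrev;
};

/*
 * Bump i_modrev once per call if either flag is set, so a data write
 * that sets both IN_UPDATE and IN_CHANGE counts as one change.
 */
static void
update_modrev(struct fake_inode *ip)
{
	if (ip->i_flag & (IN_UPDATE | IN_CHANGE))
		ip->i_modrev++;
	ip->i_flag &= ~(IN_UPDATE | IN_CHANGE);
}
```

Testing the two flags together also covers any path that sets only one
of them.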

Strictly, if you want to track _all_ metadata changes, then you need
an increment for IN_ACCESS, and va_ctime will no longer be nearly
usable since it is not changed by read accesses.  I hope you don't want
this.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 11:06:53 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 91B901065674
	for <freebsd-fs@FreeBSD.org>; Mon, 13 Apr 2009 11:06:53 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 7D81F8FC22
	for <freebsd-fs@FreeBSD.org>; Mon, 13 Apr 2009 11:06:53 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3DB6rYJ084936
	for <freebsd-fs@FreeBSD.org>; Mon, 13 Apr 2009 11:06:53 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3DB6qDT084932
	for freebsd-fs@FreeBSD.org; Mon, 13 Apr 2009 11:06:52 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 13 Apr 2009 11:06:52 GMT
Message-Id: <200904131106.n3DB6qDT084932@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 11:06:54 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/133614  fs         [smbfs] [panic] panic: ffs_truncate: read-only filesys
o kern/133373  fs         [zfs] umass attachment causes ZFS checksum errors, dat
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/133150  fs         [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/133134  fs         [zfs] Missing ZFS zpool labels
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132597  fs         [tmpfs] [panic] tmpfs-related panic while interrupting
o kern/132551  fs         [zfs] ZFS locks up on extattr_list_link syscall
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132337  fs         [zfs] [panic] kernel panic in zfs_fuid_create_cred
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132145  fs         [panic] File System Hard Crashes
f kern/132068  fs         [zfs] page fault when using ZFS over NFS on 7.1-RELEAS
o kern/131995  fs         [nfs] Failure to mount NFSv4 server
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/131086  fs         [ext2fs] [patch] mkfs.ext2 creates rotten partition
o kern/131084  fs         [xfs] xfs destroys itself after copying data
o kern/131081  fs         [zfs] User cannot delete a file when a ZFS dataset is 
o kern/130979  fs         [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o bin/130105   fs         [zfs] zfs send -R dumps core
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs         [zfs] [lor] lock order reversal in zfs
o kern/128514  fs         [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs         [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs         [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
f kern/125536  fs         [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs         [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/89991   fs         [ufs] softupdates with mount -ur causes fs UNREFS
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc

57 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 15:07:08 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F067F106566B
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 15:07:08 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from skerryvore.cs.uoguelph.ca (skerryvore.cs.uoguelph.ca
	[131.104.94.204])
	by mx1.freebsd.org (Postfix) with ESMTP id AB9858FC19
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 15:07:08 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
	by skerryvore.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id
	n3DF774G019406; Mon, 13 Apr 2009 11:07:07 -0400
Received: from localhost (rmacklem@localhost)
	by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id
	n3DFDXL03365; Mon, 13 Apr 2009 11:13:33 -0400 (EDT)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing
	-bs
Date: Mon, 13 Apr 2009 11:13:33 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20090413193936.A52183@delplex.bde.org>
Message-ID: <Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca>
References: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
	<20090413193936.A52183@delplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.63 on 131.104.94.204
Cc: freebsd-fs@freebsd.org
Subject: Re: changing semantics of the va_filerev (code review)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 15:07:09 -0000


On Mon, 13 Apr 2009, Bruce Evans wrote:

>
> va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this
> purpose, but it isn't actually a file generation number like its
> comments say (it is normally set to a random value on file creation
> then never changed) and it is exposed to userland (st_gen).
>
i_gen is used by NFS to create T-stable (valid for a long time, including
a long time after the file is removed) file handles. It is used by
ffs_vptofh() to create the file handles for NFS that are recognized as
representing removed files, even after an i-node gets reused such that
the i-node number now represents another file.

>
> va_gen isn't used much either.  In ext2fs, i_gen is a copy of the
> on-disk field i_generation which is documented to be /* for NFS */ but
> nfs doesn't use va_gen at all.  nfs3 (getattr, loadattrcache) doesn't
> even initialize va_gen.  nfsv2 initializes it to a non-random value
> based on a timestamp.  I'm not sure if it does this only on creation
> or on every cache miss or on every call.  I think the uninitialized
> va_gen gives stack garbage in st_gen, but in tests I get 0 for both
> nfsv3 and nfsv2 (as root -- st_gen is always 0 for non-root).  I don't
> understand the security issues for *_gen, but remember its being changed
> for security.  cvs history shows that it used to actually be a generation
> number in at least ffs, but for ffs files and not for individual file
> changes (or for individual ffs file systems or all file systems).
>
Its initial value doesn't matter for it to work correctly.
It should get incremented each time an i-node gets reused for a different
file. (That's what the ESTALE magic is, the server reporting
to the client that the file handle is for a file that has been removed.)
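
A minimal sketch of that check, assuming a hypothetical in-memory
i-node table and file handle layout (inode_gen, struct fake_fh and the
three functions are made up for illustration; the real handle is built
by ffs_vptofh()):

```c
#include <errno.h>
#include <stdint.h>

#define NINODES 4

/* Hypothetical in-memory inode table; only the generation matters here. */
static uint32_t inode_gen[NINODES];

/* Hypothetical file handle: inode number plus the generation at issue. */
struct fake_fh {
	uint32_t fh_ino;
	uint32_t fh_gen;
};

static struct fake_fh
make_fh(uint32_t ino)
{
	struct fake_fh fh = { ino, inode_gen[ino] };

	return (fh);
}

/* Reusing an i-node for a new file bumps its generation. */
static void
reuse_inode(uint32_t ino)
{
	inode_gen[ino]++;
}

/* Return 0 if the handle still names the same file, ESTALE otherwise. */
static int
check_fh(const struct fake_fh *fh)
{
	if (fh->fh_ino >= NINODES || inode_gen[fh->fh_ino] != fh->fh_gen)
		return (ESTALE);
	return (0);
}
```

The starting value of the generation plays no part in the comparison,
only whether it changed since the handle was issued.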

The "security" business is a bit bogus to me. It's one of those security
by obscurity tricks, imho. The problem was that a file handle was easy to 
fake when i_gen is initially 0. Initializing it to a non-zero value
makes faking one a little harder, but... Part of the reason for doing
this was that IP#s were only checked against exports at mount time on
some systems (BSD has never been this way) and faking the one file handle
for the root of the FS (root i-node#, i_gen == 0) bypassed exports and
tah dah.

>> I'd like to change the semantics of va_filerev so that it can be used
>> by the nfsv4 server as the Change attribute. To do this, it needs to
>> change in 2 ways:
>> - must change upon metadata changes as well as data changes
>> - must persist across server reboots (ie. be moved to spare space in
>> 	the on-disk i-node instead of in memory i-node)
>
> Many nonstandard file systems, e.g., msdosfs, have no space to spare.
>
If a file system can't support it correctly, faking it with something
like modify time is about all you can do. Since Change is supposed to
change on every file modification, this fails when multiple changes
occur within the same tod clock time or the clock gets reset backwards,
as you noted below. (Linux uses a modify time with a 1sec clock
resolution for Change, which isn't correct and the Linux nfs server
folks know that. Since this breaks the AIX nfsv4 client, the AIX folks
tend to remind them:-)

> Read-only file systems like cd9660 and udf probably don't need a
> variable generation count (since they never change), but their current
> handling of va_filerev and va_gen is wrong if these fields have any
> other semantics.  These file systems just initialize va_gen to 1 for
> all files and va_filerev to 0 (with an XXX in udf only) for all files.
>
Since they only need to change for modifications and their initial
values don't really matter, the above sounds fine to me.

> va_ctime should give what you want for all file systems, since it
> should be increased whenever anything changes.  However, most file
There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't.
Since the Change attribute must change for every file modification, I
feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64 bits,
so it won't wrap around for a little while.)
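
As a rough check of the wrap-around time, here is a small sketch.  The
rate of one billion increments per second is an assumed upper bound
chosen for illustration, and years_to_wrap() is a hypothetical helper:

```c
#include <stdint.h>

/* Years until a 64-bit counter wraps at the given increment rate. */
static double
years_to_wrap(double increments_per_sec)
{
	const double seconds_per_year = 365.25 * 24 * 60 * 60;

	return ((double)UINT64_MAX / increments_per_sec / seconds_per_year);
}
```

At that assumed rate, years_to_wrap(1e9) comes out between 584 and 585
years.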

> systems always set the nsec part to 0, so va_ctime doesn't track
> all file changes.  This is a problem for things like make(1) too,
> so if nsec timestamps aren't available or take too long or are
> not fine-grained enough, the nsec part should be abused as a generation
> counter so that any change gives a strictly larger timestamp.  The
> case where someone sets the clock backwards is broken but won't
> happen often.
>
> Many nonstandard file systems, e.g., msdosfs, have
> no space for an on-disk ctime, so they fake va_ctime using an on-disk
> mtime.  Since such file systems don't have many attributes, only a
> few more cases are broken.
>
Yep, that's why ctime/mtime aren't sufficient.
If a read/write file system doesn't have support for it, all you
can do is fake it and hope the client works ok. I suspect the Linux
folks will eventually start to add support for it to ext3fs etc, because
of the above, but who knows. It seems that FreeBSD mostly uses FFS and
ZFS (which should have support for it, since the Solaris folks are into
NFSv4?), so at least we should be able to make those work correctly.

Have a good week, rick


From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 17:33:43 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2E73F106564A;
	Mon, 13 Apr 2009 17:33:43 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 0393F8FC12;
	Mon, 13 Apr 2009 17:33:43 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3DHXgM2020276;
	Mon, 13 Apr 2009 17:33:42 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3DHXglX020272;
	Mon, 13 Apr 2009 17:33:42 GMT (envelope-from linimon)
Date: Mon, 13 Apr 2009 17:33:42 GMT
Message-Id: <200904131733.n3DHXglX020272@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/133676: [smbfs] [panic] umount -f'ing a vnode-based memory
	disk from off a SMB share caused a reboot
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 17:33:43 -0000

Old Synopsis: umount -f'ing a vnode-based memory disk from off a SMB share caused a reboot
New Synopsis: [smbfs] [panic] umount -f'ing a vnode-based memory disk from off a SMB share caused a reboot

Responsible-Changed-From-To: freebsd-amd64->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Apr 13 17:31:51 UTC 2009
Responsible-Changed-Why: 
Reclassify and reassign.

http://www.freebsd.org/cgi/query-pr.cgi?pr=133676

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 17:55:37 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 16E80106566C
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 17:55:37 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 454AC8FC1A
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 17:55:36 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 76A7946B5C;
	Mon, 13 Apr 2009 13:55:34 -0400 (EDT)
Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8])
	by bigwig.baldwin.cx (Postfix) with ESMTPA id A903F8A04D;
	Mon, 13 Apr 2009 13:54:45 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-fs@freebsd.org
Date: Mon, 13 Apr 2009 11:46:21 -0400
User-Agent: KMail/1.9.7
References: <Pine.GSO.4.63.0904091433590.24215@muncher.cs.uoguelph.ca>
In-Reply-To: <Pine.GSO.4.63.0904091433590.24215@muncher.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200904131146.21640.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Mon, 13 Apr 2009 13:54:45 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=0.1 required=4.2 tests=RDNS_NONE autolearn=no
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: 
Subject: Re: integrating nfsv4 locking with nlm and local locking
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 17:55:37 -0000

On Thursday 09 April 2009 3:04:37 pm Rick Macklem wrote:
> My nfsv4 server currently does VOP_ADVLOCK() with the non-blocking F_SETLK
> type and I had thought that was sufficient, but I now realize (thanks to
> a recent post by Zachary Loafman) that this breaks when a delegation for
> the file is issued to a client. (When a delegation for a file is issued
> to a client, it can do byte range locking locally, and the server doesn't
> know about these to do VOP_ADVLOCK() on the server machine.)
> 
> I believe that Zachary would like to discuss a more general solution, 
> including how to handle Open/Share locks, but in the meantime I'd like to 
> solve this specific case in as simple a way as possible.
> 
> Basically, I need a way to make sure delegations for a file don't exist
> when local byte range locking or locking via the NLM is being done on
> the file.
> 
> The simplest thing I can think of is the following:
> When VOP_ADVLOCK() is called for a file (outside of the nfsv4 server),
> do two things:
>  	1 - Make sure any outstanding delegations are recalled.
>              I already have a function that does this, so it is a matter
>              of figuring out where to put the call(s).
>  	2 - Set a flag on the vnode, so that my nfsv4 server knows not to
>  	    issue another delegation for that file.
>              (I could test for locks via VOP_ADVLOCK() before issuing a
>  	     delegation, but that has two problems.)
>  	    1 - Since the vnode is unlocked for VOP_ADVLOCK(), there could
>  	        be a race where the nfsv4 server issues a delegation
>                  between the time outstanding delegations are recalled at
>                  #1 above and the VOP_ADVLOCK() sets the lock that I would
>                  see during the test.
>              2 - It would have to keep checking for a lock and might issue
>  	        a delegation at a point where no lock is held, but one
>                  will be acquired soon, forcing the delegation recall.
>                  (It's much easier to not issue a delegation than recall
>                   one.)
>              Once this flag is set, I think it would be ok if the flag
>              remains set until the vnode is recycled, since it seems
>              fairly likely that, once byte range locking is done on a
>              file, more will happen.
>              (If people were agreeable to the vnode flag, it looks like
>               a VV_xxx flag would make more sense than a VI_xxx one. I
>               think an atomic_set_int() would be sufficient to set it,
>               even though the vnode lock isn't held?)

You always have to hold the vnode lock to set a VV flag.  Even if you do an 
atomic operation to set your flag, another thread might be setting a flag at 
the same time using non-atomic ops and could clobber your change (if it does 
a read-modify-write and reads a value that pre-dates your atomic_set_int() 
but its write posts after your write).
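
A minimal sketch of that interleaving, replayed by hand in a single
thread so the outcome is deterministic.  FLAG_A, FLAG_B and
replay_lost_update() are hypothetical names, and the plain |= on the
"thread A" step only stands in for atomic_set_int():

```c
#include <stdint.h>

#define FLAG_A 0x1u	/* hypothetical flag set with an atomic op */
#define FLAG_B 0x2u	/* hypothetical flag set with a plain RMW */

/*
 * Replay the interleaving by hand:
 *   1. thread B reads the flag word,
 *   2. thread A sets FLAG_A (stand-in for atomic_set_int()),
 *   3. thread B writes back its stale value with FLAG_B added.
 * Returns the final flag word.
 */
static uint32_t
replay_lost_update(void)
{
	uint32_t flags = 0;
	uint32_t stale;

	stale = flags;			/* B: read */
	flags |= FLAG_A;		/* A: atomic set, completes here */
	flags = stale | FLAG_B;		/* B: write, FLAG_A is lost */
	return (flags);
}
```

The atomic step completes correctly on its own; the flag is lost
because the other writer's store is based on a value read earlier.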

-- 
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 18:33:58 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BC683106566B
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 18:33:58 +0000 (UTC)
	(envelope-from
	BATV+13135baec0e70ef2caf4+2059+infradead.org+hch@bombadil.srs.infradead.org)
Received: from bombadil.infradead.org (bombadil.infradead.org
	[IPv6:2001:4830:2446:ff00:214:51ff:fe65:c65c])
	by mx1.freebsd.org (Postfix) with ESMTP id 653D98FC29
	for <freebsd-fs@freebsd.org>; Mon, 13 Apr 2009 18:33:58 +0000 (UTC)
	(envelope-from
	BATV+13135baec0e70ef2caf4+2059+infradead.org+hch@bombadil.srs.infradead.org)
Received: from hch by bombadil.infradead.org with local (Exim 4.69 #1 (Red Hat
	Linux)) id 1LtQyd-0008Tf-IC; Mon, 13 Apr 2009 18:33:51 +0000
Date: Mon, 13 Apr 2009 14:33:51 -0400
From: Christoph Hellwig <hch@infradead.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20090413183351.GA27610@infradead.org>
References: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
	<20090413193936.A52183@delplex.bde.org>
	<Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-SRS-Rewrite: SMTP reverse-path rewritten from <hch@infradead.org> by
	bombadil.infradead.org See http://www.infradead.org/rpr.html
Cc: freebsd-fs@freebsd.org
Subject: Re: changing semantics of the va_filerev (code review)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 18:33:59 -0000

On Mon, Apr 13, 2009 at 11:13:33AM -0400, Rick Macklem wrote:
> If a file system can't support it correctly, faking it with something
> like modify time is about all you can do. Since Change is supposed to
> change on every file modification, this fails when multiple changes
> occur within the same tod clock time or the clock gets reset backwards,
> as you noted below. (Linux uses a modify time with a 1sec clock
> resolution for Change, which isn't correct and the Linux nfs server
> folks know that. Since this breaks the AIX nfsv4 client, the AIX folks
> tend to remind them:-)

Linux uses whatever granularity the underlying filesystems support.
For a lot of old designs that may be 1 second, for most recent
filesystems it's better.

> Yep, that's why ctime/mtime aren't sufficient.
> If a read/write file system doesn't have support for it, all you
> can do is fake it and hope the client works ok. I suspect the Linux
> folks will eventually start to add support for it to ext3fs etc, because
> of the above, but who knows. It seems that FreeBSD mostly uses FFS and
> ZFS (which should have support for it, since the Solaris folks are into
> NFSv4?), so at least we should be able to make those work correctly.

Linux already has the changecount in ext4 but the NFS server doesn't yet
use it.  Also it's being implemented for XFS and others.


From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 13 18:37:36 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C8CF5106566C;
	Mon, 13 Apr 2009 18:37:36 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from acadia.cs.uoguelph.ca (acadia.cs.uoguelph.ca [131.104.94.221])
	by mx1.freebsd.org (Postfix) with ESMTP id 6DFD48FC08;
	Mon, 13 Apr 2009 18:37:36 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
	by acadia.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id n3DIbZxQ031333; 
	Mon, 13 Apr 2009 14:37:35 -0400
Received: from localhost (rmacklem@localhost)
	by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id
	n3DIi1n06133; Mon, 13 Apr 2009 14:44:02 -0400 (EDT)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing
	-bs
Date: Mon, 13 Apr 2009 14:44:01 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <200904131146.21640.jhb@freebsd.org>
Message-ID: <Pine.GSO.4.63.0904131431500.3907@muncher.cs.uoguelph.ca>
References: <Pine.GSO.4.63.0904091433590.24215@muncher.cs.uoguelph.ca>
	<200904131146.21640.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.63 on 131.104.94.221
Cc: freebsd-fs@freebsd.org
Subject: Re: integrating nfsv4 locking with nlm and local locking
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Apr 2009 18:37:37 -0000


On Mon, 13 Apr 2009, John Baldwin wrote:

>
> You have to hold the vnode lock to set a VV flag always.  Even if you do an
> atomic operation to set your flag, another thread might be setting a flag at
> the same time using non-atomic ops and could clobber your change (if it does
> a read-modify-write and reads a value that pre-dates your atomic_set_int()
> but its write posts after your write).
>
Righto, thanks. (I should have realized that.) I guess I'll have to use a
VI_xxx flag or add a field to the vnode to make the scheme work. I am
just trying to come up with a stopgap solution until something more
comprehensive can be done w.r.t. handling delegations.
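
A minimal C sketch of the lost update John describes, replayed in a single
thread so the result is deterministic. The flag names and both helper
functions are hypothetical and C11 atomics stand in for atomic_set_int();
this is not the vnode code itself.

```c
#include <assert.h>
#include <stdatomic.h>

#define FLAG_ATOMIC 0x1u /* hypothetical flag, set with an atomic OR */
#define FLAG_PLAIN  0x2u /* hypothetical flag, set by plain read-modify-write */

static_assert((FLAG_ATOMIC & FLAG_PLAIN) == 0, "flags must be distinct bits");

/*
 * Replays the bad interleaving: B reads, A sets its bit atomically,
 * then B writes back a value computed from its stale read.
 */
unsigned
replay_unlocked_interleaving(void)
{
	atomic_uint flags;
	unsigned stale;

	atomic_init(&flags, 0u);
	stale = atomic_load(&flags);              /* B: read pre-dates A's set */
	atomic_fetch_or(&flags, FLAG_ATOMIC);     /* A: atomic set */
	atomic_store(&flags, stale | FLAG_PLAIN); /* B: write posts after A's */
	return (atomic_load(&flags));             /* A's bit has been clobbered */
}

/* With one lock held around each whole update, the updates cannot overlap. */
unsigned
replay_serialized(void)
{
	unsigned flags = 0;

	flags |= FLAG_ATOMIC;   /* A, lock held for read, modify and write */
	flags |= FLAG_PLAIN;    /* B, lock held for read, modify and write */
	return (flags);
}
```

In the first replay only FLAG_PLAIN survives, which is why an atomic op on
one side is not enough when other setters rely on the lock.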

VI_xxx are currently used for handling the vnode and it doesn't seem
appropriate to add one of these to indicate "don't issue delegations".

How do others feel w.r.t. adding a VI_xxx flag vs adding v_disabledelegate
to the structure?

There is always the fallback position of shipping an nfsv4 server with
delegations disabled until the question of how to handle them when local
VOPs are done gets resolved.

rick


From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 14 10:08:56 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8B0C110656F0
	for <freebsd-fs@freebsd.org>; Tue, 14 Apr 2009 10:08:56 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au
	[211.29.132.191])
	by mx1.freebsd.org (Postfix) with ESMTP id 23D528FC08
	for <freebsd-fs@freebsd.org>; Tue, 14 Apr 2009 10:08:55 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c122-107-120-227.carlnfd1.nsw.optusnet.com.au
	(c122-107-120-227.carlnfd1.nsw.optusnet.com.au [122.107.120.227])
	by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	n3EA8pmG014702
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 14 Apr 2009 20:08:53 +1000
Date: Tue, 14 Apr 2009 20:08:51 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca>
Message-ID: <20090414180826.J53102@delplex.bde.org>
References: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
	<20090413193936.A52183@delplex.bde.org>
	<Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: changing semantics of the va_filerev (code review)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Apr 2009 10:08:57 -0000

On Mon, 13 Apr 2009, Rick Macklem wrote:

> On Mon, 13 Apr 2009, Bruce Evans wrote:

>> va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this
>> purpose, but it isn't actually a file generation number like its
>> comments say (it is normally set to a random value on file creation
>> then never changed) and it is exposed to userland (st_gen).
>> 
> i_gen is used by NFS to create T-stable (valid for a long time, including
> a long time after the file is removed) file handles. It is used by
> ffs_vptofh() to create the file handles for NFS that are recognized as
> representing removed files, even after an i-node gets reused such that
> the i-node number now represents another file.

Oops, I missed that since nfs's use of i_gen is indirect.  What does
nfs do for file systems that don't detect removed files, e.g., msdosfs?
vptofh and fhtovp routines seem to have too many differences.  E.g.,
file systems based on ffs return ESTALE for removed files, but
zfs_fhtovp() returns EINVAL.

I just noticed that the increment of i_gen was slightly broken for ffs
by a type mismatch in ffs2 (affects ffs1 too).  Originally, i_gen had
the same type as di_gen (int32_t).  Now i_gen has type int64_t but in
ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen
still has type int32_t (apparently there was insufficient space to
expand it).  This makes the overflow check in ffs_alloc.c (++ip->i_gen
== 0) more broken than before.  Previously it only gave undefined
behaviour followed by a bogus check when overflow occurs for incrementing
from INT32_T_MAX.  Now it has no effect, since it takes 293 years of
incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX.
Overflow now occurs on assignment to di_gen.

The result of this bug is almost the same as removing the silly
part of the security code -- the re-randomization on overflow.  i_gen
may grow larger than UINT32_T_MAX, but usually refresh from the dinode
will keep it smaller.  When it starts near UINT32_T_MAX and grows
larger, the overflow on assignment and a subsequent refresh will make
it nearly 0.  Except, in 1 in every 2**32 cases, when the overflow makes
di_gen exactly 0, the subsequent refresh will randomize i_gen.
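
A small C sketch of the arithmetic behind this, under stated assumptions:
the function names are made up for illustration, the on-disk field is
modelled as the low 32 bits of the in-memory value, and a year is taken
as 365 days. It is not the ffs_alloc.c code.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted check: true only if the 64-bit counter itself becomes 0. */
int
bump_and_check(int64_t *i_gen)
{
	assert(*i_gen < INT64_MAX);   /* incrementing INT64_MAX is undefined */
	return (++*i_gen == 0);
}

/* Models the store into a 32-bit field: only the low 32 bits survive. */
uint32_t
low32(int64_t i_gen)
{
	return ((uint32_t)i_gen);     /* conversion to unsigned is modulo 2^32 */
}

/* Whole 365-day years of 1 GHz increments needed to reach INT64_MAX from 0. */
int64_t
years_to_int64_overflow(void)
{
	const int64_t per_second = 1000000000;                  /* 1 GHz */
	const int64_t seconds_per_year = 365LL * 24 * 60 * 60;

	return (INT64_MAX / per_second / seconds_per_year);
}
```

Starting from UINT32_MAX, the increment gives 2**32, so the "== 0" test
stays false while the truncated 32-bit value is already 0. The year count
comes out at just under 293, consistent with the figure above.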

>> va_ctime should give what you want for all file systems, since it
>> should be increased whenever anything changes.  However, most file
> There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't.

Are there?  This would be a bug.  I checked that ffs doesn't have this
bug.

> Since the Change attribute must change for every file modification, I
> feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64bits,
> so it won't wrap around for a little while.)

It would be a large and obvious bug to modify the file data (IN_UPDATE)
without setting IN_CHANGE.

>> systems always set the nsec part to 0, so va_ctime doesn't track
>> all file changes.  This is a problem for things like make(1) too,
>> so if nsec timestamps aren't available or take too long or are
>> not fine-grained enough, the nsec part should be abused as a generation
>> counter so that any change gives a strictly larger timestamp.  The
>> case where someone sets the clock backwards is broken but won't
>> happen often.
>> 
>> Many nonstandard file systems, e.g., msdosfs, have
>> no space for an on-disk ctime, so they fake va_ctime using an on-disk
>> mtime.  Since such file systems don't have many attributes, only a
>> few more cases are broken.
>> 
> Yep, that's why ctime/mtime aren't sufficient.
> If a read/write file system doesn't have support for it, all you
> can do is fake it and hope the client works ok. I suspect the Linux

They need to be fixed or faked well enough for make(1) too.

When the dinode has no space to spare, something can be done by keeping
state in the inode or vnode.  This won't work across reboots of course
(except by hashing a reboot counter into the generation counts or
timestamps) but might be enough for all short-term uses.  I'm not sure
how much is safe here.
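
One way to picture the "strictly larger timestamp" idea quoted above, as a
hedged C sketch: struct ts and next_stamp() are hypothetical stand-ins, the
previous stamp is assumed to be kept in the in-memory inode, and nothing
here survives a reboot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timespec, to keep the sketch self-contained. */
struct ts {
	int64_t sec;
	int32_t nsec;   /* 0 .. 999999999 */
};

/* Returns the stamp to record: `now` if it is newer, else one tick past `prev`. */
struct ts
next_stamp(struct ts prev, struct ts now)
{
	assert(prev.nsec >= 0 && prev.nsec < 1000000000);

	if (now.sec > prev.sec ||
	    (now.sec == prev.sec && now.nsec > prev.nsec))
		return (now);

	/* Clock did not advance (or went backwards): use nsec as a counter. */
	if (++prev.nsec == 1000000000) {
		prev.nsec = 0;
		prev.sec++;
	}
	return (prev);
}
```

With a clock that only has 1-second resolution, two changes in the same
second then get stamps that differ in the nsec part, so a comparison like
the one make(1) does can still tell them apart.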

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 14 15:52:35 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 056D41065670
	for <freebsd-fs@freebsd.org>; Tue, 14 Apr 2009 15:52:35 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from acadia.cs.uoguelph.ca (acadia.cs.uoguelph.ca [131.104.94.221])
	by mx1.freebsd.org (Postfix) with ESMTP id B63678FC12
	for <freebsd-fs@freebsd.org>; Tue, 14 Apr 2009 15:52:34 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
	by acadia.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id n3EFqXxR008878; 
	Tue, 14 Apr 2009 11:52:33 -0400
Received: from localhost (rmacklem@localhost)
	by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id
	n3EFx2S20621; Tue, 14 Apr 2009 11:59:02 -0400 (EDT)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing
	-bs
Date: Tue, 14 Apr 2009 11:59:02 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20090414180826.J53102@delplex.bde.org>
Message-ID: <Pine.GSO.4.63.0904141110590.13047@muncher.cs.uoguelph.ca>
References: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca>
	<20090413193936.A52183@delplex.bde.org>
	<Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca>
	<20090414180826.J53102@delplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.63 on 131.104.94.221
Cc: freebsd-fs@freebsd.org
Subject: Re: changing semantics of the va_filerev (code review)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Apr 2009 15:52:35 -0000


On Tue, 14 Apr 2009, Bruce Evans wrote:

[stuff snipped]
>
> Oops, I missed that since nfs's use of i_gen is indirect.  What does
> nfs do for file systems that don't detect removed files, e.g., msdosfs?
> vptofh and fhtovp routines seem to have too many differences.  E.g.,
An nfs client can always think that a file exists for a short period of
time (until client side caches time out) after it has been removed locally
or by another client, on the server. The more serious failure occurs when
the i-node/directory entry gets reallocated. At that point, the client 
might access the attributes/data of the new file, thinking it was the old
file. (In the worst case, this could persist until the client does a
umount() of the file system.)

However, typically, unless it has the file open when the file is removed
locally on the server or by another client, nothing nasty will happen.
(And I think if the client has name caching disabled, nothing nasty can
happen.)

At least, that's my best guess at an answer.
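
To make the reallocation case concrete, here is a toy C model. Everything
in it is hypothetical (a 4-slot inode table, a generation that is simply
incremented on reuse, a check_gen switch to mimic a file system with no
generation number); it is a sketch of the idea, not ffs_fhtovp().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NINODES 4   /* hypothetical tiny inode table */

struct inode   { uint32_t gen; int in_use; };
struct fhandle { uint32_t ino; uint32_t gen; };

static struct inode itable[NINODES];

/* Allocate slot `ino` for a new file; a new generation marks the reuse. */
struct fhandle
fs_create(uint32_t ino)
{
	struct fhandle fh;

	assert(ino < NINODES);
	itable[ino].gen++;
	itable[ino].in_use = 1;
	fh.ino = ino;
	fh.gen = itable[ino].gen;
	return (fh);
}

void
fs_remove(uint32_t ino)
{
	assert(ino < NINODES);
	itable[ino].in_use = 0;
}

/* Handle lookup: 0 on success, ESTALE if the handle is out of date. */
int
fs_fhtovp(struct fhandle fh, int check_gen)
{
	if (fh.ino >= NINODES || !itable[fh.ino].in_use)
		return (ESTALE);
	if (check_gen && itable[fh.ino].gen != fh.gen)
		return (ESTALE);
	return (0);
}
```

After remove-then-create on the same slot, the old handle is rejected when
the generation is checked, but is silently accepted as the new file when it
is not, which is the serious failure described above.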

> file systems based on ffs return ESTALE for removed files, but
> zfs_fhtovp() returns EINVAL.
>
I don't know why zfs would choose a different errno, but I don't think
that a different errno will have much effect. It's a terminal error
in either case. (I can't think of anything clever that a client can
do for ESTALE. I wouldn't be surprised if some clients end up translating
ESTALE to EINVAL, since POSIX apps don't expect ESTALE.)

I suppose someone could argue it violates the RFC, but only if they know
that the server should generate NFS3ERR_STALE for that case.

> I just noticed that the increment of i_gen was slightly broken for ffs
> by a type mismatch in ffs2 (affects ffs1 too).  Originally, i_gen had
> the same type as di_gen (int32_t).  Now i_gen has type int64_t but in
> ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen
> still has type int32_t (apparently there was insufficient space to
> expand it).  This makes the overflow check in ffs_alloc.c (++ip->i_gen
> == 0) more broken than before.  Previously it only gave undefined
> behaviour followed by a bogus check when overflow occurs for incrementing
> from INT32_T_MAX.  Now it has no effect, since it takes 293 years of
> incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX.
> Overflow now occurs on assignment to di_gen.
>
> The result of this bug is almost the same as removing the silly
> part of the security code -- the re-randomization on overflow.  i_gen
> may grow larger than UINT32_T_MAX, but usually refresh from the dinode
> will keep it smaller.  When it starts near UINT32_T_MAX and grows
> larger, the overflow on assignment and a subsequent refresh will make
> it nearly 0.  Except, in 1 in every 2**32 cases, when the overflow makes
> di_gen exactly 0, the subsequent refresh will randomize i_gen.
>
Sounds like you have a better understanding of this than I. Since all
nfs really cares about is that the value of i_gen has changed after
the i-node is re-allocated, I doubt this causes grief in practice.
Personally, I'd just leave it as a 32bit number and initialize it
to some pseudo-random value in a range that is a small fraction of
UINT32_T_MAX (maybe 1<->1000000) if it is 0, otherwise just increment
it by a small value. (I've already noted that I'm not a big fan of
security by obscurity anyhow:-)
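
A rough C sketch of that 32-bit scheme, with assumptions spelled out:
next_gen() is a hypothetical helper, the caller supplies the pseudo-random
value, "a small value" is taken to be 1, and a wrap back to 0 is treated
the same as an uninitialized generation.

```c
#include <assert.h>
#include <stdint.h>

#define GEN_INIT_RANGE 1000000u   /* seed range 1 .. 1000000, as suggested */

/* Returns the generation to store when the i-node is (re)allocated. */
uint32_t
next_gen(uint32_t gen, uint32_t rnd)
{
	uint32_t next;

	if (gen != 0 && gen != UINT32_MAX)
		next = gen + 1;                     /* common case: small increment */
	else
		next = 1 + rnd % GEN_INIT_RANGE;    /* unset or about to wrap: re-seed */

	assert(next != 0);   /* 0 stays reserved for "not yet initialized" */
	return (next);
}
```

All nfs needs from this is that the value differs after the i-node is
re-allocated, which an increment gives without any 64-bit in-memory copy.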

>>> va_ctime should give what you want for all file systems, since it
>>> should be increased whenever anything changes.  However, most file
>> There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't.
>
> Are there?  This would be a bug.  I checked that ffs doesn't have this
> bug.
>
Oops, my mistake. I grep'd again and see it is IN_CHANGE that gets set
without IN_UPDATE and not the other way around, which makes sense, since
I can't think of how you can modify the data without modifying some
attribute.

So, the Change attribute only needs to change for IN_CHANGE (with all 
those uses of "change", it must be good:-). Thanks for pointing this out.

>
> They need to be fixed or faked well enough for make(1) too.
>
> When the dinode has no space to spare, something can be done by keeping
> state in the inode or vnode.  This won't work across reboots of course
> (except by hashing a reboot counter into the generation counts or
> timestamps) but might be enough for all short-term uses.  I'm not sure
> how much is safe here.
>
Yes, definitely. I think doing something like having an in-memory field
for va_filerev/i_modrev where the high order bits are initialized by
ctime (using whatever bits are valid, given tod clock resolution) when 
read in and then incrementing by 1 for each change, would be a good 
compromise.
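
A minimal C sketch of that compromise, assuming a 1-second clock so that
the ctime seconds fill the high 32 bits and the low 32 bits count changes.
filerev_init() and filerev_bump() are hypothetical names, not existing
kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Seed when the inode is read in: ctime seconds go in the high 32 bits. */
uint64_t
filerev_init(int64_t ctime_sec)
{
	assert(ctime_sec >= 0);
	return (((uint64_t)ctime_sec & 0xffffffffu) << 32);
}

/* Called once per change (IN_CHANGE in the discussion above). */
uint64_t
filerev_bump(uint64_t filerev)
{
	return (filerev + 1);
}
```

A seed taken from a later ctime is larger than any value reached from an
earlier seed, as long as fewer than 2**32 changes happened in between. If
the inode is dropped and read back in within the same second, the sketch
would hand out values it has used before, so it does not settle how much
is safe here.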

rick

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 11:47:54 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2ECDD106566B;
	Thu, 16 Apr 2009 11:47:54 +0000 (UTC)
	(envelope-from gavin@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 049A68FC18;
	Thu, 16 Apr 2009 11:47:54 +0000 (UTC)
	(envelope-from gavin@FreeBSD.org)
Received: from freefall.freebsd.org (gavin@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3GBlrWH082164;
	Thu, 16 Apr 2009 11:47:53 GMT
	(envelope-from gavin@freefall.freebsd.org)
Received: (from gavin@localhost)
	by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3GBlr9C082160;
	Thu, 16 Apr 2009 11:47:53 GMT (envelope-from gavin)
Date: Thu, 16 Apr 2009 11:47:53 GMT
Message-Id: <200904161147.n3GBlr9C082160@freefall.freebsd.org>
To: gavin@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: gavin@FreeBSD.org
Cc: 
Subject: Re: kern/65920: [nwfs] Mounted Netware filesystem behaves strange
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 11:47:54 -0000

Synopsis: [nwfs] Mounted Netware filesystem behaves strange

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: gavin
Responsible-Changed-When: Thu Apr 16 11:47:06 UTC 2009
Responsible-Changed-Why: 
Over to maintainer(s)

http://www.freebsd.org/cgi/query-pr.cgi?pr=65920

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 12:34:49 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4FF92106586C
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 12:34:49 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.freenix.fr (keltia.freenix.org
	[IPv6:2001:660:330f:f820:213:72ff:fe15:f44])
	by mx1.freebsd.org (Postfix) with ESMTP id D73C78FC1C
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 12:34:48 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from localhost (localhost [127.0.0.1])
	by keltia.freenix.fr (Postfix/TLS) with ESMTP id A6A243BD92
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 14:34:47 +0200 (CEST)
X-Virus-Scanned: amavisd-new at keltia.freenix.fr
Received: from keltia.freenix.fr ([127.0.0.1])
	by localhost (keltia.freenix.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 61TbHhkGpwDL for <freebsd-fs@freebsd.org>;
	Thu, 16 Apr 2009 14:34:47 +0200 (CEST)
Received: by keltia.freenix.fr (Postfix/TLS, from userid 101)
	id 4A5213BC9C; Thu, 16 Apr 2009 14:34:47 +0200 (CEST)
Date: Thu, 16 Apr 2009 14:34:47 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org
Message-ID: <20090416123447.GB96263@keltia.freenix.fr>
References: <49E16021.6040900@jrv.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <49E16021.6040900@jrv.org>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7 / Dell D820 SMP
User-Agent: Mutt/1.5.19 (2009-01-05)
Subject: Re: turning off ZFS mountpoint property behavior?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 12:34:54 -0000

According to James R. Van Artsdalen:
> Unfortunately when zfs recv runs and it receive a filesystem with
> property mountpoint=/usr it mounts that filesystem there.  That's not
> desirable in my situation nor I suspect many others.
> 
> Is there a sysctl or some other way to disable the automatic mount behavior?

Have you tried to use legacy?

zfs set mountpoint=legacy tank/usr
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 16:01:30 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 65C30106566C
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 16:01:30 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.freenix.fr (keltia.freenix.org
	[IPv6:2001:660:330f:f820:213:72ff:fe15:f44])
	by mx1.freebsd.org (Postfix) with ESMTP id 03DAF8FC08
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 16:01:30 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from localhost (localhost [127.0.0.1])
	by keltia.freenix.fr (Postfix/TLS) with ESMTP id C6A3F3BDCA
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 18:01:28 +0200 (CEST)
X-Virus-Scanned: amavisd-new at keltia.freenix.fr
Received: from keltia.freenix.fr ([127.0.0.1])
	by localhost (keltia.freenix.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id oanFZnedmVjU for <freebsd-fs@freebsd.org>;
	Thu, 16 Apr 2009 18:01:28 +0200 (CEST)
Received: by keltia.freenix.fr (Postfix/TLS, from userid 101)
	id 6FF7D3BDC7; Thu, 16 Apr 2009 18:01:28 +0200 (CEST)
Date: Thu, 16 Apr 2009 18:01:28 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org
Message-ID: <20090416160128.GA831@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7 / Dell D820 SMP
User-Agent: Mutt/1.5.19 (2009-01-05)
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 16:01:30 -0000

According to Stefan Bethke:
> Created a GPT label and one partition on each of the three drives:
> 
> 	gpart create -s gpt $1
> 	gpart add -b 34 -s 128 -t freebsd-boot $1
> 	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1
> 	gpart add -b 512 -s 41900000 -t freebsd-zfs $1
> 	gpart list $1

Coming back to this thread, I'm playing with this setup (and the script
mentioned in another thread).  When I try to

zpool set bootfs=tank

with tank containing a raidz array, zpool refuses to set the property,
saying it is not available.  Using the same commandline on a mirror works.
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 16:40:56 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2F2481065849
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 16:40:56 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.freenix.fr (keltia.freenix.org
	[IPv6:2001:660:330f:f820:213:72ff:fe15:f44])
	by mx1.freebsd.org (Postfix) with ESMTP id BF3AC8FC1E
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 16:40:55 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from localhost (localhost [127.0.0.1])
	by keltia.freenix.fr (Postfix/TLS) with ESMTP id 57ADF3BDCA
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 18:40:54 +0200 (CEST)
X-Virus-Scanned: amavisd-new at keltia.freenix.fr
Received: from keltia.freenix.fr ([127.0.0.1])
	by localhost (keltia.freenix.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id E5BPciY3aMAm for <freebsd-fs@freebsd.org>;
	Thu, 16 Apr 2009 18:40:53 +0200 (CEST)
Received: by keltia.freenix.fr (Postfix/TLS, from userid 101)
	id E013C3BDC7; Thu, 16 Apr 2009 18:40:53 +0200 (CEST)
Date: Thu, 16 Apr 2009 18:40:53 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org
Message-ID: <20090416164053.GA80978@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
	<20090416160128.GA831@keltia.freenix.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090416160128.GA831@keltia.freenix.fr>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7 / Dell D820 SMP
User-Agent: Mutt/1.5.19 (2009-01-05)
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 16:40:57 -0000

According to Ollivier Robert:
> with tank containing a raidz array, zpool refuses to set the property,
> saying it is not available.  Using the same commandline on a mirror works.

BTW all messages I've found on this subject assume (like the script does)
that one can do installworld/installkernel.

I can set up the whole gpt thing from livefs, even extracting all dists on
the new zfs pool manually by playing with livefs/dvd1, but it cannot boot
afterwards because / cannot be found.

I must have missed something...  I long for pcbsd setup with zfs support in
fact I think :(
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 17:09:44 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 60C2F106566C
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 17:09:44 +0000 (UTC)
	(envelope-from hartzell@alerce.com)
Received: from merlin.alerce.com (merlin.alerce.com [64.62.142.94])
	by mx1.freebsd.org (Postfix) with ESMTP id 4BC2E8FC08
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 17:09:44 +0000 (UTC)
	(envelope-from hartzell@alerce.com)
Received: from merlin.alerce.com (localhost [127.0.0.1])
	by merlin.alerce.com (Postfix) with ESMTP id B189033C62;
	Thu, 16 Apr 2009 09:52:16 -0700 (PDT)
Received: from merlin.alerce.com (localhost [127.0.0.1])
	by merlin.alerce.com (Postfix) with ESMTP id 675C233C5B;
	Thu, 16 Apr 2009 09:52:16 -0700 (PDT)
From: George Hartzell <hartzell@alerce.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <18919.25164.567669.809759@already.local>
Date: Thu, 16 Apr 2009 09:52:28 -0700
To: Ollivier Robert <roberto@keltia.freenix.fr>
In-Reply-To: <20090416160128.GA831@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
	<20090416160128.GA831@keltia.freenix.fr>
X-Mailer: VM 8.0.12 under 22.3.1 (i386-apple-darwin9.6.0)
X-Virus-Scanned: ClamAV using ClamSMTP
Cc: freebsd-fs@freebsd.org
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: hartzell@alerce.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 17:09:44 -0000

Ollivier Robert writes:
 > According to Stefan Bethke:
 > > Created a GPT label and one partition on each of the three drives:
 > > 
 > > 	gpart create -s gpt $1
 > > 	gpart add -b 34 -s 128 -t freebsd-boot $1
 > > 	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1
 > > 	gpart add -b 512 -s 41900000 -t freebsd-zfs $1
 > > 	gpart list $1
 > 
 > Coming back to this thread, I'm playing with this setup (and the script
 > mentioned in another thread).  When I try to
 > 
 > zpool set bootfs=tank
 > 
 > with tank containing a raidz array, zpool refuses to set the property,
 > saying it is not available.  Using the same commandline on a mirror works.

In Doug's original email announcing raidz boot support,

  http://kerneltrap.org/mailarchive/freebsd-fs/2008/12/17/4441084

he says:

   Currently the ZFS kernel code refuses to allow you to set the
   bootfs pool property on raidz pools (because Solaris can't boot
   from them).  This means that you are limited to booting from the
   root filesystem of the pool for now (it shouldn't be hard to relax
   this restriction). The root filesystem of the pool should contain a
   directory /boot with the usual contents which must include a
   /boot/loader which was built with the 'LOADER_ZFS_SUPPORT' make
   option.

Which just means that you need a populated boot directory at the top
of the tank (e.g. /data/boot).  If you're using the
create-zfsboot-gpt.sh file that was posted here recently, you'll need
to rework it a bit, since it puts the root dir at /data/ROOT/data.

g.

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 18:25:43 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 99889106566B
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 18:25:43 +0000 (UTC)
	(envelope-from nhoyle@hoyletech.com)
Received: from mout.perfora.net (mout.perfora.net [74.208.4.195])
	by mx1.freebsd.org (Postfix) with ESMTP id 654A58FC16
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 18:25:43 +0000 (UTC)
	(envelope-from nhoyle@hoyletech.com)
Received: from [127.0.0.1] (pool-96-241-114-53.washdc.fios.verizon.net
	[96.241.114.53])
	by mrelay.perfora.net (node=mrus0) with ESMTP (Nemesis)
	id 0MKp8S-1LuW4B1HnZ-000g1t; Thu, 16 Apr 2009 14:12:07 -0400
Message-ID: <49E774F0.1020706@hoyletech.com>
Date: Thu, 16 Apr 2009 14:12:00 -0400
From: Nathanael Hoyle <nhoyle@hoyletech.com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: Ollivier Robert <roberto@keltia.freenix.fr>, freebsd-fs@freebsd.org
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>	<20090416160128.GA831@keltia.freenix.fr>
	<20090416164053.GA80978@keltia.freenix.fr>
In-Reply-To: <20090416164053.GA80978@keltia.freenix.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Provags-ID: V01U2FsdGVkX19Fq8IE/aaHBE0f98eeO6zvpqFCnNfb2MAeXps
	gQhuvd1jN621wnaltoaokro4LwKEF6o/SkpRgjoSwocFfIRDqf
	5rlaSZ83Y+cvoP3AUgqolAr5JcpdjEy
Cc: 
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 18:25:43 -0000

Ollivier Robert wrote:
> According to Ollivier Robert:
>   
>> with tank containing a raidz array, zpool refuses to set the property,
>> saying it is not available.  Using the same commandline on a mirror works.
>>     
>
> BTW all messages I've found on this subject assume (like the script does)
> that one can do installworld/installkernel.
>
> I can setup the whole gpt thing from livefs, even extracting all dists on
> the newly zfs pool manually by playing with livefs/dvd1 but it can not boot
> afterwards because / can not be found.
>
> I must have missed something...  I long for pcbsd setup with zfs support in
> fact I think :(
>   

To my knowledge, RAID-Z root (boot) pools are not supported.  I know 
that this is true for upstream (Solaris) ZFS, and unless the FreeBSD 
folks implemented it when I wasn't looking, you can't do it on FreeBSD 
either.  I believe the current implementation essentially reads 
"through" the mirror structure on a mirrored device and can find all of 
the data by "dumb" sequential reads on the first disk, just as it would 
with unpooled disks.  In the case of RAID-Z the boot loader would have 
to be far more intelligent in locating where to read the next block 
from.  It is my understanding that this is a planned future improvement 
(at least for upstream), but I haven't heard any update on it in a while.

-Nathanael

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 16 18:28:10 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DCA991065700
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 18:28:10 +0000 (UTC)
	(envelope-from nhoyle@hoyletech.com)
Received: from mout.perfora.net (mout.perfora.net [74.208.4.195])
	by mx1.freebsd.org (Postfix) with ESMTP id A8C028FC1D
	for <freebsd-fs@freebsd.org>; Thu, 16 Apr 2009 18:28:10 +0000 (UTC)
	(envelope-from nhoyle@hoyletech.com)
Received: from [127.0.0.1] (pool-96-241-114-53.washdc.fios.verizon.net
	[96.241.114.53])
	by mrelay.perfora.net (node=mrus1) with ESMTP (Nemesis)
	id 0MKpCa-1LuW6f3orG-000coF; Thu, 16 Apr 2009 14:14:41 -0400
Message-ID: <49E7758A.30400@hoyletech.com>
Date: Thu, 16 Apr 2009 14:14:34 -0400
From: Nathanael Hoyle <nhoyle@hoyletech.com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: Ollivier Robert <roberto@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>	<20090416160128.GA831@keltia.freenix.fr>
	<20090416164053.GA80978@keltia.freenix.fr>
In-Reply-To: <20090416164053.GA80978@keltia.freenix.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Provags-ID: V01U2FsdGVkX193hwmvpiZp0Tmu+oFrAxi/ik7K4s0uE6yqepX
	m6HskAiAaWoZCH4jL2bwiMVd3pZpL8Pn/Gsq09cJqn92Ssbw3E
	dBkqicQQg7d6I1YA5qRT9O1JeATJS4P
Cc: freebsd-fs@freebsd.org
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2009 18:28:11 -0000

Ollivier Robert wrote:
> According to Ollivier Robert:
>   
>> with tank containing a raidz array, zpool refuses to set the property,
>> saying it is not available.  Using the same commandline on a mirror works.
>>     
>
> BTW all messages I've found on this subject assume (like the script does)
> that one can do installworld/installkernel.
>
> I can setup the whole gpt thing from livefs, even extracting all dists on
> the newly zfs pool manually by playing with livefs/dvd1 but it can not boot
> afterwards because / can not be found.
>
> I must have missed something...  I long for pcbsd setup with zfs support in
> fact I think :(
>   
Ok, I screwed up.  Not on my usual workstation and my email client 
mis-threaded discussions.  I now realize you were referring to the 
experimental capabilities that Doug has been working on; my apologies 
for jumping the gun with the "can't do that" response.

-Nathanael

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 13:06:11 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 66B3F1065674;
	Fri, 17 Apr 2009 13:06:11 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from redbull.bpaserver.net (redbullneu.bpaserver.net
	[213.198.78.217])
	by mx1.freebsd.org (Postfix) with ESMTP id 18AD98FC08;
	Fri, 17 Apr 2009 13:06:11 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from outgoing.leidinger.net (pD9E2CE6F.dip.t-dialin.net
	[217.226.206.111])
	by redbull.bpaserver.net (Postfix) with ESMTP id 946162E1FE;
	Fri, 17 Apr 2009 14:50:33 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 37FE4C45FA;
	Fri, 17 Apr 2009 14:50:27 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net;
	s=outgoing-alex; t=1239972628; bh=QtyPvoJKtsu/Ftw9hNgW3aVEbmjdcncPb
	azcBUlFnXs=; h=Message-ID:Date:From:To:Cc:Subject:MIME-Version:
	Content-Type:Content-Transfer-Encoding; b=3gXL7MGtITMJxpMCu49Q7mIb
	o+PY0HIGf7Ev4BQo51P1OvLWeRAQDheDFbAb8sgdGOke/ewwKMyvBb8gg9ptaK1Z5tG
	2+DQQgcDtH4LXI+yjtw2DyfNc+F4mCDhPbZNHB3zAzI3j4iyaD5mVDwdQStu2dcCrV5
	Oc6XR3gH3URHK+hpMR5bvD1E3Y/mDGJVEGThcBxuffoqVEC5zzCDpbpI4oVydBf4sbV
	k8r4bZfSn7HPVvBdeSMsfD5PMwMSRgOByhHwppCkLLiN2pfO9womm5qXihe2H+05Vyc
	xK6fDdM9HS3FduB8TsYiYCyUf9cDt892pJuUu4RuL0VRAbw5bg==
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.3/8.13.8/Submit) id n3HCoP1o013531;
	Fri, 17 Apr 2009 14:50:25 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.cec.eu.int (pslux.cec.eu.int [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Fri, 17 Apr 2009
	14:50:24 +0200
Message-ID: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
X-Priority: 3 (Normal)
Date: Fri, 17 Apr 2009 14:50:24 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: current@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain;
	charset=UTF-8;
	DelSp="Yes";
	format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
User-Agent: Internet Messaging Program (IMP) H3 (4.3) / FreeBSD-8.0
X-BPAnet-MailScanner-Information: Please contact the ISP for more information
X-MailScanner-ID: 946162E1FE.1BFC8
X-BPAnet-MailScanner: Found to be clean
X-BPAnet-MailScanner-SpamCheck: not spam, ORDB-RBL, SpamAssassin (not cached, 
	score=-14.9, required 6, BAYES_00 -15.00,
	DKIM_SIGNED 0.00, 
	DKIM_VERIFIED -0.00, RDNS_DYNAMIC 0.10)
X-BPAnet-MailScanner-From: alexander@leidinger.net
X-Spam-Status: No
Cc: fs@freebsd.org
Subject: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 13:06:11 -0000

Hi,

to fs@, please CC me, as I'm not subscribed.

For a while I monitored (by hand) the sysctls kstat.zfs.misc.arcstats.size
and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
point I've seen more than 500M) than what I have configured in
vfs.zfs.arc_max (40M).
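
A minimal sketch of that kind of by-hand monitoring, in case someone wants
to reproduce it.  The sysctl names are the ones mentioned above; the
function names arc_check and arc_watch and the 10 second default interval
are made up for illustration, and arc_watch assumes a FreeBSD box with ZFS
loaded.

```sh
#!/bin/sh

# Compare an ARC size against the configured limit (both in bytes).
# Prints "ok" or "over by <N> bytes".
arc_check() {
    size=$1
    max=$2
    if [ "$size" -gt "$max" ]; then
        echo "over by $((size - max)) bytes"
    else
        echo "ok"
    fi
}

# Poll the ARC size and the configured limit every INTERVAL seconds.
# Nothing calls this automatically; run e.g. `arc_watch 10` by hand.
arc_watch() {
    interval=${1:-10}
    while :; do
        size=$(sysctl -n kstat.zfs.misc.arcstats.size)
        max=$(sysctl -n vfs.zfs.arc_max)
        echo "$(date '+%H:%M:%S') size=$size max=$max: $(arc_check "$size" "$max")"
        sleep "$interval"
    done
}
```

Left running next to the pkgdb job (for example `arc_watch 10 | tee arc.log`),
this should show when the size crosses the limit.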

After a while FS operations (e.g. pkgdb -F with about 900 packages...
my specific workload is the fixup of gnome packages after the removal
of the obsolete libusb port) get very slow (in my specific example I
let the pkgdb run several times over night and it still is not
finished).

The big problem with this is that at some point the machine
reboots (panic, page fault, page not present, during a fork1). I have
the impression (beware, I have a watchdog configured, and I don't know
if a triggered WD would cause the same panic, so the following is just a
guess) that I run out of memory of some kind (I have 1G RAM, i386, max
kmem size 700M). I restarted pkgdb several times after a reboot, and
it continues to process the libusb removal, but hey, this is annoying.

Does someone see something similar to what I describe (mainly the
growth of the arc cache way beyond what is configured)? Anyone with
some ideas what to try?

Bye,
Alexander.

-- 
When you go out to buy, don't show your silver.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 13:14:46 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ADE561065670
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 13:14:46 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.freenix.fr (keltia.freenix.org
	[IPv6:2001:660:330f:f820:213:72ff:fe15:f44])
	by mx1.freebsd.org (Postfix) with ESMTP id 5AFEE8FC25
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 13:14:46 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from localhost (localhost [127.0.0.1])
	by keltia.freenix.fr (Postfix/TLS) with ESMTP id 49D223BDC6;
	Fri, 17 Apr 2009 15:14:44 +0200 (CEST)
X-Virus-Scanned: amavisd-new at keltia.freenix.fr
Received: from keltia.freenix.fr ([127.0.0.1])
	by localhost (keltia.freenix.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id pZc4x4V3PJ7j; Fri, 17 Apr 2009 15:14:43 +0200 (CEST)
Received: by keltia.freenix.fr (Postfix/TLS, from userid 101)
	id C2A483BD9D; Fri, 17 Apr 2009 15:14:43 +0200 (CEST)
Date: Fri, 17 Apr 2009 15:14:43 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: George Hartzell <hartzell@alerce.com>
Message-ID: <20090417131443.GD96263@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
	<20090416160128.GA831@keltia.freenix.fr>
	<18919.25164.567669.809759@already.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <18919.25164.567669.809759@already.local>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7 / Dell D820 SMP
User-Agent: Mutt/1.5.19 (2009-01-05)
Cc: freebsd-fs@freebsd.org
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 13:14:47 -0000

According to George Hartzell:
> Which just means that you need a populated boot directory at the top
> of the tank (e.g. /data/boot).  If you're using the
> create-zfsboot-gpt.sh file that was posted here recently, you'll need
> to rework it a bit, since it puts the root dir at /data/ROOT/data.

OK, following this, I got the boot code to find loader & loader.conf.
It stops when it can't find the root I want it to boot from, though.

The ? prompt shows me all devices (da{0,1,2}, da{0,1,2}p{1,2} and
label/swap) but trying to use zfs:whatever does not seem to work.

loader.conf is very small:
-----
zfs_load="YES"
geom_label_load="YES"
vfs.root.mountfrom="zfs:tank/ROOT/tank"
-----

I did
zfs set mountpoint=/tank/ROOT/tank tank/ROOT/tank (aka the real root)

the other fs are in their usual place
zfs set  mountpoint=/usr tank/usr
zfs set  mountpoint=/var tank/var

Any other ideas?  I'll try to summarize here and on the wiki when I'm done.
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 13:47:25 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7FCF8106564A
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 13:47:25 +0000 (UTC)
	(envelope-from hartzell@alerce.com)
Received: from merlin.alerce.com (merlin.alerce.com [64.62.142.94])
	by mx1.freebsd.org (Postfix) with ESMTP id 683628FC13
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 13:47:25 +0000 (UTC)
	(envelope-from hartzell@alerce.com)
Received: from merlin.alerce.com (localhost [127.0.0.1])
	by merlin.alerce.com (Postfix) with ESMTP id E997533C62;
	Fri, 17 Apr 2009 06:47:24 -0700 (PDT)
Received: from merlin.alerce.com (localhost [127.0.0.1])
	by merlin.alerce.com (Postfix) with ESMTP id 5869E33C5B;
	Fri, 17 Apr 2009 06:47:24 -0700 (PDT)
From: George Hartzell <hartzell@alerce.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <18920.34924.2076.295983@already.local>
Date: Fri, 17 Apr 2009 06:47:24 -0700
To: Ollivier Robert <roberto@keltia.freenix.fr>
In-Reply-To: <20090417131443.GD96263@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
	<20090416160128.GA831@keltia.freenix.fr>
	<18919.25164.567669.809759@already.local>
	<20090417131443.GD96263@keltia.freenix.fr>
X-Mailer: VM 8.0.12 under 22.3.1 (i386-apple-darwin9.6.0)
X-Virus-Scanned: ClamAV using ClamSMTP
Cc: freebsd-fs@freebsd.org
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: hartzell@alerce.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 13:47:25 -0000

Ollivier Robert writes:
 > According to George Hartzell:
 > > Which just means that you need a populated boot directory at the top
 > > of the tank (e.g. /data/boot).  If you're using the
 > > create-zfsboot-gpt.sh file that was posted here recently, you'll need
 > > to rework it a bit, since it puts the root dir at /data/ROOT/data.
 > 
 > OK, following this, I managed the boot code to find loader & loader.conf.
 > It stops when it can't find the root I want it to boot from though.
 > 
 > The ? prompt shows me all devices (da{0,1,2}, da{0,1,2}p{1,2} and
 > label/swap) but trying to use zfs:whatever does not seem to work.
 > 
 > loader.conf is very small:
 > -----
 > zfs_load="YES"
 > geom_label_load="YES"
 > vfs.root.mountfrom="zfs:tank/ROOT/tank"
 > -----
 > 
 > I did
 > zfs set mountpoint=/tank/ROOT/tank tank/ROOT/tank (aka the real root)
 > 
 > the other fs are in their usual place
 > zfs set  mountpoint=/usr tank/usr
 > zfs set  mountpoint=/var tank/var
 > 
 > Any other ideas.  I'll try to summarize here and on the wiki when I'm done.

Did you build the loader with LOADER_ZFS_SUPPORT=YES enabled?

I just threw that line in my /etc/make.conf and rebuilt everything.
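
A small sketch of that step, for anyone scripting it.  The helper name
ensure_line is made up for illustration; it only appends the line if it is
not already there, so it is safe to run more than once.  The rebuild itself
(world and loader) is not shown.

```sh
#!/bin/sh

# Append LINE to FILE unless an identical line is already present.
# A missing FILE is created.
ensure_line() {
    file=$1
    line=$2
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Usage would be something like
ensure_line /etc/make.conf 'LOADER_ZFS_SUPPORT=YES'
followed by the usual rebuild and reinstall.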

g.

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 13:57:39 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 632361065672
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 13:57:39 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.freenix.fr (keltia.freenix.org
	[IPv6:2001:660:330f:f820:213:72ff:fe15:f44])
	by mx1.freebsd.org (Postfix) with ESMTP id 0F71F8FC13
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 13:57:39 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from localhost (localhost [127.0.0.1])
	by keltia.freenix.fr (Postfix/TLS) with ESMTP id 280B83BDC7;
	Fri, 17 Apr 2009 15:57:38 +0200 (CEST)
X-Virus-Scanned: amavisd-new at keltia.freenix.fr
Received: from keltia.freenix.fr ([127.0.0.1])
	by localhost (keltia.freenix.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 4rIfpui42FOn; Fri, 17 Apr 2009 15:57:37 +0200 (CEST)
Received: by keltia.freenix.fr (Postfix/TLS, from userid 101)
	id 999F93BDC6; Fri, 17 Apr 2009 15:57:37 +0200 (CEST)
Date: Fri, 17 Apr 2009 15:57:37 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: George Hartzell <hartzell@alerce.com>
Message-ID: <20090417135737.GE96263@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
	<20090416160128.GA831@keltia.freenix.fr>
	<18919.25164.567669.809759@already.local>
	<20090417131443.GD96263@keltia.freenix.fr>
	<18920.34924.2076.295983@already.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <18920.34924.2076.295983@already.local>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7 / Dell D820 SMP
User-Agent: Mutt/1.5.19 (2009-01-05)
Cc: freebsd-fs@freebsd.org
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 13:57:40 -0000

According to George Hartzell:
> Did you build the loader with LOADER_ZFS_SUPPORT=YES enabled?
> 
> I just threw that line in my /etc/make.conf and rebuilt everything.

Yes, I even reinstalled the gpart bootcode.
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 14:20:14 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7CD9D1065674;
	Fri, 17 Apr 2009 14:20:14 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id A6CCB8FC1C;
	Fri, 17 Apr 2009 14:20:13 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from harkness.in.wanderview.com (harkness.in.wanderview.com
	[10.76.10.150]) (authenticated bits=0)
	by mail.wanderview.com (8.14.3/8.14.3) with ESMTP id n3HE4FBX003074
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Fri, 17 Apr 2009 14:04:15 GMT (envelope-from ben@wanderview.com)
From: Ben Kelly <ben@wanderview.com>
To: Alexander Leidinger <Alexander@Leidinger.net>
In-Reply-To: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
X-Priority: 3 (Normal)
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
Message-Id: <D2B1B82F-AFCF-4161-BB9E-316EC976E360@wanderview.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v930.3)
Date: Fri, 17 Apr 2009 10:04:15 -0400
X-Mailer: Apple Mail (2.930.3)
X-Spam-Score: -1.44 () ALL_TRUSTED
X-Scanned-By: MIMEDefang 2.64 on 10.76.20.1
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 14:20:15 -0000

On Apr 17, 2009, at 8:50 AM, Alexander Leidinger wrote:
> to fs@, please CC me, as I'm not subscribed.
>
> I monitored (by hand) a while the sysctls  
> kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.  
> Both grow way higher (at some point I've seen more than 500M) than  
> what I have configured in vfs.zfs.arc_max (40M).
>
> After a while FS operations (e.g. pkgdb -F with about 900  
> packages... my specific workload is the fixup of gnome packages  
> after the removal of the obsolete libusb port) get very slow (in my  
> specific example I let the pkgdb run several times over night and it  
> still is not finished).
>
> The big problem with this is, that at some point in time the machine  
> reboots (panic, page fault, page not present, during a fork1). I  
> have the impression (beware, I have a watchdog configured, as I  
> don't know if a triggered WD would cause the same panic, the  
> following is just a guess) that I run out of memory of some kind (I  
> have 1G RAM, i386, max kmem size 700M). I restarted  pkgdb several  
> times after a reboot, and it continues to process the libusb  
> removal, but hey, this is annoying.
>
> Does someone see something similar to what I describe (mainly the  
> growth of the arc cache way beyond what is configured)? Anyone with  
> some ideas what to try?

Can you provide the rest of the arcstats from sysctl?  Also, does your  
arc_reclaim_thread process get any cycles when this problem occurs?   
What happens if you kill the pkgdb -F manually before it completes?   
Does the arc cache size come back down or is it stuck at the  
abnormally high level?

At first glance it looks like the tunable limits the value of the  
arc_c target value, but that appears to only be a soft limit.  There  
is code in there to shrink an ARC that has exceeded its arc_c value.   
It looks like that code is supposed to run from the arc_reclaim_thread.

- Ben

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 14:36:02 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F089106577B;
	Fri, 17 Apr 2009 14:36:02 +0000 (UTC)
	(envelope-from ticso@cicely7.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 120CD8FC13;
	Fri, 17 Apr 2009 14:36:01 +0000 (UTC)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id n3HEIKtw047958
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Fri, 17 Apr 2009 16:18:20 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9])
	by cicely5.cicely.de (8.14.2/8.14.2) with ESMTP id n3HEIHqp018223
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 17 Apr 2009 16:18:17 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely7.cicely.de (localhost [127.0.0.1])
	by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id n3HEIH5U015759;
	Fri, 17 Apr 2009 16:18:17 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: (from ticso@localhost)
	by cicely7.cicely.de (8.14.2/8.14.2/Submit) id n3HEIHiI015758;
	Fri, 17 Apr 2009 16:18:17 +0200 (CEST) (envelope-from ticso)
Date: Fri, 17 Apr 2009 16:18:17 +0200
From: Bernd Walter <ticso@cicely7.cicely.de>
To: Alexander Leidinger <Alexander@Leidinger.net>
Message-ID: <20090417141817.GR11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386
User-Agent: Mutt/1.5.11
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.000,
	BAYES_00=-2.599 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd.cicely.de
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 14:36:05 -0000

On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> Hi,
> 
> to fs@, please CC me, as I'm not subscribed.
> 
> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size  
> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some  
> point I've seen more than 500M) than what I have configured in  
> vfs.zfs.arc_max (40M).

My understanding of this is the following:
vfs.zfs.arc_min/max are not used as hard min/max values.
They are used as low/high watermarks.
If the arc grows beyond max, a thread is triggered to shrink the
arc cache down to min, but in the meantime other threads can still grow
the arc, so there is a race between them.
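
To make the race concrete, here is a toy model in sh.  It is not the ZFS
code, only an illustration of the watermark behaviour as described above:
the cache grows by a fixed amount per tick, and once it is above max a
reclaim step removes a fixed amount per tick until the size is at or below
min.  The function name and all numbers are invented.

```sh
#!/bin/sh

# simulate_arc MAX MIN GROW SHRINK TICKS
# Prints the largest cache size seen at the end of any tick.
simulate_arc() {
    max=$1 min=$2 grow=$3 shrink=$4 ticks=$5
    size=0
    reclaiming=0
    peak=0
    i=0
    while [ "$i" -lt "$ticks" ]; do
        # Other threads keep adding data to the cache.
        size=$((size + grow))
        # Crossing the high watermark wakes the reclaim thread.
        if [ "$size" -gt "$max" ]; then
            reclaiming=1
        fi
        # The reclaim thread shrinks the cache until the low watermark.
        if [ "$reclaiming" -eq 1 ]; then
            size=$((size - shrink))
            if [ "$size" -lt 0 ]; then
                size=0
            fi
            if [ "$size" -le "$min" ]; then
                reclaiming=0
            fi
        fi
        # Sample the size at the end of the tick.
        if [ "$size" -gt "$peak" ]; then
            peak=$size
        fi
        i=$((i + 1))
    done
    echo "$peak"
}
```

With `simulate_arc 40 10 5 3 100` (growth outpaces reclaim) the peak ends
up at 224, far above the max of 40; with `simulate_arc 40 10 5 8 100`
(reclaim outpaces growth) it stays at 40.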

> After a while FS operations (e.g. pkgdb -F with about 900 packages...  
> my specific workload is the fixup of gnome packages after the removal  
> of the obsolete libusb port) get very slow (in my specific example I  
> let the pkgdb run several times over night and it still is not  
> finished).

I've seen many workloads where prefetching saturates the disks without
the prefetched data ever being used.
You might want to try disabling prefetch.
Of course prefetching also grows the arc.

> The big problem with this is, that at some point in time the machine  
> reboots (panic, page fault, page not present, during a fork1). I have  
> the impression (beware, I have a watchdog configured, as I don't know  
> if a triggered WD would cause the same panic, the following is just a  
> guess) that I run out of memory of some kind (I have 1G RAM, i386, max  
> kmem size 700M). I restarted  pkgdb several times after a reboot, and  
> it continues to process the libusb removal, but hey, this is annoying.

With just 700M kmem you should set the arc values extremely small and
avoid anything which can quickly grow it.
Unfortunately accessing many small files is a known arc filling workload.
Activating vfs.zfs.cache_flush_disable can help speed up arc shrinking,
with the obvious risks of course...
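
As a rough sketch, the corresponding /boot/loader.conf entries could look
like this.  The tunable names are the stock FreeBSD ones; the values are
placeholders (40M is simply the limit from the original report), not tested
recommendations.

```
# /boot/loader.conf (sketch, values are placeholders)
vfs.zfs.arc_max="40M"             # limit from the original report
vfs.zfs.prefetch_disable="1"      # turn off ZFS file-level prefetch
#vfs.zfs.cache_flush_disable="1"  # left commented out because of the risks
```

These are boot-time tunables, so a reboot is needed before they take effect.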

> Does someone see something similar to what I describe (mainly the  
> growth of the arc cache way beyond what is configured)? Anyone with  
> some ideas what to try?

In my opinion the watermark mechanism can work as it is, but there should
be an enforced max - currently there is no guaranteed limit at all.
Nevertheless it is up to the people who know the code to decide.

-- 
B.Walter <bernd@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 14:46:06 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EC4C91065B7F
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 14:46:06 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.freenix.fr (keltia.freenix.org
	[IPv6:2001:660:330f:f820:213:72ff:fe15:f44])
	by mx1.freebsd.org (Postfix) with ESMTP id 9181F8FC20
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 14:46:06 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from localhost (localhost [127.0.0.1])
	by keltia.freenix.fr (Postfix/TLS) with ESMTP id 92C8B3BDC5
	for <freebsd-fs@freebsd.org>; Fri, 17 Apr 2009 16:46:05 +0200 (CEST)
X-Virus-Scanned: amavisd-new at keltia.freenix.fr
Received: from keltia.freenix.fr ([127.0.0.1])
	by localhost (keltia.freenix.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id n4cF79JbJVsh for <freebsd-fs@freebsd.org>;
	Fri, 17 Apr 2009 16:46:05 +0200 (CEST)
Received: by keltia.freenix.fr (Postfix/TLS, from userid 101)
	id 1E92A3BDC4; Fri, 17 Apr 2009 16:46:05 +0200 (CEST)
Date: Fri, 17 Apr 2009 16:46:05 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org
Message-ID: <20090417144605.GA2316@keltia.freenix.fr>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org>
	<C3970DC5-43A8-4F04-AC55-292F27A30275@lassitu.de>
	<20090416160128.GA831@keltia.freenix.fr>
	<18919.25164.567669.809759@already.local>
	<20090417131443.GD96263@keltia.freenix.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090417131443.GD96263@keltia.freenix.fr>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7 / Dell D820 SMP
User-Agent: Mutt/1.5.19 (2009-01-05)
Subject: Re: Booting from ZFS raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 14:46:10 -0000

According to Ollivier Robert:
> I did
> zfs set mountpoint=/tank/ROOT/tank tank/ROOT/tank (aka the real root)
> 
> the other fs are in their usual place
> zfs set  mountpoint=/usr tank/usr
> zfs set  mountpoint=/var tank/var

With a proper zpool.cache in the right place (it was not generated the
first time I tried), it gets further.  I'm still missing some bits (/usr
apparently, although I did configure it...).

As this is all done in a VMware VM, I can redo everything whenever I want.
I wish sysinstall were written in a higher-level language than C, so I
could hack a bit on it.  Right now, like many others, I feel a bit
overwhelmed by the 20k LOC...
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 16:58:32 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A3C88106564A;
	Fri, 17 Apr 2009 16:58:32 +0000 (UTC)
	(envelope-from marius@nuenneri.ch)
Received: from mail-fx0-f167.google.com (mail-fx0-f167.google.com
	[209.85.220.167])
	by mx1.freebsd.org (Postfix) with ESMTP id DB9B18FC15;
	Fri, 17 Apr 2009 16:58:31 +0000 (UTC)
	(envelope-from marius@nuenneri.ch)
Received: by fxm11 with SMTP id 11so1004188fxm.43
	for <multiple recipients>; Fri, 17 Apr 2009 09:58:30 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.204.116.69 with SMTP id l5mr2456841bkq.102.1239985709072; Fri, 
	17 Apr 2009 09:28:29 -0700 (PDT)
In-Reply-To: <20090417141817.GR11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<20090417141817.GR11551@cicely7.cicely.de>
Date: Fri, 17 Apr 2009 18:28:29 +0200
Message-ID: <b649e5e0904170928p1568329bwb2f8a5f0f9b4f698@mail.gmail.com>
From: =?ISO-8859-1?Q?Marius_N=FCnnerich?= <marius@nuenneri.ch>
To: ticso@cicely.de
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Alexander Leidinger <Alexander@leidinger.net>, current@freebsd.org,
	fs@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 16:58:33 -0000

On Fri, Apr 17, 2009 at 16:18, Bernd Walter <ticso@cicely7.cicely.de> wrote:
> On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
>> Hi,
>>
>> to fs@, please CC me, as I'm not subscribed.
>>
>> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size
>> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
>> point I've seen more than 500M) than what I have configured in
>> vfs.zfs.arc_max (40M).
>
> My understanding about this is the following:
> vfs.zfs.arc_min/max are not used as min max values.
> They are used as high/low watermarks.
> If arc is more than max, a thread is triggered to reduce the
> arc cache until min, but in the meantime other threads can still grow
> arc so there is a race between them.

Hmm, if this is true, the ARC size should go down to arc_min once it
has grown past arc_max and no new data is coming in, but I do not
observe such a thing here. It simply stays near but below arc_max
all the time. I have only /home on ZFS, with moderate load.
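
A toy model of the watermark hypothesis makes that expectation concrete.
This is only a sketch of the hypothesis as stated in the quoted text, not
of the real ARC code: the function name, the per-tick reclaim rate and the
units are all made up for illustration.

```python
def simulate(incoming, arc_min, arc_max, reclaim_per_tick):
    """Toy model of the high/low watermark hypothesis (not the real ARC).

    incoming: amount of new data cached on each tick (hypothetical units).
    Returns the cache size after every tick.
    """
    size = 0
    reclaiming = False
    history = []
    for added in incoming:
        size += added                      # other threads grow the cache
        if size > arc_max:                 # high watermark crossed
            reclaiming = True
        if reclaiming:
            # the reclaim thread shrinks the cache, but never below arc_min
            size = max(arc_min, size - reclaim_per_tick)
            if size <= arc_min:            # low watermark reached, stop
                reclaiming = False
        history.append(size)
    return history


# A burst followed by silence: the model drains back to arc_min.
burst = simulate([30, 30, 0, 0, 0, 0], arc_min=20, arc_max=40,
                 reclaim_per_tick=10)

# Sustained growth faster than reclaim: the model overshoots arc_max.
steady = simulate([15] * 10, arc_min=20, arc_max=40, reclaim_per_tick=10)
```

In this model a quiet cache always settles at arc_min, which is not what
I see here.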

>
>> After a while FS operations (e.g. pkgdb -F with about 900 packages...
>> my specific workload is the fixup of gnome packages after the removal
>> of the obsolete libusb port) get very slow (in my specific example I
>> let the pkgdb run several times over night and it still is not
>> finished).
>
> I've seen many workloads where prefetching can saturate disks without
> ever being used.
> You might want to try disabling prefetch.
> Of course prefetching also grows arc.
>
>> The big problem with this is, that at some point in time the machine
>> reboots (panic, page fault, page not present, during a fork1). I have
>> the impression (beware, I have a watchdog configured, as I don't know
>> if a triggered WD would cause the same panic, the following is just a
>> guess) that I run out of memory of some kind (I have 1G RAM, i386, max
>> kmem size 700M). I restarted pkgdb several times after a reboot, and
>> it continues to process the libusb removal, but hey, this is annoying.
>
> With just 700M kmem you should set arc values extremely small and
> avoid anything which can quickly grow it.
> Unfortunately accessing many small files is a known arc filling workload.
> Activating vfs.zfs.cache_flush_disable can help speeding up arc decreasing,
> with the obvious risks of course...
>
>> Does someone see something similar to what I describe (mainly the
>> growth of the arc cache way beyond what is configured)? Anyone with
>> some ideas what to try?
>
> In my opinion the watermark mechanism can work as it is, but there should
> be a forced max - currently there is no guaranteed limit at all.
> Nevertheless it is up to the people who know the code to decide.
>
> --
> B.Walter <bernd@bwct.de> http://www.bwct.de
> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 19:05:58 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 31283106566B;
	Fri, 17 Apr 2009 19:05:58 +0000 (UTC)
	(envelope-from ticso@cicely7.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 95F608FC16;
	Fri, 17 Apr 2009 19:05:57 +0000 (UTC)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id n3HJ5tmF063212
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Fri, 17 Apr 2009 21:05:56 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9])
	by cicely5.cicely.de (8.14.2/8.14.2) with ESMTP id n3HJ5q2p027708
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 17 Apr 2009 21:05:52 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely7.cicely.de (localhost [127.0.0.1])
	by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id n3HJ5qJX016460;
	Fri, 17 Apr 2009 21:05:52 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: (from ticso@localhost)
	by cicely7.cicely.de (8.14.2/8.14.2/Submit) id n3HJ5qeY016459;
	Fri, 17 Apr 2009 21:05:52 +0200 (CEST) (envelope-from ticso)
Date: Fri, 17 Apr 2009 21:05:52 +0200
From: Bernd Walter <ticso@cicely7.cicely.de>
To: Marius =?iso-8859-1?Q?N=FCnnerich?= <marius@nuenneri.ch>
Message-ID: <20090417190551.GT11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<20090417141817.GR11551@cicely7.cicely.de>
	<b649e5e0904170928p1568329bwb2f8a5f0f9b4f698@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <b649e5e0904170928p1568329bwb2f8a5f0f9b4f698@mail.gmail.com>
X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386
User-Agent: Mutt/1.5.11
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.000,
	BAYES_00=-2.599 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd.cicely.de
Cc: Alexander Leidinger <Alexander@leidinger.net>, ticso@cicely.de,
	fs@freebsd.org, current@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 19:05:58 -0000

On Fri, Apr 17, 2009 at 06:28:29PM +0200, Marius Nünnerich wrote:
> On Fri, Apr 17, 2009 at 16:18, Bernd Walter <ticso@cicely7.cicely.de> wrote:
> > On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> >> Hi,
> >>
> >> to fs@, please CC me, as I'm not subscribed.
> >>
> >> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size
> >> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
> >> point I've seen more than 500M) than what I have configured in
> >> vfs.zfs.arc_max (40M).
> >
> > My understanding about this is the following:
> > vfs.zfs.arc_min/max are not used as min max values.
> > They are used as high/low watermarks.
> > If arc is more than max, a thread is triggered to reduce the
> > arc cache until min, but in the meantime other threads can still grow
> > arc so there is a race between them.
> 
> Hmm, if this is true the ARC size should go down to arc_min once it
> did grow past arc_max and no new data is coming along but I do not
> observe such a thing here. It simply stays near but below arc_max here
> all the time. I have only /home on ZFS with moderate load.

I had a few ideas why this could be, but scanning the complete sys tree
showed no place at all where arc_min is used.
There are formulas to set this value, but that's all I can find.

-- 
B.Walter <bernd@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 17 21:44:05 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 19E33106566B
	for <fs@freebsd.org>; Fri, 17 Apr 2009 21:44:05 +0000 (UTC)
	(envelope-from dan@dan.emsphone.com)
Received: from email1.allantgroup.com (email1.emsphone.com [199.67.51.115])
	by mx1.freebsd.org (Postfix) with ESMTP id C0EA88FC19
	for <fs@freebsd.org>; Fri, 17 Apr 2009 21:44:04 +0000 (UTC)
	(envelope-from dan@dan.emsphone.com)
Received: from dan.emsphone.com (dan.emsphone.com [199.67.51.101])
	by email1.allantgroup.com (8.14.0/8.14.0) with ESMTP id n3HLCaLH073548
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <fs@freebsd.org>; Fri, 17 Apr 2009 16:12:37 -0500 (CDT)
	(envelope-from dan@dan.emsphone.com)
Received: from dan.emsphone.com (smmsp@localhost [127.0.0.1])
	by dan.emsphone.com (8.14.3/8.14.3) with ESMTP id n3HLCanh027321
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <fs@freebsd.org>; Fri, 17 Apr 2009 16:12:36 -0500 (CDT)
	(envelope-from dan@dan.emsphone.com)
Received: (from dan@localhost)
	by dan.emsphone.com (8.14.3/8.14.3/Submit) id n3HKxtlu099641;
	Fri, 17 Apr 2009 15:59:55 -0500 (CDT) (envelope-from dan)
Date: Fri, 17 Apr 2009 15:59:55 -0500
From: Dan Nelson <dnelson@allantgroup.com>
To: ticso@cicely.de
Message-ID: <20090417205955.GK90152@dan.emsphone.com>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<20090417141817.GR11551@cicely7.cicely.de>
	<b649e5e0904170928p1568329bwb2f8a5f0f9b4f698@mail.gmail.com>
	<20090417190551.GT11551@cicely7.cicely.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20090417190551.GT11551@cicely7.cicely.de>
X-OS: FreeBSD 7.1-STABLE
User-Agent: Mutt/1.5.19 (2009-01-05)
X-Virus-Scanned: ClamAV version 0.94.1,
	clamav-milter version 0.94.1 on email1.allantgroup.com
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2
	(email1.allantgroup.com [199.67.51.78]);
	Fri, 17 Apr 2009 16:12:37 -0500 (CDT)
X-Scanned-By: MIMEDefang 2.45
Cc: Alexander Leidinger <Alexander@leidinger.net>, current@freebsd.org,
	fs@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2009 21:44:05 -0000

In the last episode (Apr 17), Bernd Walter said:
> On Fri, Apr 17, 2009 at 06:28:29PM +0200, Marius Nünnerich wrote:
> > On Fri, Apr 17, 2009 at 16:18, Bernd Walter <ticso@cicely7.cicely.de> wrote:
> > > On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> > >> I monitored (by hand) a while the sysctls
> > >> kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size. 
> > >> Both grow way higher (at some point I've seen more than 500M) than
> > >> what I have configured in vfs.zfs.arc_max (40M).
> > >
> > > My understanding about this is the following: vfs.zfs.arc_min/max are
> > > not used as min max values.  They are used as high/low watermarks.  If
> > > arc is more than max, a thread is triggered to reduce the arc
> > > cache until min, but in the meantime other threads can still grow arc
> > > so there is a race between them.
> > 
> > Hmm, if this is true the ARC size should go down to arc_min once it did
> > grow past arc_max and no new data is coming along but I do not observe
> > such a thing here.  It simply stays near but below arc_max here all the
> > time.  I have only /home on ZFS with moderate load.
> 
> I had a few ideas why this could be, but scanning complete sys showed no
> point at all where arc_min is used.  There are formulas to set this value,
> but that's all I find.

zfs_arc_{min,max} are just tunables.  The real variables arc_c_{min,max} get
autosized and then capped to {min,max} in uts/common/fs/zfs/arc.c:arc_init() .
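
As a rough illustration of "autosized and then capped", the Python sketch
below models that sizing step.  It is not a transcription of arc_init():
the function name, the 1/8, 1/4 and 1/2 fractions and the 16 MiB
plausibility floor are assumptions made for the example.  Only the 700M
kmem size and the 40M arc_max come from this thread.

```python
MIB = 1024 * 1024


def size_arc_limits(kmem_size, tunable_min=0, tunable_max=0):
    """Model of autosizing the ARC limits and then applying the tunables.

    All fractions and the 16 MiB floor are illustrative assumptions.
    Returns (c_min, c_max) in bytes.
    """
    floor = 16 * MIB
    c = kmem_size // 8                    # assumed starting target
    c_min = max(c // 4, floor)            # assumed autosized minimum
    c_max = max(kmem_size // 2, c_min)    # assumed autosized maximum

    # A tunable only takes effect when it looks plausible.
    if floor <= tunable_max < kmem_size:
        c_max = tunable_max
    if floor <= tunable_min <= c_max:
        c_min = tunable_min
    return c_min, c_max


# 700M of kmem with vfs.zfs.arc_max set to 40M and arc_min left unset.
c_min, c_max = size_arc_limits(700 * MIB, tunable_max=40 * MIB)
```

The point of the sketch is only that the tunable and the variable the
code actually consults are two different things, and that the second is
derived from the first at initialisation time.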

-- 
	Dan Nelson
	dnelson@allantgroup.com

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 18 07:39:13 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B4099106564A;
	Sat, 18 Apr 2009 07:39:13 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from redbull.bpaserver.net (redbullneu.bpaserver.net
	[213.198.78.217])
	by mx1.freebsd.org (Postfix) with ESMTP id 6FAAB8FC08;
	Sat, 18 Apr 2009 07:39:13 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from outgoing.leidinger.net (pD9E2DC61.dip.t-dialin.net
	[217.226.220.97])
	by redbull.bpaserver.net (Postfix) with ESMTP id 09C302E068;
	Sat, 18 Apr 2009 09:39:06 +0200 (CEST)
Received: from unknown (IO.Leidinger.net [192.168.2.103])
	by outgoing.leidinger.net (Postfix) with ESMTP id 2410FC2B67;
	Sat, 18 Apr 2009 09:38:59 +0200 (CEST)
Date: Sat, 18 Apr 2009 09:38:57 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: ticso@cicely.de
Message-ID: <20090418093857.0000199a@unknown>
In-Reply-To: <20090417141817.GR11551@cicely7.cicely.de>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<20090417141817.GR11551@cicely7.cicely.de>
X-Mailer: Claws Mail 3.7.1 (GTK+ 2.10.13; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BPAnet-MailScanner-Information: Please contact the ISP for more information
X-MailScanner-ID: 09C302E068.61563
X-BPAnet-MailScanner: Found to be clean
X-BPAnet-MailScanner-SpamCheck: not spam, ORDB-RBL, SpamAssassin (not cached, 
	score=-14.4, required 6, BAYES_00 -15.00,
	L_HELLO_ADDRESS 0.50, RDNS_DYNAMIC 0.10)
X-BPAnet-MailScanner-From: alexander@leidinger.net
X-Spam-Status: No
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re:  ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Apr 2009 07:39:14 -0000

On Fri, 17 Apr 2009 16:18:17 +0200 Bernd Walter
<ticso@cicely7.cicely.de> wrote:

> On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
> > Hi,
> > 
> > to fs@, please CC me, as I'm not subscribed.
> > 
> > I monitored (by hand) a while the sysctls
> > kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
> > Both grow way higher (at some point I've seen more than 500M) than
> > what I have configured in vfs.zfs.arc_max (40M).
> 
> My understanding about this is the following:
> vfs.zfs.arc_min/max are not used as min max values.
> They are used as high/low watermarks.
> If arc is more than max, a thread is triggered to reduce the
> arc cache until min, but in the meantime other threads can still grow
> arc so there is a race between them.

500M (more than 10 times my max) after a night seems to be a big race...

> > After a while FS operations (e.g. pkgdb -F with about 900
> > packages... my specific workload is the fixup of gnome packages
> > after the removal of the obsolete libusb port) get very slow (in my
> > specific example I let the pkgdb run several times over night and
> > it still is not finished).
> 
> I've seen many workloads where prefetching can saturate disks without
> ever being used.
> You might want to try disabling prefetch.
> Of course prefetching also grows arc.

Prefetching is already disabled in this case.

> > The big problem with this is, that at some point in time the
> > machine reboots (panic, page fault, page not present, during a
> > fork1). I have the impression (beware, I have a watchdog
> > configured, as I don't know if a triggered WD would cause the same
> > panic, the following is just a guess) that I run out of memory of
> > some kind (I have 1G RAM, i386, max kmem size 700M). I restarted
> > pkgdb several times after a reboot, and it continues to process the
> > libusb removal, but hey, this is annoying.
> 
> With just 700M kmem you should set arc values extremely small and
> avoid anything which can quickly grow it.
> Unfortunately accessing many small files is a known arc filling
> workload. Activating vfs.zfs.cache_flush_disable can help speeding up
> arc decreasing, with the obvious risks of course...

I have this:
---snip---
vfs.zfs.prefetch_disable=1
vm.kmem_size="700M"
vm.kmem_size_max="700M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
vfs.zfs.vdev.cache.bshift="13"  # device read ahead: 8k
vfs.zfs.vdev.max_pending="6"    # concurrent requests to the device, + for NCQ
---snip---
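
For reference, the suffixed values above translate to bytes as sketched
below.  This is a minimal Python sketch: `to_bytes` is a hypothetical
helper, and it assumes the K/M/G suffixes are powers of 1024.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a size such as "40M" to bytes (suffixes assumed binary)."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


arc_max = to_bytes("40M")       # vfs.zfs.arc_max
vdev_cache = to_bytes("5M")     # vfs.zfs.vdev.cache.size
kmem = to_bytes("700M")         # vm.kmem_size
read_ahead = 1 << 13            # bshift=13 is a shift count, i.e. 8192 bytes
```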

Bye,
Alexander.

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 18 07:48:31 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 353CD106564A;
	Sat, 18 Apr 2009 07:48:31 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from redbull.bpaserver.net (redbullneu.bpaserver.net
	[213.198.78.217])
	by mx1.freebsd.org (Postfix) with ESMTP id AC7278FC0C;
	Sat, 18 Apr 2009 07:48:30 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from outgoing.leidinger.net (pD9E2DC61.dip.t-dialin.net
	[217.226.220.97])
	by redbull.bpaserver.net (Postfix) with ESMTP id A1B1D2E0AD;
	Sat, 18 Apr 2009 09:48:26 +0200 (CEST)
Received: from unknown (IO.Leidinger.net [192.168.2.103])
	by outgoing.leidinger.net (Postfix) with ESMTP id F2787C2E1F;
	Sat, 18 Apr 2009 09:48:22 +0200 (CEST)
Date: Sat, 18 Apr 2009 09:48:21 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Ben Kelly <ben@wanderview.com>
Message-ID: <20090418094821.00002e67@unknown>
In-Reply-To: <D2B1B82F-AFCF-4161-BB9E-316EC976E360@wanderview.com>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<D2B1B82F-AFCF-4161-BB9E-316EC976E360@wanderview.com>
X-Mailer: Claws Mail 3.7.1 (GTK+ 2.10.13; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BPAnet-MailScanner-Information: Please contact the ISP for more information
X-MailScanner-ID: A1B1D2E0AD.767EF
X-BPAnet-MailScanner: Found to be clean
X-BPAnet-MailScanner-SpamCheck: not spam, ORDB-RBL, SpamAssassin (not cached, 
	score=-14.823, required 6, BAYES_00 -15.00,
	RDNS_DYNAMIC 0.10, TW_ZF 0.08)
X-BPAnet-MailScanner-From: alexander@leidinger.net
X-Spam-Status: No
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re:  ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Apr 2009 07:48:31 -0000

On Fri, 17 Apr 2009 10:04:15 -0400 Ben Kelly <ben@wanderview.com> wrote:


> On Apr 17, 2009, at 8:50 AM, Alexander Leidinger wrote:
> > to fs@, please CC me, as I'm not subscribed.
> >
> > I monitored (by hand) a while the sysctls  
> > kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.  
> > Both grow way higher (at some point I've seen more than 500M) than  
> > what I have configured in vfs.zfs.arc_max (40M).
> >
> > After a while FS operations (e.g. pkgdb -F with about 900  
> > packages... my specific workload is the fixup of gnome packages  
> > after the removal of the obsolete libusb port) get very slow (in
> > my specific example I let the pkgdb run several times over night
> > and it still is not finished).
> >
> > The big problem with this is, that at some point in time the
> > machine reboots (panic, page fault, page not present, during a
> > fork1). I have the impression (beware, I have a watchdog
> > configured, as I don't know if a triggered WD would cause the same
> > panic, the following is just a guess) that I run out of memory of
> > some kind (I have 1G RAM, i386, max kmem size 700M). I restarted
> > pkgdb several times after a reboot, and it continues to process the
> > libusb removal, but hey, this is annoying.
> >
> > Does someone see something similar to what I describe (mainly the  
> > growth of the arc cache way beyond what is configured)? Anyone
> > with some ideas what to try?
> 
> Can you provide the rest of the arcstats from sysctl?  Also, does
> your arc_reclaim_thread process get any cycles when this problem
> occurs? What happens if you kill the pkgdb -F manually before it
> completes? Does the arc cache size come back down or is it stuck at
> the abnormally high level?

I haven't tried killing pkgdb and looking at the stats, but on the idle
machine (rebooted after the panic, then 5h of no use by me... the machine
fetches my mail, has a webmail + mysql + imap interface and is a
fileserver) the size is double my max value. Again, there's no real
load at this time, just fetching my mail (most traffic from the
FreeBSD lists) and a little bit of SpamAssassin filtering of it. When
I logged in this morning, the machine had been rebooted about 5h earlier
by a panic and no FS traffic was going on (100% idle).

Currently the arc_reclaim_thread has 0:12 of accumulated CPU time,
the wcpu is at 0%, but it is in the running state. The machine is
about 80% idle.

Here are all zfs sysctls as of now (pkgdb started 5min ago):
---snip---
# sysctl -a | grep zfs
vfs.zfs.arc_meta_limit: 10485760
vfs.zfs.arc_meta_used: 130211600
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 22937600
vfs.zfs.arc_max: 41943040
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 13
vfs.zfs.vdev.cache.size: 5242880
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 6
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.version.zpl: 3
vfs.zfs.version.vdev_boot: 1
vfs.zfs.version.spa: 13
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.arcstats.hits: 2483157
kstat.zfs.misc.arcstats.misses: 604115
kstat.zfs.misc.arcstats.demand_data_hits: 187200
kstat.zfs.misc.arcstats.demand_data_misses: 78685
kstat.zfs.misc.arcstats.demand_metadata_hits: 2295957
kstat.zfs.misc.arcstats.demand_metadata_misses: 525430
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 0
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 1621026
kstat.zfs.misc.arcstats.mru_ghost_hits: 32102
kstat.zfs.misc.arcstats.mfu_hits: 862131
kstat.zfs.misc.arcstats.mfu_ghost_hits: 18804
kstat.zfs.misc.arcstats.deleted: 550853
kstat.zfs.misc.arcstats.recycle_miss: 287993
kstat.zfs.misc.arcstats.mutex_miss: 2
kstat.zfs.misc.arcstats.evict_skip: 654418
kstat.zfs.misc.arcstats.hash_elements: 5363
kstat.zfs.misc.arcstats.hash_elements_max: 8569
kstat.zfs.misc.arcstats.hash_collisions: 133396
kstat.zfs.misc.arcstats.hash_chains: 739
kstat.zfs.misc.arcstats.hash_chain_max: 5
kstat.zfs.misc.arcstats.p: 41943040
kstat.zfs.misc.arcstats.c: 41943040
kstat.zfs.misc.arcstats.c_min: 22937600
kstat.zfs.misc.arcstats.c_max: 41943040
kstat.zfs.misc.arcstats.size: 130467088
kstat.zfs.misc.arcstats.hdr_size: 730456
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 2728
kstat.zfs.misc.vdev_cache_stats.hits: 297326
kstat.zfs.misc.vdev_cache_stats.misses: 368918
---snip---
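
A quick way to read the dump above is to compare the reported sizes with
their limits.  The Python sketch below does that for the four values that
matter here; `parse_sysctl` is a hypothetical helper and the sample text
is copied from the output above.

```python
SAMPLE = """\
vfs.zfs.arc_meta_limit: 10485760
vfs.zfs.arc_meta_used: 130211600
kstat.zfs.misc.arcstats.c_max: 41943040
kstat.zfs.misc.arcstats.size: 130467088
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            stats[name] = int(value)
    return stats


stats = parse_sysctl(SAMPLE)
size = stats["kstat.zfs.misc.arcstats.size"]
meta_used = stats["vfs.zfs.arc_meta_used"]

size_over_max = size / stats["kstat.zfs.misc.arcstats.c_max"]
meta_over_limit = meta_used / stats["vfs.zfs.arc_meta_limit"]
meta_share = meta_used / size

print(f"size / c_max:           {size_over_max:.2f}")
print(f"meta_used / meta_limit: {meta_over_limit:.2f}")
print(f"meta_used / size:       {meta_share:.3f}")
```

With these numbers the ARC is roughly 3.1 times c_max, and nearly all of
it is accounted as metadata, which is about 12 times arc_meta_limit.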

Bye,
Alexander.

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 18 12:58:38 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AE88A1065C57;
	Sat, 18 Apr 2009 12:58:38 +0000 (UTC)
	(envelope-from marius@nuenneri.ch)
Received: from mail-fx0-f167.google.com (mail-fx0-f167.google.com
	[209.85.220.167])
	by mx1.freebsd.org (Postfix) with ESMTP id E20738FC16;
	Sat, 18 Apr 2009 12:58:37 +0000 (UTC)
	(envelope-from marius@nuenneri.ch)
Received: by fxm11 with SMTP id 11so1268416fxm.43
	for <multiple recipients>; Sat, 18 Apr 2009 05:58:37 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.204.55.142 with SMTP id u14mr3348482bkg.121.1240059516812; 
	Sat, 18 Apr 2009 05:58:36 -0700 (PDT)
In-Reply-To: <20090418094821.00002e67@unknown>
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<D2B1B82F-AFCF-4161-BB9E-316EC976E360@wanderview.com>
	<20090418094821.00002e67@unknown>
Date: Sat, 18 Apr 2009 14:58:36 +0200
Message-ID: <b649e5e0904180558u4e94b9e6p254067a3a53db097@mail.gmail.com>
From: =?ISO-8859-1?Q?Marius_N=FCnnerich?= <marius@nuenneri.ch>
To: Alexander Leidinger <Alexander@leidinger.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: fs@freebsd.org, current@freebsd.org, Ben Kelly <ben@wanderview.com>
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Apr 2009 12:58:48 -0000

On Sat, Apr 18, 2009 at 09:48, Alexander Leidinger
<Alexander@leidinger.net> wrote:
> On Fri, 17 Apr 2009 10:04:15 -0400 Ben Kelly <ben@wanderview.com> wrote:
>
>
>> On Apr 17, 2009, at 8:50 AM, Alexander Leidinger wrote:
>> > to fs@, please CC me, as I'm not subscribed.
>> >
>> > I monitored (by hand) a while the sysctls
>> > kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.hdr_size.
>> > Both grow way higher (at some point I've seen more than 500M) than
>> > what I have configured in vfs.zfs.arc_max (40M).
>> >
>> > After a while FS operations (e.g. pkgdb -F with about 900
>> > packages... my specific workload is the fixup of gnome packages
>> > after the removal of the obsolete libusb port) get very slow (in
>> > my specific example I let the pkgdb run several times over night
>> > and it still is not finished).
>> >
>> > The big problem with this is, that at some point in time the
>> > machine reboots (panic, page fault, page not present, during a
>> > fork1). I have the impression (beware, I have a watchdog
>> > configured, as I don't know if a triggered WD would cause the same
>> > panic, the following is just a guess) that I run out of memory of
>> > some kind (I have 1G RAM, i386, max kmem size 700M). I restarted
>> > pkgdb several times after a reboot, and it continues to process the
>> > libusb removal, but hey, this is annoying.
>> >
>> > Does someone see something similar to what I describe (mainly the
>> > growth of the arc cache way beyond what is configured)? Anyone
>> > with some ideas what to try?
>>
>> Can you provide the rest of the arcstats from sysctl? Also, does
>> your arc_reclaim_thread process get any cycles when this problem
>> occurs? What happens if you kill the pkgdb -F manually before it
>> completes? Does the arc cache size come back down or is it stuck at
>> the abnormally high level?
>
> I haven't tried killing pkgdb and looking at the stats, but on the idle
> machine (reboot after the panic and 5h of no use by me... the machine
> fetches my mails, has a webmail + mysql + imap interface and is a
> fileserver) the size is double of my max value. Again there's no real
> load at this time, just fetching my mails (most traffic from the
> FreeBSD lists) and a little bit of SpamAssassin filtering of them. When
> I logged in this morning the machine was rebooted about 5h ago by a
> panic and no FS traffic was going on (100% idle).
>
> Currently the arc_reclaim_thread has 0:12 of accumulated CPU time,
> the wcpu is at 0%, but it is in the running state. The machine is
> about 80% idle.
>

[snip]

How about adding a few DTrace probes into arc_reclaim_thread and see
what it does?

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 18 21:17:04 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 77FA0106576E;
	Sat, 18 Apr 2009 21:17:04 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 001308FC13;
	Sat, 18 Apr 2009 21:17:03 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from harkness.in.wanderview.com (harkness.in.wanderview.com
	[10.76.10.150]) (authenticated bits=0)
	by mail.wanderview.com (8.14.3/8.14.3) with ESMTP id n3ILH0Dk003279
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Sat, 18 Apr 2009 21:17:01 GMT (envelope-from ben@wanderview.com)
Message-Id: <6535218D-6292-4F84-A8BA-FFA9B2E47F80@wanderview.com>
From: Ben Kelly <ben@wanderview.com>
To: Alexander Leidinger <Alexander@Leidinger.net>
In-Reply-To: <20090418094821.00002e67@unknown>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v930.3)
Date: Sat, 18 Apr 2009 17:17:00 -0400
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<D2B1B82F-AFCF-4161-BB9E-316EC976E360@wanderview.com>
	<20090418094821.00002e67@unknown>
X-Mailer: Apple Mail (2.930.3)
X-Spam-Score: -1.44 () ALL_TRUSTED
X-Scanned-By: MIMEDefang 2.64 on 10.76.20.1
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Apr 2009 21:17:04 -0000

On Apr 18, 2009, at 3:48 AM, Alexander Leidinger wrote:
> On Fri, 17 Apr 2009 10:04:15 -0400 Ben Kelly <ben@wanderview.com>  
> wrote:
> I haven't tried killing pkgdb and looking at the stats, but on the  
> idle
> machine (reboot after the panic and 5h of no use by me... the machine
> fetches my mails, has a webmail + mysql + imap interface and is a
> fileserver) the size is double of my max value. Again there's no real
> load at this time, just fetching my mails (most traffic from the
> FreeBSD lists) and a little bit of SpamAssassin filtering of them.  
> When
> I logged in this morning the machine was rebooted about 5h ago by a
> panic and no FS traffic was going on (100% idle).

 From looking at the code, it's not too surprising it settles out at 2x  
your zfs_arc_max tunable.  It looks like under normal conditions the  
arc_reclaim_thread only tries to evict buffers when the arc_size plus  
any ghost buffers is twice the value of arc_c:

                 if (needfree ||
                     (2 * arc_c < arc_size +
                     arc_mru_ghost->arcs_size + arc_mfu_ghost->arcs_size))
                         arc_adjust();

(The needfree flag is only set when the system lowmem event is  
fired.)  The arc_reclaim_thread checks this once a second.  Perhaps  
this limit should be a tunable.  Also, it might make sense to have a  
separate limit check for the ghost buffers.
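
The check quoted above can be modelled in a few lines of standalone C. This is only a sketch: reclaim_should_adjust and its parameters are hypothetical names, not kernel symbols, and all sizes are plain byte counts. It is meant to show why, with empty ghost lists and no lowmem event, nothing is evicted until arc_size exceeds twice arc_c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userland model of the reclaim check; not kernel code.
 * All sizes are in bytes.
 */
bool
reclaim_should_adjust(bool needfree, uint64_t arc_c, uint64_t arc_size,
    uint64_t mru_ghost_size, uint64_t mfu_ghost_size)
{
	/* Evict on a lowmem event, or once size plus ghosts exceeds 2 * arc_c. */
	return (needfree ||
	    2 * arc_c < arc_size + mru_ghost_size + mfu_ghost_size);
}
```

With arc_c at 40M and no ghost buffers, an arc_size of exactly 80M does not trigger the adjustment in this model, while anything larger does.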

I was able to reproduce similar arc_size growth on my machine by  
running my rsync backup.  After instrumenting the code it appeared  
that buffers were not being evicted because they were "indirect" and  
had been in the cache less than a second.  The "indirect" flag is set  
based on the on-disk level field.  When you see the  
arcstats.evict_skip sysctl going up this is probably what is  
happening.  The comments in the code say this check is only for  
prefetch data, but it also triggers for indirect.  I'm hesitant to  
make it really only affect prefetch buffers.  Perhaps we could make  
the timeout a tunable or dynamic based on how far the cache is over  
its target.
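
To make the skip behaviour concrete, here is a rough standalone C sketch. Everything in it is an assumption for illustration: the flag names, evict_would_skip and scaled_lifespan are hypothetical and are not the identifiers used in arc.c. The second function is just one possible shape for a timeout that shrinks as the cache goes further over its target; it is not a tested patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_PREFETCH 0x1u	/* hypothetical flag: buffer was prefetched */
#define BUF_INDIRECT 0x2u	/* hypothetical flag: on-disk level > 0 */

/*
 * Hypothetical model: a prefetch or indirect buffer is skipped by
 * eviction while it has been in the cache for less than min_lifespan
 * ticks.  Other buffers are never skipped by this check.
 */
bool
evict_would_skip(uint32_t flags, uint64_t now, uint64_t last_access,
    uint64_t min_lifespan)
{
	bool guarded = (flags & (BUF_PREFETCH | BUF_INDIRECT)) != 0;

	return (guarded && now - last_access < min_lifespan);
}

/*
 * One possible "dynamic" timeout: keep the full lifespan at or below
 * target, shrink it linearly above target, and drop to 0 at 2x target.
 */
uint64_t
scaled_lifespan(uint64_t base, uint64_t size, uint64_t target)
{
	if (target == 0 || size >= 2 * target)
		return (0);
	if (size <= target)
		return (base);
	return (base - base * (size - target) / target);
}
```

In this model a cache at 1.5x its target would protect indirect buffers for half the usual time, and a cache at 2x or more would not protect them at all.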

After the rsync completed my machine slowly evicts buffers until it's  
back down to about twice arc_c.  There was one case, however, where I  
saw it stop at about four times arc_c.  In that case it was failing to  
evict buffers due to a missed lock.  It's not clear yet if it was a  
buffer lock or hash lock.  When this happens you'll see the  
arcstats.mutex_missed sysctl go up.  I'm going to see if I can track  
down why this is occurring under idle conditions.  That seems  
suspicious to me.

Hope that helps.  I'll let you know if I find anything else.

- Ben

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 18 21:25:22 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A30AD106566C;
	Sat, 18 Apr 2009 21:25:22 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 2C6338FC25;
	Sat, 18 Apr 2009 21:25:21 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from harkness.in.wanderview.com (harkness.in.wanderview.com
	[10.76.10.150]) (authenticated bits=0)
	by mail.wanderview.com (8.14.3/8.14.3) with ESMTP id n3ILPHve003379
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Sat, 18 Apr 2009 21:25:18 GMT (envelope-from ben@wanderview.com)
Message-Id: <6FBF637A-6D96-4117-85C5-F205989DCCC1@wanderview.com>
From: Ben Kelly <ben@wanderview.com>
To: =?ISO-8859-1?Q?Marius_N=FCnnerich?= <marius@nuenneri.ch>
In-Reply-To: <b649e5e0904170928p1568329bwb2f8a5f0f9b4f698@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Apple Message framework v930.3)
Date: Sat, 18 Apr 2009 17:25:17 -0400
References: <20090417145024.205173ighmwi4j0o@webmail.leidinger.net>
	<20090417141817.GR11551@cicely7.cicely.de>
	<b649e5e0904170928p1568329bwb2f8a5f0f9b4f698@mail.gmail.com>
X-Mailer: Apple Mail (2.930.3)
X-Spam-Score: -1.44 () ALL_TRUSTED
X-Scanned-By: MIMEDefang 2.64 on 10.76.20.1
Cc: Alexander Leidinger <Alexander@leidinger.net>, ticso@cicely.de,
	fs@freebsd.org, current@freebsd.org
Subject: Re: ZFS: unlimited arc cache growth?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Apr 2009 21:25:22 -0000


On Apr 17, 2009, at 12:28 PM, Marius Nünnerich wrote:

> On Fri, Apr 17, 2009 at 16:18, Bernd Walter <ticso@cicely7.cicely.de> wrote:
>> On Fri, Apr 17, 2009 at 02:50:24PM +0200, Alexander Leidinger wrote:
>>> Hi,
>>>
>>> to fs@, please CC me, as I'm not subscribed.
>>>
>>> I monitored (by hand) a while the sysctls kstat.zfs.misc.arcstats.size
>>> and kstat.zfs.misc.arcstats.hdr_size. Both grow way higher (at some
>>> point I've seen more than 500M) than what I have configured in
>>> vfs.zfs.arc_max (40M).
>>
>> My understanding about this is the following:
>> vfs.zfs.arc_min/max are not used as min max values.
>> They are used as high/low watermarks.
>> If arc is more than max, a thread is triggered to reduce the
>> arc cache until min, but in the meantime other threads can still grow
>> arc so there is a race between them.
>
> Hmm, if this is true the ARC size should go down to arc_min once it
> did grow past arc_max and no new data is coming along but I do not
> observe such a thing here. It simply stays near but below arc_max here
> all the time. I have only /home on ZFS with moderate load.

It appears arc_reclaim_thread only shrinks from arc_max when the
system vm_lowmem event is fired or more than 75% of max kmem is in use
by the system.
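
As a rough illustration of those two triggers, here is a standalone C sketch. The function name shrink_wanted and its parameters are hypothetical and do not come from arc.c; the 75% threshold is simply taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the two shrink triggers: a lowmem event, or
 * more than 75% of the maximum kmem size in use.  Sizes are in bytes.
 */
bool
shrink_wanted(bool lowmem_fired, uint64_t kmem_used, uint64_t kmem_max)
{
	if (lowmem_fired)
		return (true);
	/* Divide before multiplying so the threshold cannot overflow. */
	return (kmem_used > (kmem_max / 4) * 3);
}
```

With a 700M kmem limit, this model asks for a shrink only once more than 525M is in use, unless a lowmem event fires first.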

If you want to make it try to shrink the arc all the time you could
try the patch below.  This worked to reduce arc_c on my system, but it
was unable to reduce arc_size to match due to an apparent mutex miss.
I'm still trying to track that down.

Hope that helps.

- Ben

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        (revision 205)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        (working copy)
@@ -1963,7 +1963,7 @@
                 if (needfree ||
                     (2 * arc_c < arc_size +
                     arc_mru_ghost->arcs_size + arc_mfu_ghost->arcs_size))
-                       arc_adjust();
+                       arc_shrink();

                 if (arc_eviction_list != NULL)
                         arc_do_user_evicts();

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 18 22:57:01 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 405E4106566B
	for <freebsd-fs@freebsd.org>; Sat, 18 Apr 2009 22:57:01 +0000 (UTC)
	(envelope-from morganw@chemikals.org)
Received: from warped.bluecherry.net (unknown [IPv6:2001:440:eeee:fffb::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 0537E8FC17
	for <freebsd-fs@freebsd.org>; Sat, 18 Apr 2009 22:57:01 +0000 (UTC)
	(envelope-from morganw@chemikals.org)
Received: from volatile.chemikals.org (adsl-67-215-2.shv.bellsouth.net
	[98.67.215.2])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by warped.bluecherry.net (Postfix) with ESMTPSA id A8C388004D04
	for <freebsd-fs@freebsd.org>; Sat, 18 Apr 2009 17:56:58 -0500 (CDT)
Received: from localhost (morganw@localhost [127.0.0.1])
	by volatile.chemikals.org (8.14.3/8.14.3) with ESMTP id n3IMuqli041194
	for <freebsd-fs@freebsd.org>; Sat, 18 Apr 2009 17:56:53 -0500 (CDT)
	(envelope-from morganw@chemikals.org)
Date: Sat, 18 Apr 2009 17:56:52 -0500 (CDT)
From: Wes Morgan <morganw@chemikals.org>
To: freebsd-fs@freebsd.org
Message-ID: <alpine.BSF.2.00.0904181756190.41188@ibyngvyr.purzvxnyf.bet>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Subject: Marvell 88SE6480
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Apr 2009 22:57:01 -0000

Saw this on zfs-discuss:

http://supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm

Has a Marvell 88SE6480 chipset on it. Looks like a good controller for zfs 
arrays. It doesn't appear to be supported by FreeBSD (yet). Anyone know 
more about it?