From owner-freebsd-fs@FreeBSD.ORG  Sun Nov 14 09:16:24 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DFB601065697;
	Sun, 14 Nov 2010 09:16:24 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 4BE788FC12;
	Sun, 14 Nov 2010 09:16:23 +0000 (UTC)
Received: by fxm19 with SMTP id 19so3285058fxm.13
	for <multiple recipients>; Sun, 14 Nov 2010 01:16:23 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:from:to:cc:subject:date
	:message-id:user-agent:mime-version:content-type;
	bh=7ufVHBxTzW6RDxdwnfZLdfIsKcLjd4xfmxAYKUhXfkE=;
	b=YjhXfuRYI+YlYTX2T8XOfQQehBE8s70T48C2VV3hYVrFzbdFwkF7b33XDkBRU+VBdY
	PcsZ1VG3+IO0Kxoqfb43K/t/6JyqN8K0AFs+2YEeSRTWgSuPKnXu0rj7DLeN3rudDG8i
	R36jioscfE3vhb+LWMrs0YkCOhlw5zuXS4/HY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:cc:subject:date:message-id:user-agent:mime-version
	:content-type;
	b=koUehaifA23hslzsaKVPI8EbStCyl8wy85XZEDn4C0ud8PtoOPfI3K3kpeMu76xpPv
	MBwemCx0pi3Qrm9NLvve0wqSe3+REIYPV8eYm8bTOHyZN5R6Geq/k45XnfSipiREuE+G
	3ukCYEPti6XB1lBTJjNKuEQ2k4DdRw8FZdHk0=
Received: by 10.223.86.197 with SMTP id t5mr3535804fal.38.1289726182498;
	Sun, 14 Nov 2010 01:16:22 -0800 (PST)
Received: from localhost ([95.69.174.185])
	by mx.google.com with ESMTPS id a25sm523695fab.13.2010.11.14.01.16.20
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sun, 14 Nov 2010 01:16:21 -0800 (PST)
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-fs@freebsd.org
Date: Sun, 14 Nov 2010 11:16:18 +0200
Message-ID: <86wrogqn8d.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: hastd/primary.c micro fixes
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Nov 2010 09:16:25 -0000

--=-=-=

Hi,

I noticed the following in hastd/primary.c:

1) In init_remote(), when the incoming connection is set up, if proto_client()
fails it only reports the error and tries proto_connect() anyway. This
will cause a crash, as the proto_conn structure is not valid. The primary should
rather exit (as it does when the outgoing connection is set up, and as it does in
the patch below) or set the proto_conn pointer to NULL and goto close.

2) In guard_thread(), timeout.tv_sec is set inside the loop, while it looks like
it could be set just once before the loop.
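
To illustrate point 2, here is a minimal standalone sketch of the pattern, not
the hastd code itself. The function name wait_for_signal() and its parameters
are hypothetical; hastd uses its own RETRY_SLEEP constant and calls guard_one()
inside the loop. POSIX declares the timeout argument of sigtimedwait() as a
pointer to const, so the value survives across iterations and can be
initialised once.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <time.h>

/*
 * Poll for SIGINT/SIGTERM at most `rounds` times, waiting up to
 * `retry_sleep` seconds per round. Returns the signal number received,
 * or -1 if every round timed out. (Hypothetical helper for illustration.)
 */
static int
wait_for_signal(int rounds, time_t retry_sleep)
{
	sigset_t mask;
	struct timespec timeout;
	int signo;

	assert(rounds >= 0);

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT);
	sigaddset(&mask, SIGTERM);
	/* Block the signals so they stay pending until sigtimedwait() takes them. */
	sigprocmask(SIG_BLOCK, &mask, NULL);

	/* Loop-invariant: sigtimedwait() does not modify it, so set it once. */
	timeout.tv_sec = retry_sleep;
	timeout.tv_nsec = 0;
	signo = -1;

	for (int i = 0; i < rounds && signo == -1; i++) {
		/* ... periodic work would go here ... */
		signo = sigtimedwait(&mask, NULL, &timeout);
	}
	return (signo);
}
```

Calling wait_for_signal(10, 5), for example, would wait for up to ten
five-second rounds without ever touching the timespec again.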

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=primary.c.patch

Index: sbin/hastd/primary.c
===================================================================
--- sbin/hastd/primary.c	(revision 215165)
+++ sbin/hastd/primary.c	(working copy)
@@ -577,7 +577,7 @@ init_remote(struct hast_resource *res, struct prot
 	 * Setup incoming connection with remote node.
 	 */
 	if (proto_client(res->hr_remoteaddr, &in) < 0) {
-		pjdlog_errno(LOG_WARNING, "Unable to create connection to %s",
+		primary_exit(EX_TEMPFAIL, "Unable to create connection to %s",
 		    res->hr_remoteaddr);
 	}
 	/* Try to connect, but accept failure. */
@@ -2008,6 +2008,7 @@ guard_thread(void *arg)
 	PJDLOG_VERIFY(sigaddset(&mask, SIGINT) == 0);
 	PJDLOG_VERIFY(sigaddset(&mask, SIGTERM) == 0);
 
+	timeout.tv_sec = RETRY_SLEEP;
 	timeout.tv_nsec = 0;
 	signo = -1;
 
@@ -2033,7 +2034,6 @@ guard_thread(void *arg)
 				guard_one(res, ii);
 			lastcheck = now;
 		}
-		timeout.tv_sec = RETRY_SLEEP;
 		signo = sigtimedwait(&mask, NULL, &timeout);
 	}
 	/* NOTREACHED */

--=-=-=--

From owner-freebsd-fs@FreeBSD.ORG  Sun Nov 14 17:10:24 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3FA7A1065674
	for <freebsd-fs@freebsd.org>; Sun, 14 Nov 2010 17:10:24 +0000 (UTC)
	(envelope-from v.velox@vvelox.net)
Received: from vulpes.vvelox.net (sula-ki.vvelox.net [99.69.115.46])
	by mx1.freebsd.org (Postfix) with ESMTP id E9A7F8FC13
	for <freebsd-fs@freebsd.org>; Sun, 14 Nov 2010 17:10:23 +0000 (UTC)
Received: from vixen42.vulpes.vvelox.net (unknown [192.168.14.2])
	(Authenticated sender: v.velox)
	by vulpes.vvelox.net (Postfix) with ESMTPA id 2D53AB882;
	Sun, 14 Nov 2010 10:17:37 -0600 (CST)
Date: Sun, 14 Nov 2010 10:13:24 -0600
From: "Zane C.B." <v.velox@vvelox.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <20101114101324.154a822e@vixen42.vulpes.vvelox.net>
In-Reply-To: <20101109170104.GA37882@icarus.home.lan>
References: <20101109101740.2e8e80ce@vixen42.vulpes.vvelox.net>
	<20101109170104.GA37882@icarus.home.lan>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; amd64-portbld-freebsd8.1)
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
	boundary="Sig_/daoOeefSdA5ViHWzjJh3PRH";
	protocol="application/pgp-signature"
Cc: freebsd-fs@freebsd.org
Subject: Re: NFS locking
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Nov 2010 17:10:24 -0000

--Sig_/daoOeefSdA5ViHWzjJh3PRH
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

On Tue, 9 Nov 2010 09:01:04 -0800
Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

> On Tue, Nov 09, 2010 at 10:17:40AM -0600, Zane C.B. wrote:
> > What does it take to get NFS locking working?
> >
> > Any time I start lockd on the server, I get the message below in
> > dmesg.
> >
> > NLM: failed to contact remote rpcbind, stat = 0, port = 0
> > NLM: failed to contact remote rpcbind, stat = 0, port = 0
> > Can't start NLM - unable to contact NSM
> >
> > Any ideas?
>
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-03/msg00484.html
>
> Solution (for me):
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/056043.html

Ahh! Thanks.

I went poking through those two and noticed what I had missed: rpc.lockd
requires rpc.statd.
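
As a rough illustration of that dependency, an /etc/rc.conf fragment for an NFS
server might look like the following. This is a sketch under assumptions: the
message does not show the poster's actual configuration, and only the
lockd/statd pairing comes from the thread.

```sh
# /etc/rc.conf (illustrative fragment, not the poster's actual file)
rpcbind_enable="YES"       # lockd and statd register with rpcbind
nfs_server_enable="YES"
rpc_statd_enable="YES"     # NSM; must be running for lockd to start
rpc_lockd_enable="YES"     # NLM; fails with "unable to contact NSM" without statd
```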

--Sig_/daoOeefSdA5ViHWzjJh3PRH
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iEYEARECAAYFAkzgCq0ACgkQqrJJy0yxYQCFoQCfS9vic4zG7mMRbJMa4K2nvv1Q
dvEAn1icpOlCuYmPc5GCCS3ff+chJNnQ
=pbe7
-----END PGP SIGNATURE-----

--Sig_/daoOeefSdA5ViHWzjJh3PRH--

From owner-freebsd-fs@FreeBSD.ORG  Mon Nov 15 00:24:04 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 76B89106564A
	for <fs@freebsd.org>; Mon, 15 Nov 2010 00:24:04 +0000 (UTC)
	(envelope-from areilly@bigpond.net.au)
Received: from nskntqsrv03p.mx.bigpond.com (nskntqsrv03p.mx.bigpond.com
	[61.9.168.237]) by mx1.freebsd.org (Postfix) with ESMTP id 0FBB48FC0C
	for <fs@freebsd.org>; Mon, 15 Nov 2010 00:24:03 +0000 (UTC)
Received: from nskntotgx03p.mx.bigpond.com ([124.188.161.100])
	by nskntmtas03p.mx.bigpond.com with ESMTP id
	<20101114230917.ZTDJ24865.nskntmtas03p.mx.bigpond.com@nskntotgx03p.mx.bigpond.com>;
	Sun, 14 Nov 2010 23:09:17 +0000
Received: from johnny.reilly.home ([124.188.161.100])
	by nskntotgx03p.mx.bigpond.com with ESMTP id
	<20101114230917.ELLX13584.nskntotgx03p.mx.bigpond.com@johnny.reilly.home>;
	Sun, 14 Nov 2010 23:09:17 +0000
Date: Mon, 15 Nov 2010 10:08:56 +1100
From: Andrew Reilly <areilly@bigpond.net.au>
Message-ID: <20101114230856.GA14153@johnny.reilly.home>
References: <4CD04AEC.8040607@aldan.algebra.com>
	<20101103030515.GA61758@icarus.home.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20101103030515.GA61758@icarus.home.lan>
User-Agent: Mutt/1.4.2.3i
X-RPD-ScanID: Class unknown; VirusThreatLevel unknown, RefID
	str=0001.0A150205.4CE06C1D.006B,ss=1,fgs=0
X-SIH-MSG-ID: rxo6F9z7TAD0zmQs0WyzOwJxyArnqyN48Z4QX81loRIGTUDCp8DeQ9rANv1Rv9GgxD9IJhiGNGEpaa3jTY3Rs9mK
Cc: fs@freebsd.org
Subject: Re: Using an SSD "disk" for /
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Nov 2010 00:24:04 -0000


Just another happy data-point:

I'm using an "SSD disk" for "the OS", which for now I'm calling /, /usr, /usr/local.
I have swap, /var, /usr/src, /usr/obj, /usr/ports and /usr/home (and various other
scratch areas) on a raidz on four real (SATA) disks.  /tmp is a tmpfs.  Seems to be working
awesomely well.  My /dev/gpt/root is mounted with soft-updates and noatime, in an
attempt to keep writes down.  (Which is also why /var and /tmp aren't on it.)  Boot
and OS/port updating is very fast.  Well, fast enough for me.  The SSD disk itself is
an 8G compact flash card that I pinched from my camera bag, mounted in a SATA adaptor.
Not because it's "best" or anything, but I thought that it could be handy to be able
to rebuild or replace my "OS" off-line more easily.
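
For readers who want to try a similar layout, the mount options described above
could be expressed in /etc/fstab roughly as follows. This is a sketch, not the
poster's actual file: only /dev/gpt/root, noatime and the tmpfs /tmp come from
the message, and the remaining fields are assumptions.

```
# Device         Mountpoint  FStype  Options       Dump  Pass#
/dev/gpt/root    /           ufs     rw,noatime    1     1
tmpfs            /tmp        tmpfs   rw,mode=1777  0     0
# Soft updates are a filesystem flag rather than a mount option; they are
# enabled on an unmounted filesystem with: tunefs -n enable /dev/gpt/root
```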

Cheers,

-- 
Andrew


From owner-freebsd-fs@FreeBSD.ORG  Mon Nov 15 03:20:05 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5CD2C106564A
	for <freebsd-fs@freebsd.org>; Mon, 15 Nov 2010 03:20:05 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 03E9F8FC17
	for <freebsd-fs@freebsd.org>; Mon, 15 Nov 2010 03:19:58 +0000 (UTC)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id B6F5945CD8; Mon, 15 Nov 2010 04:19:56 +0100 (CET)
Received: from localhost (chello089073192049.chello.pl [89.73.192.49])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id C92B945683;
	Mon, 15 Nov 2010 04:19:51 +0100 (CET)
Date: Mon, 15 Nov 2010 04:19:04 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Mikolaj Golub <to.my.trociny@gmail.com>
Message-ID: <20101115031904.GJ4780@garage.freebsd.pl>
References: <86wrogqn8d.fsf@kopusha.home.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="/0P/MvzTfyTu5j9Q"
Content-Disposition: inline
In-Reply-To: <86wrogqn8d.fsf@kopusha.home.net>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 9.0-CURRENT amd64
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL 
	autolearn=no version=3.0.4
Cc: freebsd-fs@freebsd.org
Subject: Re: hastd/primary.c micro fixes
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Nov 2010 03:20:05 -0000


--/0P/MvzTfyTu5j9Q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Nov 14, 2010 at 11:16:18AM +0200, Mikolaj Golub wrote:
> Hi,
>
> noticed the following in hastd/primary.c:
>
> 1) in init_remote() when incoming connection is set up, if proto_client()
> fails it only reports about the error and try proto_connect() anyway. This
> will cause a crash as proto_conn structure is not valid. Primary should rather
> exit (as it is when outgoing connection is set up, and as it is in the patch
> below) or set proto_conn pointer to NULL and goto close.
>
> 2) in guard_thread() timeout.tv_sec is set in loop, while it looks like it may
> be set only once before the loop.

Thanks, I committed both changes.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--/0P/MvzTfyTu5j9Q
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAkzgpqcACgkQForvXbEpPzQ4xwCfUdeVz3dOii6sW71zcXxXqhCV
ZtYAn1xp+0jAHy2I5hKgh+lAba8DUGX/
=zP4z
-----END PGP SIGNATURE-----

--/0P/MvzTfyTu5j9Q--

From owner-freebsd-fs@FreeBSD.ORG  Mon Nov 15 11:06:55 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EFEC91065672
	for <freebsd-fs@FreeBSD.org>; Mon, 15 Nov 2010 11:06:55 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id C24668FC22
	for <freebsd-fs@FreeBSD.org>; Mon, 15 Nov 2010 11:06:55 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oAFB6t41086280
	for <freebsd-fs@FreeBSD.org>; Mon, 15 Nov 2010 11:06:55 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id oAFB6tJ1086278
	for freebsd-fs@FreeBSD.org; Mon, 15 Nov 2010 11:06:55 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 15 Nov 2010 11:06:55 GMT
Message-Id: <201011151106.oAFB6tJ1086278@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Nov 2010 11:06:56 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/152079  fs         [msdosfs] [patch] Small cleanups from the other NetBSD
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/151082  fs         [zfs] [patch] sappend-flaged files on ZFS not working 
o kern/150796  fs         [panic] [suj] [ufs] [softupdates] Panic on portbuild
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149855  fs         [gvinum] growfs causes fsck to report errors in Filesy
o kern/149495  fs         [zfs] chflags sappend on zfs not working right
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149022  fs         [hang] File system operations hangs with suspfs state
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take 
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375  fs         [nfs] [patch] Typos in macro variables names in sys/fs
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458  fs         [nfs] [patch] nfsd fails as a kld
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143345  fs         [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs         [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs         [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
o bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138790  fs         [zfs] ZFS ceases caching when mem demand is high
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135667  fs         [lor] LORs causing ufs filesystem corruption on XEN Do
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs         [panic] panic: ffs_truncate: read-only filesystem
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
p kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs         [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [ffs] [snapshot] System crashes when manipulat
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735  fs         [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
p kern/116608  fs         [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
f kern/115645  fs         [ffs] [snapshots] [panic] lockmgr: thread 0xc4c00d80, 
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk 
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o bin/94635    fs         snapinfo(8)/libufs only works for disk-backed filesyst
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
f kern/85326   fs         [smbfs] [panic] saving a file via samba to an overquot
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

215 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Nov 15 22:25:38 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3B22106566B
	for <freebsd-fs@freebsd.org>; Mon, 15 Nov 2010 22:25:38 +0000 (UTC)
	(envelope-from michaelscotttech@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 96C768FC16
	for <freebsd-fs@freebsd.org>; Mon, 15 Nov 2010 22:25:38 +0000 (UTC)
Received: by qwd6 with SMTP id 6so52843qwd.13
	for <freebsd-fs@freebsd.org>; Mon, 15 Nov 2010 14:25:37 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:from:to
	:content-type:mime-version:subject:date:x-mailer;
	bh=iIi2wTMqXEIrQcKIVt4sQ2znV5rTracQHS9PlXA1ojM=;
	b=sR716/goBjz8cRO/j4yJFDjWxnlEoig5BrVstH+HsjE8OEYDB+vI4aUtUtsgynY7rl
	6JzoNeYXPWYjhMPai+4kihGtKlGQMpLKd5Y5n/EVfzp5jvF5DYdtLZUsLIxFq92HSWxC
	flCi5Hu44F0uuEbdMCYUedj2KeaqnZQYmXo9w=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:from:to:content-type:mime-version:subject:date:x-mailer;
	b=e5J66S3eQkGyika9gj+G/dbStrTJZfuvl33NgC7MWMuN/380a+LZqefnUjCu43DqJY
	b9Kp9LaI/6m+0hpadhEoyX9me5TlXNl3fdoW8QqEpSqYrcsiIikuyJL30UU5zHAb66dx
	AoMXNln/I2hxe1IniFOuUT1bOnzRoELTc0c/4=
Received: by 10.229.214.5 with SMTP id gy5mr5206401qcb.245.1289858613176;
	Mon, 15 Nov 2010 14:03:33 -0800 (PST)
Received: from msb.datacomp-intranet.com
	(h69-130-231-62.mdsnwi.tisp.static.tds.net [69.130.231.62])
	by mx.google.com with ESMTPS id n7sm281919qcu.16.2010.11.15.14.03.32
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Mon, 15 Nov 2010 14:03:32 -0800 (PST)
Message-Id: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
From: Michael Boers <michaelscotttech@gmail.com>
To: freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v936)
Date: Mon, 15 Nov 2010 17:03:30 -0500
X-Mailer: Apple Mail (2.936)
Content-Type: text/plain;
	charset=US-ASCII;
	format=flowed;
	delsp=yes
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Nov 2010 22:25:39 -0000

Is there anything I can do to make a zfs mirror quicker to give up on  
a flaky disk?

I recently had a 100% zfs system crash when it started to have some disk
errors.  I had hoped that by having a mirror, the system would survive
this type of error.  Instead it just hung.

Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status Error
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check Condition
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc:0,0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense information
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)

Is this a type of error zfs can survive or do I need a hardware mirror  
to handle this type of problem?

Thanks,


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 06:26:37 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EA2CF1065693;
	Tue, 16 Nov 2010 06:26:37 +0000 (UTC) (envelope-from TERRY@tmk.com)
Received: from server.tmk.com (server.tmk.com [204.141.35.63])
	by mx1.freebsd.org (Postfix) with ESMTP id C6C108FC12;
	Tue, 16 Nov 2010 06:26:37 +0000 (UTC)
Received: from tmk.com by tmk.com (PMDF V6.4 #37010)
	id <01NUB1C0N25S00BNN4@tmk.com>; Tue, 16 Nov 2010 01:26:35 -0500 (EST)
Date: Tue, 16 Nov 2010 01:24:37 -0500 (EST)
From: Terry Kennedy <TERRY@tmk.com>
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Message-id: <01NUB1F8POL000BNN4@tmk.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
Cc: 
Subject: Re: ZFS panic after replacing log device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 06:26:38 -0000

> I can give a developer remote console / root access to the box if that would 
> help. I have a couple days before I will need to nuke the pool and restore it 
> from backups. 

I haven't heard from anyone that wants to look into this. I need to get the 
pool back into service soon. If I don't get any requests to postpone or offers 
to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing the 
pool (minus the SSD, which is persona non grata). 

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA

From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 07:26:40 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 65D031065670;
	Tue, 16 Nov 2010 07:26:40 +0000 (UTC) (envelope-from TERRY@tmk.com)
Received: from server.tmk.com (server.tmk.com [204.141.35.63])
	by mx1.freebsd.org (Postfix) with ESMTP id 3F12E8FC1C;
	Tue, 16 Nov 2010 07:26:40 +0000 (UTC)
Received: from tmk.com by tmk.com (PMDF V6.4 #37010)
	id <01NUB2MU0E8000BNN4@tmk.com>; Tue, 16 Nov 2010 02:26:37 -0500 (EST)
Date: Tue, 16 Nov 2010 02:01:58 -0500 (EST)
From: Terry Kennedy <TERRY@tmk.com>
In-reply-to: "Your message dated Mon, 15 Nov 2010 22:55:11 -0800"
	<E7621997-3485-43A2-A2EE-A11574054FF6@deman.com>
To: Michael DeMan <freebsd@deman.com>
Message-id: <01NUB3IOMZJW00BNN4@tmk.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01NUB1F8POL000BNN4@tmk.com>
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ZFS panic after replacing log device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 07:26:40 -0000

> I am no ZFS kernel-code dude or anything, but it is well known that losing
> the ZIL can corrupt things pretty bad with ZFS.

  First, thanks for writing back!

  I agree that this could be the problem. As I mentioned in my original post,
I followed the steps recommended by "zpool status" - clearing the device and
then doing a replace. The fix may be as simple as testing for whether the
device in question is a log device and, if so, erroring out with "You can't
do that".

  Also note that multiple scrubs pass with no errors detected - it is only
writes that trigger the panic. It looks like something isn't being cleaned
up in the clear / replace path.

  I would save a crash dump for people to look at, but unfortunately the
last time a crash dump actually worked for me (on dozens of systems) was
back in the FreeBSD 6.2 days.

  There wasn't any data corruption (the filesystem was not being written at
the time the log device failed) - I have my own checksum files written by
the sysutils/cfv port, and the data all matches.
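
  For anyone who wants the same kind of independent check, here is a minimal
sketch in Python. It is not cfv and does not read cfv's file formats; it
assumes a hypothetical manifest of "<sha256 hex>  <relative path>" lines
stored next to the data, and the names file_digest, verify_manifest and demo
are made up for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest_path):
    """Check each '<hex digest>  <relative path>' line; return mismatched paths."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    bad = []
    with open(manifest_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, name = line.split("  ", 1)
            target = os.path.join(base, name)
            # A missing file counts as a mismatch, same as a wrong digest.
            if not os.path.isfile(target) or file_digest(target) != expected:
                bad.append(name)
    return bad


def demo():
    """Build a tiny tree with a manifest, corrupt one file, and verify twice."""
    with tempfile.TemporaryDirectory() as d:
        for name, data in (("a.bin", b"alpha"), ("b.bin", b"beta")):
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        manifest = os.path.join(d, "MANIFEST.sha256")
        with open(manifest, "w", encoding="utf-8") as f:
            for name in ("a.bin", "b.bin"):
                f.write(f"{file_digest(os.path.join(d, name))}  {name}\n")
        clean = verify_manifest(manifest)
        with open(os.path.join(d, "b.bin"), "wb") as f:
            f.write(b"bet4")  # simulate silent corruption of one file
        return clean, verify_manifest(manifest)
```

  The point of keeping checksums outside the filesystem is that they still
mean something if the pool's own metadata is what you are doubting.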

> All in all, if I were in your situation I would give a whirl at installing
> OpenSolaris, being sure not to upgrade the pool version past what is
> supported by FreeBSD, and going from there.

  I have the data on another server (see my prior "snapshots are not
backups" discussion on freebsd-stable if interested). So, fortunately, this
is not a case of data recovery.

> Unfortunately we all find ourselves in a bit of a pickle with ZFS right
> now with the Oracle acquisition of Sun.  For myself, I would stick with
> deploying on FreeBSD, but I think it's going to be FBSD 9.1 before it's
> going to be truly ready for production.

  The problem with hardware on the leading edge is that the software often
needs time to catch up. In this particular case, the ZFS pool is 32TB. I
can't begin to imagine how long a UFS fsck would take on such a partition,
even if it were possible to create one. It was bad enough on the previous
generation of my servers (2TB UFS partitions).

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA

From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 07:34:13 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 42E371065670;
	Tue, 16 Nov 2010 07:34:13 +0000 (UTC)
	(envelope-from freebsd@deman.com)
Received: from cp11.openaccess.org (cp11.openaccess.org [66.114.41.130])
	by mx1.freebsd.org (Postfix) with ESMTP id 24F4A8FC1C;
	Tue, 16 Nov 2010 07:34:12 +0000 (UTC)
Received: from mono-sis1.s.bli.openaccess.org ([66.114.32.149]
	helo=[192.168.2.248])
	by cp11.openaccess.org with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.69) (envelope-from <freebsd@deman.com>)
	id 1PIFRn-00084n-1Q; Mon, 15 Nov 2010 22:55:19 -0800
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
From: Michael DeMan <freebsd@deman.com>
In-Reply-To: <01NUB1F8POL000BNN4@tmk.com>
Date: Mon, 15 Nov 2010 22:55:11 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <E7621997-3485-43A2-A2EE-A11574054FF6@deman.com>
References: <01NUB1F8POL000BNN4@tmk.com>
To: Terry Kennedy <TERRY@tmk.com>
X-Mailer: Apple Mail (2.1082)
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - cp11.openaccess.org
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - deman.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ZFS panic after replacing log device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 07:34:13 -0000

Hi Terry,

I am no ZFS kernel-code dude or anything, but it is well known that
losing the ZIL can corrupt things pretty bad with ZFS.

You may want to skim the archives of the OpenSolaris ZFS discuss list
<zfs-discuss@opensolaris.org>.

All in all, if I were in your situation I would give a whirl at
installing OpenSolaris, being sure not to upgrade the pool version past
what is supported by FreeBSD, and going from there.

Unfortunately we all find ourselves in a bit of a pickle with ZFS right
now with the Oracle acquisition of Sun.  For myself, I would stick with
deploying on FreeBSD, but I think it's going to be FBSD 9.1 before it's
going to be truly ready for production.

Just my 2-cents.

- Mike


On Nov 15, 2010, at 10:24 PM, Terry Kennedy wrote:

>> I can give a developer remote console / root access to the box if that would
>> help. I have a couple days before I will need to nuke the pool and restore it
>> from backups.
>
> I haven't heard from anyone that wants to look into this. I need to get the
> pool back into service soon. If I don't get any requests to postpone or offers
> to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing the
> pool (minus the SSD, which is persona non grata).
>
>        Terry Kennedy             http://www.tmk.com
>        terry@tmk.com             New York, NY USA
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 07:48:27 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B200C1065670;
	Tue, 16 Nov 2010 07:48:27 +0000 (UTC)
	(envelope-from freebsd@deman.com)
Received: from cp11.openaccess.org (cp11.openaccess.org [66.114.41.130])
	by mx1.freebsd.org (Postfix) with ESMTP id 944A98FC15;
	Tue, 16 Nov 2010 07:48:27 +0000 (UTC)
Received: from mono-sis1.s.bli.openaccess.org ([66.114.32.149]
	helo=[192.168.2.248])
	by cp11.openaccess.org with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.69) (envelope-from <freebsd@deman.com>)
	id 1PIGHK-0002Sj-9G; Mon, 15 Nov 2010 23:48:34 -0800
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
From: Michael DeMan <freebsd@deman.com>
In-Reply-To: <01NUB3IOMZJW00BNN4@tmk.com>
Date: Mon, 15 Nov 2010 23:48:26 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <816D59CD-1BAF-4331-BEAD-67CEADCE4EF9@deman.com>
References: <01NUB1F8POL000BNN4@tmk.com> <01NUB3IOMZJW00BNN4@tmk.com>
To: Terry Kennedy <TERRY@tmk.com>
X-Mailer: Apple Mail (2.1082)
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - cp11.openaccess.org
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - deman.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ZFS panic after replacing log device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 07:48:27 -0000

Hi, sorry for not completely digesting your original post.

I would say it is definitely very odd that only writes are a problem.  Sounds
like it might be a hardware problem.  Is it possible to export the pool,
remove the ZIL and re-import it?  I myself would be pretty nervous
trying that, but it would help isolate the problem, if you can risk it.



On Nov 15, 2010, at 11:01 PM, Terry Kennedy wrote:

> Also note that multiple scrubs pass with no errors detected - it is only
> writes that trigger the panic. It looks like something isn't being cleaned
> up in the clear / replace path.


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 08:47:34 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 077BF106566C
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 08:47:34 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta05.emeryville.ca.mail.comcast.net
	(qmta05.emeryville.ca.mail.comcast.net [76.96.30.48])
	by mx1.freebsd.org (Postfix) with ESMTP id E1AAB8FC08
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 08:47:33 +0000 (UTC)
Received: from omta10.emeryville.ca.mail.comcast.net ([76.96.30.28])
	by qmta05.emeryville.ca.mail.comcast.net with comcast
	id Xkgk1f0020cQ2SLA5knZBa; Tue, 16 Nov 2010 08:47:33 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta10.emeryville.ca.mail.comcast.net with comcast
	id XknY1f0043LrwQ28WknYBN; Tue, 16 Nov 2010 08:47:33 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 704DB9B427; Tue, 16 Nov 2010 00:47:32 -0800 (PST)
Date: Tue, 16 Nov 2010 00:47:32 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Michael Boers <michaelscotttech@gmail.com>
Message-ID: <20101116084732.GA85887@icarus.home.lan>
References: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 08:47:34 -0000

On Mon, Nov 15, 2010 at 05:03:30PM -0500, Michael Boers wrote:
> Is there anything I can do to make a zfs mirror quicker to give up
> on a flaky disk?
> 
> I recently had a 100% zfs system crash when it started to have some
> disk errors.  I had hoped that by having a mirror, the system would
> survive this type of error.  Instead it just hung.
> 
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE
> CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI
> Status Error
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check
> Condition
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND
> asc:0,0
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional
> sense information
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000
> (req->ccb 0xffffff0103acc000)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800
> (req->ccb 0xffffff035cab0800)
> Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req
> 0xffffff80003c87a0:2838 function 0
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800
> (req->ccb 0xffffff0007986800)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000
> (req->ccb 0xffffff032d985000)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000
> (req->ccb 0xffffff0103af2000)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000
> (req->ccb 0xffffff0103b0b000)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800
> (req->ccb 0xffffff00102bf800)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000
> (req->ccb 0xffffff01e6f33000)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800
> (req->ccb 0xffffff01e6f24800)
> Nov 11 10:05:53 caprica kernel: mpt0: request
> 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000
> (req->ccb 0xffffff01308a4000)
> 
> Is this a type of error zfs can survive or do I need a hardware
> mirror to handle this type of problem?

This looks to me like a problem/quirk with mpt(4) and not ZFS.  What
happened after this point?  Didn't the mpt driver drop the disk off the
bus (in CAM)?  ZFS would notice that when it happens.  So, I think this
looks like a problem with either the mpt cards or the driver.

What I'm stating: ZFS shouldn't be responsible for "figuring out if
communication with the disk is messed up" -- that's the job of the
storage controller and the storage controller driver.
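
To make it easier to see which layer is complaining in a log like the one
quoted above, here is a minimal sketch in Python that tallies the two kinds
of events: CAM "Retries Exhausted" per device, and mpt request timeouts per
controller. The regular expressions are assumptions derived only from the
lines in this thread, they expect each kernel message on a single unwrapped
line, and the names summarize and SAMPLE are hypothetical.

```python
import re
from collections import Counter

# Patterns for the two kinds of kernel messages seen in the quoted log.
# Assumption: each message sits on one line (no mail-client wrapping).
CAM_RE = re.compile(
    r"\((?P<dev>\w+):(?P<ctrl>\w+):\d+:\d+:\d+\): Retries Exhausted"
)
TIMEOUT_RE = re.compile(r"(?P<ctrl>mpt\d+): request \S+ timed out")


def summarize(lines):
    """Count exhausted-retry events per device and timeouts per controller."""
    exhausted = Counter()
    timeouts = Counter()
    for line in lines:
        m = CAM_RE.search(line)
        if m:
            exhausted[m.group("dev")] += 1
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group("ctrl")] += 1
    return exhausted, timeouts


# A few lines taken from the log in this thread, joined back onto one line each.
SAMPLE = [
    "Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted",
    "Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 "
    "timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)",
    "Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req "
    "0xffffff80003c87a0:2838 function 0",
    "Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 "
    "timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)",
]

if __name__ == "__main__":
    exhausted, timeouts = summarize(SAMPLE)
    print("retries exhausted per device:", dict(exhausted))
    print("request timeouts per controller:", dict(timeouts))
```

A pile of controller-level timeouts with no matching "device gone" event
from CAM would be consistent with the driver never dropping the disk, which
is the case where ZFS has nothing to react to.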

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 10:24:50 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8045F106566C
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 10:24:50 +0000 (UTC)
	(envelope-from olivier@gid0.org)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 538F68FC08
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 10:24:50 +0000 (UTC)
Received: by iwn39 with SMTP id 39so649910iwn.13
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 02:24:49 -0800 (PST)
MIME-Version: 1.0
Received: by 10.231.19.74 with SMTP id z10mr5175806iba.120.1289903088673; Tue,
	16 Nov 2010 02:24:48 -0800 (PST)
Received: by 10.231.172.202 with HTTP; Tue, 16 Nov 2010 02:24:48 -0800 (PST)
In-Reply-To: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
References: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
Date: Tue, 16 Nov 2010 11:24:48 +0100
Message-ID: <AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
From: Olivier Smedts <olivier@gid0.org>
To: Michael Boers <michaelscotttech@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 10:24:50 -0000

2010/11/15 Michael Boers <michaelscotttech@gmail.com>:
> Is there anything I can do to make a zfs mirror quicker to give up on a
> flaky disk?
>
> I recently had a 100% zfs system crash when it started to have some disk
> errors.  I had hoped that by having a mirror, the system would survive this
> type of error.  Instead it just hung.

You can offline the faulty drive (zpool offline <pool> <device>).
Also, I think you're interested in a feature like TLER:
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
But typical (cheap) drives don't implement it.

>
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10).
> CDB: 35 0 0 0 0 0 0 0 0 0
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status
> Error
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check
> Condition
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc:0,0
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense
> information
> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed
> out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed
> out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
> Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req
> 0xffffff80003c87a0:2838 function 0
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed
> out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed
> out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed
> out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed
> out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed
> out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed
> out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed
> out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed
> out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
>
> Is this a type of error zfs can survive or do I need a hardware mirror to
> handle this type of problem?
>
> Thanks,
>



-- 
Olivier Smedts                                                 _
                                          ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."

From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 10:37:55 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C6A811065672
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 10:37:55 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta06.westchester.pa.mail.comcast.net
	(qmta06.westchester.pa.mail.comcast.net [76.96.62.56])
	by mx1.freebsd.org (Postfix) with ESMTP id 710838FC1A
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 10:37:55 +0000 (UTC)
Received: from omta24.westchester.pa.mail.comcast.net ([76.96.62.76])
	by qmta06.westchester.pa.mail.comcast.net with comcast
	id Xmcw1f0011ei1Bg56mdvWM; Tue, 16 Nov 2010 10:37:55 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta24.westchester.pa.mail.comcast.net with comcast
	id Xmdu1f0013LrwQ23kmduEi; Tue, 16 Nov 2010 10:37:55 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 03F739B427; Tue, 16 Nov 2010 02:37:53 -0800 (PST)
Date: Tue, 16 Nov 2010 02:37:53 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Olivier Smedts <olivier@gid0.org>
Message-ID: <20101116103752.GA87642@icarus.home.lan>
References: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
	<AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Michael Boers <michaelscotttech@gmail.com>
Subject: Re: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 10:37:55 -0000

On Tue, Nov 16, 2010 at 11:24:48AM +0100, Olivier Smedts wrote:
> 2010/11/15 Michael Boers <michaelscotttech@gmail.com>:
> > Is there anything I can do to make a zfs mirror quicker to give up on a
> > flaky disk?
> >
> > I recently had a 100% zfs system crash when started to have some disk
> > errors.  I had hoped that by having a mirror, the system would survive this
> > type of error.  Instead it just hung.
> 
> You can offline the faulty drive.
> Also, I think you're interested in a feature like TLER :
> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
> But typical (cheap) drives don't implement it.

TLER wouldn't have helped this problem.  TLER will cause the drive to
internally "time out" the request submitted by the controller.  If you
read the below output, it appears that the CDB submitted to the drive was
intentionally aborted, and retries were exhausted.  Continued command
submissions to mpt0 kept timing out.

There's absolutely nothing (that I'm aware of) that TLER provides which
will cause the drive to "disconnect itself from the bus".  Furthermore,
since TLER is on a per-command basis, there's no guarantee that repeated
commands sent from the controller to the disk won't continue to
hit problems.  Just because TLER times out the command quicker
than the OS driver doesn't mean the drive will suddenly become usable.

So we're back to the original question, which is why mpt(4) didn't
choose to drop the SCSI drive from the LUN or bus, given the repetitive
nature of the failure and mpt's own internal timeouts getting reached.

And to answer the OP's original question: "is this a type of error zfs
can survive or do I need a hardware mirror to handle this type of
problem?", the answer is yes, ZFS can survive this situation perfectly
fine, but ZFS is at the whim of the storage controller and controller
driver you choose to use.  It's not the job of the filesystem to tell
the storage controller "I hate this disk, get rid of it".

> > Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10).
> > CDB: 35 0 0 0 0 0 0 0 0 0
> > Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status
> > Error
> > Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check
> > Condition
> > Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc:0,0
> > Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense
> > information
> > Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed
> > out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed
> > out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
> > Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req
> > 0xffffff80003c87a0:2838 function 0
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed
> > out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed
> > out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed
> > out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed
> > out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed
> > out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed
> > out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed
> > out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
> > Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed
> > out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
> >
> > Is this a type of error zfs can survive or do I need a hardware mirror to
> > handle this type of problem?

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 13:32:39 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 386D71065670
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 13:32:39 +0000 (UTC)
	(envelope-from michaelscotttech@gmail.com)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com
	[209.85.216.182])
	by mx1.freebsd.org (Postfix) with ESMTP id D69938FC08
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 13:32:38 +0000 (UTC)
Received: by qyk7 with SMTP id 7so721798qyk.13
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 05:32:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:cc:message-id:from:to
	:in-reply-to:content-type:content-transfer-encoding:mime-version
	:subject:date:references:x-mailer;
	bh=tpjWkbiqoxvIjs6tnOuVD8Y1A3LXZHUhCTgCIzNg3fA=;
	b=j8fTY+HM0HbLRQsnQ4Gaj2DmcjDQxrXvs4TgqOPg/QfskzG3HTKQg9F3KzHpm64MIM
	O5NGWAIBeN4JjuOGzCcAYNCLuOcaW5dkNp1u7LFERNgBukyfY0jWPEZWlIBOPbjw4vNX
	DdF0Gj3vKpfv5Ys/tnwpzsOprFySqd5jF43uU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=cc:message-id:from:to:in-reply-to:content-type
	:content-transfer-encoding:mime-version:subject:date:references
	:x-mailer;
	b=gQ6Vazz3XuQ/tT+t7OdnDrkzpzheU90jIDoob5q8GZV0iErhWJAHE6sQ4XQ0nTJKnN
	zKn7a2P9hYKA1vw9eMsZ8+xLIDsyM18tA6XKJUMXsi6W/3xCSYA+YgH3BMQ3gZ2lhJUV
	Tx722BF3Hb0FbFVmYbD6ui/xwy7BPRkwN6Qa0=
Received: by 10.224.28.85 with SMTP id l21mr3025141qac.188.1289914358181;
	Tue, 16 Nov 2010 05:32:38 -0800 (PST)
Received: from msb.datacomp-intranet.com
	(h69-130-231-62.mdsnwi.tisp.static.tds.net [69.130.231.62])
	by mx.google.com with ESMTPS id mz11sm743067qcb.39.2010.11.16.05.32.36
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Tue, 16 Nov 2010 05:32:37 -0800 (PST)
Message-Id: <441E3529-6178-404E-8A2D-2CF9BBC4170C@gmail.com>
From: Michael Boers <michaelscotttech@gmail.com>
To: freebsd-fs@freebsd.org
In-Reply-To: <AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v936)
Date: Tue, 16 Nov 2010 08:32:35 -0500
References: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
	<AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
X-Mailer: Apple Mail (2.936)
Cc: 
Subject: Re: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 13:32:39 -0000


On Nov 16, 2010, at 5:24 AM, Olivier Smedts wrote:

> 2010/11/15 Michael Boers <michaelscotttech@gmail.com>:
>> Is there anything I can do to make a zfs mirror quicker to give up  
>> on a
>> flaky disk?
>>
>> I recently had a 100% zfs system crash when started to have some disk
>> errors.  I had hoped that by having a mirror, the system would  
>> survive this
>> type of error.  Instead it just hung.
>
> You can offline the faulty drive.
> Also, I think you're interested in a feature like TLER :
> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
> But typical (cheap) drives don't implement it.


Unfortunately, I was not able to offline the drive.  I was not able to  
gain access to the machine.  It responded to pings and since it is a  
CARP master, it was still broadcasting its "masterness", but any  
attempt to ssh into the machine failed.  It is my guess that anything  
disk related was blocked behind the problem.

To answer Jeremy's question of "what happened next?"

The machine was not serving web requests
The machine was not responsive via ssh
The machine was pingable

after waiting about 15 minutes, I used the ipmi protocol to power down  
the machine.
When it came back up, I found the enclosed errors in the log.

If I am following your comments correctly, the fault for this lies in  
the mpt system not giving up which could either be a driver or a  
firmware issue.  Is that correct?

How do I protect against that?


>
>>
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE  
>> CACHE(10).
>> CDB: 35 0 0 0 0 0 0 0 0 0
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI  
>> Status
>> Error
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check
>> Condition
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND  
>> asc:0,0
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense
>> information
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003c87a0:2838 timed
>> out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003c5110:2839 timed
>> out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
>> Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req
>> 0xffffff80003c87a0:2838 function 0
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003bef30:2840 timed
>> out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003c8560:2841 timed
>> out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003bf320:2842 timed
>> out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003cbda0:2843 timed
>> out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003bfd40:2844 timed
>> out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003cad50:2845 timed
>> out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003caf00:2846 timed
>> out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request  
>> 0xffffff80003ccd60:2847 timed
>> out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
>>
>> Is this a type of error zfs can survive or do I need a hardware  
>> mirror to
>> handle this type of problem?
>>
>> Thanks,
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>
>
>
> -- 
> Olivier Smedts                                                 _
>                                         ASCII ribbon campaign ( )
> e-mail: olivier@gid0.org        - against HTML email & vCards  X
> www: http://www.gid0.org    - against proprietary attachments / \
>
>   "Il y a seulement 10 sortes de gens dans le monde :
>   ceux qui comprennent le binaire,
>   et ceux qui ne le comprennent pas."


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 13:58:41 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 09D32106564A
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 13:58:41 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta15.westchester.pa.mail.comcast.net
	(qmta15.westchester.pa.mail.comcast.net [76.96.59.228])
	by mx1.freebsd.org (Postfix) with ESMTP id A7BBC8FC08
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 13:58:40 +0000 (UTC)
Received: from omta19.westchester.pa.mail.comcast.net ([76.96.62.98])
	by qmta15.westchester.pa.mail.comcast.net with comcast
	id Xo841f00527AodY5FpyggG; Tue, 16 Nov 2010 13:58:40 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta19.westchester.pa.mail.comcast.net with comcast
	id Xpyf1f00P3LrwQ23fpygVv; Tue, 16 Nov 2010 13:58:40 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 4E24F9B427; Tue, 16 Nov 2010 05:58:38 -0800 (PST)
Date: Tue, 16 Nov 2010 05:58:38 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Michael Boers <michaelscotttech@gmail.com>
Message-ID: <20101116135838.GA91324@icarus.home.lan>
References: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
	<AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
	<441E3529-6178-404E-8A2D-2CF9BBC4170C@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <441E3529-6178-404E-8A2D-2CF9BBC4170C@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 13:58:41 -0000

On Tue, Nov 16, 2010 at 08:32:35AM -0500, Michael Boers wrote:
> To answer Jeremy's question of "what happened next?"
> 
> The machine was not serving web requests
> The machine was not responsive via ssh
> The machine was pingable
> 
> after waiting about 15 minutes, I used the ipmi protocol to power
> down the machine.
> When it came back up, I found the enclosed errors in the log.
> 
> If I am following your comments correctly, the fault for this lies
> in the mpt system not giving up which could either be a driver or a
> firmware issue.  Is that correct?
> 
> How do I protect against that?

The fault, in my opinion -- and I urge others (especially those familiar
with the driver) to correct me, because I am often wrong -- lies with
either the controller itself or mpt(4) not truly "giving up"
after repetitive errors.  It could be a firmware bug/quirk, sure.  It
could be a lot of things, or a combination of things.  I don't want to
rule out anything.

For example, at my workplace we use Solaris with Adaptec controllers,
using a multitude of Fujitsu disks.  Everything is SCSI-3.  We regularly
(at least once a week, usually more than that) see disk problems where
either the disk falls off the bus unexpectedly, the drive itself
"wedges" (resulting in the controller getting stuck in an infinite loop
trying to talk to it) and won't unwedge without a full power-cycle (soft
reset doesn't work), or in certain bad block circumstances the drive
wedges long enough for the controller driver to break in a strange way
(resulting in a system panic).  Each situation appears to be different;
there are definitely situations where the disk is responsible, others
which look like the controller is responsible, and others which look
like driver issues.

I'm not familiar with (read: have not used) mpt(4) controllers, but if my
memory serves me right, people post about problems with them from time
to time on FreeBSD.  Each incident has to be addressed separately.

If you're asking for a workaround or "what should I do", the solution is
to either change controllers (read: avoid mpt(4)), or figure out how/why
the disk became wedged (or if it even did in the first place).

Your original post contains no useful information about the hardware
itself (mpt handles many controllers yet we know not what model, we know
nothing about disk da2, etc.).  You're going to need to provide this.
Relevant dmesg output, camcontrol devlist, camcontrol inquiry, and
smartctl -a output for the disk would be useful (assuming the controller
supports passthrough).
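
For anyone gathering that information, a minimal Python sketch follows.  It
only assembles the command lines named above and runs whichever of them are
installed.  The device name da2 comes from the log in this thread; the
function names and the choice of Python are assumptions made purely for
illustration, and smartctl is only present if smartmontools is installed.

```python
import shutil
import subprocess


def diagnostic_commands(disk):
    """Return the argv lists for the diagnostics requested above.

    `disk` is a CAM device name such as "da2" (taken from the log in
    this thread; substitute your own).
    """
    return [
        ["dmesg"],
        ["camcontrol", "devlist"],
        ["camcontrol", "inquiry", disk],
        ["smartctl", "-a", "/dev/" + disk],
    ]


def collect(disk):
    """Run each installed command and return {command line: output}."""
    report = {}
    for argv in diagnostic_commands(disk):
        key = " ".join(argv)
        # Skip tools that are not on PATH instead of failing outright.
        if shutil.which(argv[0]) is None:
            report[key] = "(not installed)"
            continue
        result = subprocess.run(argv, capture_output=True, text=True)
        report[key] = result.stdout + result.stderr
    return report
```

Calling collect("da2") as root returns a dictionary that can be pasted into a
reply.  The dmesg output is unfiltered here, so trim it to the relevant mpt
and da lines before posting.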

Finally, be aware that trying to chase down a problem of this nature is
often time-consuming.  Sometimes it's not worth it at all, and the time is
better spent replacing all of the hardware involved.  If it happens
again after that, change vendors or hardware controllers (or disks)
used.  That's just how it goes.  I tend to stick to Intel ICHxx or ESB
SATA controllers for this reason; they're well-tested on FreeBSD.  And I
don't use hardware RAID at all for many reasons (separate topic).

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 15:29:44 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C1E30106564A
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 15:29:44 +0000 (UTC)
	(envelope-from michaelscotttech@gmail.com)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com
	[209.85.216.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 6BC9D8FC0A
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 15:29:44 +0000 (UTC)
Received: by qyk7 with SMTP id 7so839146qyk.13
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 07:29:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:cc:message-id:from:to
	:in-reply-to:content-type:content-transfer-encoding:mime-version
	:subject:date:references:x-mailer;
	bh=+vFqQ0MdlodfAUS+9n4l15C46YCYETl3UZKXmEZpttk=;
	b=npAsaOF3l8d2ub7UzifViyI0Dt5dmcummpTba0MEfK4SknRss6Hk8ABCqjH6V4e1z5
	P2Kjgh02w/zrj9mAAwNbtRN2zPaG06bYED8f7TDseIfnflkgcXHykG1dQECmS6wjigAd
	Wvw2fWhM8FkiSr+bkSIz6uhKHxah0FjPEzc6Y=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=cc:message-id:from:to:in-reply-to:content-type
	:content-transfer-encoding:mime-version:subject:date:references
	:x-mailer;
	b=ZpH1ZrdOtCVODS+RhyrX057HUjEZpPf4yNIq5UtnYPeoZHrYHqnf2YAflG/77YQSNo
	HO79JH51PmP3qHSWgbccnLYbX1S1Fg9+vZb8PsiPqdoby0tymkOL2SzaWwHsH3xW7qid
	a8bUX0FXOcHrgQtX8g2Hsgb0j1dIKTqJlfKLM=
Received: by 10.224.80.202 with SMTP id u10mr369942qak.29.1289921383623;
	Tue, 16 Nov 2010 07:29:43 -0800 (PST)
Received: from msb.datacomp-intranet.com
	(h69-130-231-62.mdsnwi.tisp.static.tds.net [69.130.231.62])
	by mx.google.com with ESMTPS id m7sm808903qck.37.2010.11.16.07.29.41
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Tue, 16 Nov 2010 07:29:42 -0800 (PST)
Message-Id: <99CF1585-9D89-4F66-B85C-67EA30DD0BD9@gmail.com>
From: Michael Boers <michaelscotttech@gmail.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20101116135838.GA91324@icarus.home.lan>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v936)
Date: Tue, 16 Nov 2010 10:29:40 -0500
References: <25DC6C26-52FB-447A-AEB0-8549DA8F53E7@gmail.com>
	<AANLkTi=mqgjj+dWVvZKmUcZWPtZSF2wA=upYy+1dEhRe@mail.gmail.com>
	<441E3529-6178-404E-8A2D-2CF9BBC4170C@gmail.com>
	<20101116135838.GA91324@icarus.home.lan>
X-Mailer: Apple Mail (2.936)
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs mirror recognizing disk failures
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 15:29:44 -0000


On Nov 16, 2010, at 8:58 AM, Jeremy Chadwick wrote:

> On Tue, Nov 16, 2010 at 08:32:35AM -0500, Michael Boers wrote:
>> To answer Jeremy's question of "what happened next?"
>>
>> The machine was not serving web requests
>> The machine was not responsive via ssh
>> The machine was pingable
>>
>> after waiting about 15 minutes, I used the ipmi protocol to power
>> down the machine.
>> When it came back up, I found the enclosed errors in the log.
>>
>> If I am following your comments correctly, the fault for this lies
>> in the mpt system not giving up which could either be a driver or a
>> firmware issue.  Is that correct?
>>
>> How do I protect against that?
>
> The fault, in my opinion -- and I urge others (especially those  
> familiar
> with the driver) to correct me, because I am often wrong -- lies with
> either the controller itself or mpt(4) not truly "giving up"
> after repetitive errors.  It could be a firmware bug/quirk, sure.  It
> could be a lot of things, or a combination of things.  I don't want to
> rule out anything.
>
> For example, at my workplace we use Solaris with Adaptec controllers,
> using a multitude of Fujitsu disks.  Everything is SCSI-3.  We  
> regularly
> (at least once a week, usually more than that) see disk problems where
> either the disk falls off the bus unexpectedly, the drive itself
> "wedges" (resulting in the controller getting stuck in an infinite  
> loop
> trying to talk to it) and won't unwedge without a full power-cycle  
> (soft
> reset doesn't work), or in certain bad block circumstances the drive
> wedges long enough for the controller driver to break in a strange way
> (resulting in a system panic).  Each situation appears to be  
> different;
> there are definitely situations where the disk is responsible, others
> which look like the controller is responsible, and others which look
> like driver issues.
>
> I'm not familiar with (read: have not used) mpt(4) controllers, but if my
> memory serves me right, people post about problems with them from time
> to time on FreeBSD.  Each incident has to be addressed separately.
>
> If you're asking for a workaround or "what should I do", the  
> solution is
> to either change controllers (read: avoid mpt(4)), or figure out how/ 
> why
> the disk became wedged (or if it even did in the first place).
>
> Your original post contains no useful information about the hardware
> itself (mpt handles many controllers yet we know not what model, we  
> know
> nothing about disk da2, etc.).  You're going to need to provide this.
> Relevant dmesg output, camcontrol devlist, camcontrol inquiry, and
> smartctl -a output for the disk would be useful (assuming the  
> controller
> supports passthrough).

Thanks for the detailed response, it has given me some things to think  
about.  You are right, I had not posted too much about the machine in  
question.  For those interested now or who may run across this in the  
archives, I provide it now (edited and partially reconstructed from  
backups of the log files):

The machine is a Dell PowerEdge 2970 with SAS 6/iR Integrated, x6  
Backplane

Aug 24 05:40:41 caprica kernel: FreeBSD 8.0-RELEASE #0: Fri Jan 29  
14:17:29 EST 2010
Aug 24 05:40:41 caprica kernel: CPU: Quad-Core AMD Opteron(tm)  
Processor 2387 (2793.03-MHz K8-class CPU)
Aug 24 05:40:41 caprica kernel: real memory  = 17179869184 (16384 MB)
Aug 24 05:40:41 caprica kernel: FreeBSD/SMP: Multiprocessor System  
Detected: 4 CPUs
Aug 24 05:40:41 caprica kernel: FreeBSD/SMP: 1 package(s) x 4 core(s)
Aug 24 05:40:41 caprica kernel: mpt0: <LSILogic SAS/SATA Adapter> port  
0xec00-0xecff mem 0xe9fec000-0xe9feffff,0xe9ff0000-0xe9ffffff irq 37  
at device 0.0 on pci7
Aug 24 05:40:41 caprica kernel: mpt0: [ITHREAD]
Aug 24 05:40:41 caprica kernel: mpt0: MPI Version=1.5.18.0
Aug 24 05:40:41 caprica kernel: mpt0: Capabilities: ( RAID-0 RAID-1E  
RAID-1 )
Aug 24 05:40:41 caprica kernel: mpt0: 0 Active Volumes (2 Max)
Aug 24 05:40:41 caprica kernel: mpt0: 0 Hidden Drive Members (14 Max)
Aug 24 05:40:41 caprica kernel: ZFS filesystem version 13
Aug 24 05:40:41 caprica kernel: ZFS storage pool version 13
Aug 24 05:40:41 caprica kernel: Timecounters tick every 1.000 msec
Aug 24 05:40:41 caprica kernel: da0: <ATA WDC WD1602ABKS-1 3B04> Fixed  
Direct Access SCSI-5 device
Aug 24 05:40:41 caprica kernel: da0: 300.000MB/s transfers
Aug 24 05:40:41 caprica kernel: da0: Command Queueing enabled
Aug 24 05:40:41 caprica kernel: da0: 152587MB (312500000 512 byte  
sectors: 255H 63S/T 19452C)
Aug 24 05:40:41 caprica kernel: da1 at mpt0 bus 0 target 1 lun 0
Aug 24 05:40:41 caprica kernel: da1: <ATA WDC WD5002ABYS-1 3B04> Fixed  
Direct Access SCSI-5 device
Aug 24 05:40:41 caprica kernel: da1: 300.000MB/s transfers
Aug 24 05:40:41 caprica kernel: da1: Command Queueing enabled
Aug 24 05:40:41 caprica kernel: da1: 476940MB (976773168 512 byte  
sectors: 255H 63S/T 60801C)
Aug 24 05:40:41 caprica kernel: ses0 at mpt0 bus 0 target 8 lun 0
Aug 24 05:40:41 caprica kernel: ses0: <DP BACKPLANE 1.05> Fixed  
Enclosure Services SCSI-5 device
Aug 24 05:40:41 caprica kernel: ses0: 300.000MB/s transfers
Aug 24 05:40:41 caprica kernel: ses0: SCSI-3 SES Device

I added the mirror disks later:

Oct 15 10:47:21 caprica kernel: da2 at mpt0 bus 0 target 3 lun 0
Oct 15 10:47:21 caprica kernel: da2: <ATA WDC WD5002ABYS-1 3B04> Fixed  
Direct Access SCSI-5 device
Oct 15 10:47:21 caprica kernel: da2: 300.000MB/s transfers
Oct 15 10:47:21 caprica kernel: da2: Command Queueing enabled
Oct 15 10:47:21 caprica kernel: da2: 476940MB (976773168 512 byte  
sectors: 255H 63S/T 60801C)
Oct 15 10:47:21 caprica kernel: da3 at mpt0 bus 0 target 2 lun 0
Oct 15 10:47:21 caprica kernel: da3: <ATA WDC WD1602ABKS-1 3B05> Fixed  
Direct Access SCSI-5 device
Oct 15 10:47:21 caprica kernel: da3: 300.000MB/s transfers
Oct 15 10:47:21 caprica kernel: da3: Command Queueing enabled
Oct 15 10:47:21 caprica kernel: da3: 152587MB (312500000 512 byte  
sectors: 255H 63S/T 19452C)

I started getting the occasional error on da3 (I did not realize this until
after the crash; I now use swatch to check for mpt errors):


Oct 18 03:43:58 caprica kernel: (da3:mpt0:0:2:0): WRITE(10). CDB: 2a 0  
2 4 58 a2 0 0 80 0
Oct 18 03:43:58 caprica kernel: (da3:mpt0:0:2:0): CAM Status: SCSI  
Status Error
Oct 18 03:43:58 caprica kernel: (da3:mpt0:0:2:0): SCSI Status: Check  
Condition
Oct 18 03:43:58 caprica kernel: (da3:mpt0:0:2:0): UNIT ATTENTION asc: 
29,0
Oct 18 03:43:58 caprica kernel: (da3:mpt0:0:2:0): Power on, reset, or  
bus device reset occurred
Oct 18 03:43:58 caprica kernel: (da3:mpt0:0:2:0): Retrying Command  
(per Sense Data)
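
For readers without swatch, the same kind of check can be sketched in Python.
This is only an illustration, not the swatch configuration in use here: the
regular expressions and the scan() function are assumptions derived solely
from the log lines quoted in this thread, and the sketch assumes unwrapped
lines as they appear in /var/log/messages rather than as wrapped by the mail
client.

```python
import re
from collections import Counter

# CAM path prefix as seen in the log above, e.g. "(da3:mpt0:0:2:0): ..."
CAM_PREFIX = re.compile(r"\((\w+):(mpt\d+):(\d+):(\d+):(\d+)\):\s*(.*)")
# Controller-level timeout lines, e.g. "mpt0: request 0x...:2838 timed out"
MPT_TIMEOUT = re.compile(r"\b(mpt\d+): request \S+ timed out")


def scan(lines):
    """Count CAM error events per disk and request timeouts per controller.

    Each CAM error spans several log lines; only the "CAM Status:" line
    is counted so that one event is counted once.
    """
    per_disk = Counter()
    timeouts = Counter()
    for line in lines:
        cam = CAM_PREFIX.search(line)
        if cam:
            if cam.group(6).startswith("CAM Status:"):
                per_disk[cam.group(1)] += 1
            continue
        timeout = MPT_TIMEOUT.search(line)
        if timeout:
            timeouts[timeout.group(1)] += 1
    return per_disk, timeouts
```

Passing an open log file to scan() (for example, open("/var/log/messages"))
yields the two counters, which a cron job could compare against a threshold
before sending mail.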

Camcontrol output (partially reconstructed as the drives are currently  
on my desk)

<ATA WDC WD1602ABKS-1 3B04>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD5002ABYS-1 3B04>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA WDC WD5002ABYS-1 3B04>        at scbus0 target 2 lun 0 (pass2,da2)
<ATA WDC WD1602ABKS-1 3B04>        at scbus0 target 3 lun 0 (pass2,da3)
<DP BACKPLANE 1.05>                at scbus0 target 8 lun 0 (ses0,pass4)

This is all I can provide at this time.  I appreciate all of the help  
provided thus far and in future.  I am going to check into BIOS  
updates for the SAS 6/iR and I am in the process of moving to 8.1 for  
better mpt support.

Thanks, again

>
> Finally, be aware that trying to chase down a problem of this nature  
> is
> often time-consuming.  Sometimes it's not worth it at all, and the time is
> better spent replacing all of the hardware involved.  If it happens
> again after that, change vendors or hardware controllers (or disks)
> used.  That's just how it goes.  I tend to stick to Intel ICHxx or ESB
> SATA controllers for this reason; they're well-tested on FreeBSD.   
> And I
> don't use hardware RAID at all for many reasons (separate topic).
>
> -- 
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 15:47:11 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CFDB11065670
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 15:47:11 +0000 (UTC)
	(envelope-from freebsd@deman.com)
Received: from cp11.openaccess.org (cp11.openaccess.org [66.114.41.130])
	by mx1.freebsd.org (Postfix) with ESMTP id B1DF98FC13
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 15:47:11 +0000 (UTC)
Received: from mono-sis1.s.bli.openaccess.org ([66.114.32.149]
	helo=[192.168.2.248])
	by cp11.openaccess.org with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.69) (envelope-from <freebsd@deman.com>) id 1PINkb-0003mc-RV
	for freebsd-fs@freebsd.org; Tue, 16 Nov 2010 07:47:17 -0800
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1082)
From: Michael DeMan <freebsd@deman.com>
In-Reply-To: <816D59CD-1BAF-4331-BEAD-67CEADCE4EF9@deman.com>
Date: Tue, 16 Nov 2010 07:47:10 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <D316D362-4963-4B43-A2CA-DF516B4D5F5F@deman.com>
References: <01NUB1F8POL000BNN4@tmk.com> <01NUB3IOMZJW00BNN4@tmk.com>
	<816D59CD-1BAF-4331-BEAD-67CEADCE4EF9@deman.com>
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.1082)
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - cp11.openaccess.org
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - deman.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Subject: Re: ZFS panic after replacing log device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 15:47:11 -0000

I just re-read my last post, after some rest and a first cup of morning
coffee.  Actually, removing the ZIL is a DUMB idea; I'm not sure why I was
thinking that.  And since you have replaced it already, it would probably
not be the problem?  I presume the machine has ECC memory?


On Nov 15, 2010, at 11:48 PM, Michael DeMan wrote:

> Hi, sorry for not completely digesting your original post.
>
> I would say it is definitely very odd that writes are a problem.
> Sounds like it might be a hardware problem.  Is it possible to export
> the pool, remove the ZIL and re-import it?  I myself would be pretty
> nervous trying that, but it would help isolate the problem?  If you can
> risk it.
>
>
>
> On Nov 15, 2010, at 11:01 PM, Terry Kennedy wrote:
>
>> Also note that multiple scrubs pass with no errors detected - it is only
>> writes that trigger the panic. It looks like something isn't being cleaned
>> up in the clear / replace path.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@FreeBSD.ORG  Tue Nov 16 21:39:31 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B99D9106566B
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 21:39:31 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 38CAB8FC18
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 21:39:30 +0000 (UTC)
Received: by bwz2 with SMTP id 2so745367bwz.13
	for <freebsd-fs@freebsd.org>; Tue, 16 Nov 2010 13:39:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:from:to:subject:date
	:message-id:user-agent:mime-version:content-type;
	bh=vNelWTCG8nxHDwMwUccDl/GKr3C0pB0RBNkt1sMe7vA=;
	b=eISi9wTd+pTV7w7/iJeCRCKf6wmW+4APkz7iJ/0UKN3NULGMRE0W+XB7PAHH0FBMH2
	j49scvddhrx1lpd0U3h3TJIqrVVyb9DRvvnfuUgdY8+v6s913IdqNwzSl2XKouyY1yXg
	arromjJtgVPH/XpryR9UOKf9s9CEmhL2+8yCA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:subject:date:message-id:user-agent:mime-version
	:content-type;
	b=bGK8Yk+yKJRipTcgkqxEFUYN8Xms0lap/iwB1LVFrK1+8UrsRrHsaO7qDMSK+4YzNQ
	/CI+XmEDCuIBXSbV4irTKS/Sb+PKusuleNVId2i5s/GTNaR1emmYpAn1L6lyDNNVmOxc
	ePQebXGcbzMIQdcvk0Af+0D06elP3NiT05K4s=
Received: by 10.204.70.142 with SMTP id d14mr3012421bkj.143.1289943570009;
	Tue, 16 Nov 2010 13:39:30 -0800 (PST)
Received: from localhost ([95.69.174.185])
	by mx.google.com with ESMTPS id p22sm822837bkp.21.2010.11.16.13.39.28
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Tue, 16 Nov 2010 13:39:29 -0800 (PST)
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-fs@freebsd.org
Date: Tue, 16 Nov 2010 23:39:26 +0200
Message-ID: <86eial2bjl.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Subject: HAST: more than one remote nodes, is it possible?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Nov 2010 21:39:31 -0000

--=-=-=

Hi,

So is it possible to have several remote (secondary) nodes with HAST? Although
the current version of hastd supports only one remote node, it looks like it is
possible to build HAST devices in a stack :-)

I tried this under VirtualBox and it worked for me.

Having 2 pairs of hosts, I built a lower HAST device on each pair and then an
upper resource using the lower device as its provider. Any host can be used as
primary for upper in this configuration. To make things more interesting I
installed a jail (VIMAGE) on the upper device so the VM can be moved between
the 4 hosts.

Below is hast.conf:

----------------------------------------------------------------------

# Stacked HAST -- data are written to 4 nodes (lolek, bolek, chuk and
# gek) and are accessible on one of the nodes as /dev/hast/upper
#
# /dev/hast/upper
#        |
#        P------------upper-------------S
#        |         172.20.68.30         |
# /dev/hast/lower                /dev/hast/lower
#        |                              |
#        P-----lower----S               P----lower-----S
#        | 172.20.68.31 |               | 172.20.68.32 |
#     /dev/ad4       /dev/ad4        /dev/ad4       /dev/ad4
# 
#      lolek          bolek            chuk           gek

exec /etc/vip.sh

resource lower {

	local /dev/ad4

	on lolek {
		remote bolek
	}

	on bolek {
		remote lolek
	}

	on chuk {
		remote gek
	}

	on gek {
		remote chuk
	}
}

resource upper {

	local /dev/hast/lower
	# XXX: HAST should be patched to know friends 
	friends lolek bolek chuk gek

	on lolek {
		remote 172.20.68.32
	}

	on bolek {
		remote 172.20.68.32
	}

	on chuk {
		remote 172.20.68.31
	}

	on gek {
		remote 172.20.68.31
	}
}

----------------------------------------------------------------------

As noted in the comments above, hastd has to be patched: the current version
of hastd allows connections only from the address specified as remote. The
patch is attached.
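
The access rule that the friends list adds can be sketched in isolation as
follows. This is only a minimal illustration, not the hastd code itself: the
struct and function names are made up for the example, and addr_match() is a
hypothetical stand-in (plain string equality) for hastd's
proto_address_match().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

#define ADDRSIZE 1024	/* assumption: stand-in for HAST_ADDRSIZE */

struct address {
	char	a_addr[ADDRSIZE];
	TAILQ_ENTRY(address) a_next;
};

struct resource {
	char	r_remoteaddr[ADDRSIZE];
	TAILQ_HEAD(, address) r_friends;
};

/* Hypothetical stand-in for proto_address_match(): string equality only. */
static bool
addr_match(const char *peer, const char *addr)
{
	return (strcmp(peer, addr) == 0);
}

/* Set the single "remote" address and start with an empty friends list. */
static void
resource_init(struct resource *res, const char *remote)
{
	assert(res != NULL && remote != NULL);
	snprintf(res->r_remoteaddr, sizeof(res->r_remoteaddr), "%s", remote);
	TAILQ_INIT(&res->r_friends);
}

/* Append one friend address; returns 0 on success, -1 on failure. */
static int
resource_add_friend(struct resource *res, const char *addr)
{
	struct address *a;

	if (strlen(addr) >= ADDRSIZE)
		return (-1);	/* address argument too long */
	a = calloc(1, sizeof(*a));
	if (a == NULL)
		return (-1);
	strcpy(a->a_addr, addr);
	TAILQ_INSERT_TAIL(&res->r_friends, a, a_next);
	return (0);
}

/* A peer is accepted if it is the remote node or any listed friend. */
static bool
resource_allows(const struct resource *res, const char *peer)
{
	const struct address *a;

	if (addr_match(peer, res->r_remoteaddr))
		return (true);
	TAILQ_FOREACH(a, &res->r_friends, a_next) {
		if (addr_match(peer, a->a_addr))
			return (true);
	}
	return (false);
}
```

With remote set to 172.20.68.32 and lolek and bolek added as friends,
resource_allows() accepts all three and rejects any other peer.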

/etc/vip.sh is used to switch IP addresses between hosts.

Details can be found here:

http://code.google.com/p/hastmon/wiki/StackedHAST

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hastd.friends.patch

Index: sbin/hastd/parse.y
===================================================================
--- sbin/hastd/parse.y	(revision 214604)
+++ sbin/hastd/parse.y	(working copy)
@@ -209,8 +209,13 @@ void
 yy_config_free(struct hastd_config *config)
 {
 	struct hast_resource *res;
+	struct hast_address *addr;
 
 	while ((res = TAILQ_FIRST(&config->hc_resources)) != NULL) {
+		while ((addr = TAILQ_FIRST(&res->hr_friends)) != NULL) {
+			TAILQ_REMOVE(&res->hr_friends, addr, ha_next);
+			free(addr);
+		}
 		TAILQ_REMOVE(&config->hc_resources, res, hr_next);
 		free(res);
 	}
@@ -218,7 +223,8 @@ yy_config_free(struct hastd_config *config)
 }
 %}
 
-%token CONTROL LISTEN PORT REPLICATION TIMEOUT EXEC EXTENTSIZE RESOURCE NAME LOCAL REMOTE ON
+%token CONTROL LISTEN PORT REPLICATION TIMEOUT EXEC EXTENTSIZE
+%token RESOURCE NAME LOCAL REMOTE FRIENDS ON
 %token FULLSYNC MEMSYNC ASYNC
 %token NUM STR OB CB
 
@@ -418,6 +424,8 @@ node_entry:
 	control_statement
 	|
 	listen_statement
+	|
+	friends_statement
 	;
 
 resource_statement:	RESOURCE resource_start OB resource_entries CB
@@ -513,6 +521,7 @@ resource_start:	STR
 		curres->hr_localpath[0] = '\0';
 		curres->hr_localfd = -1;
 		curres->hr_remoteaddr[0] = '\0';
+		TAILQ_INIT(&curres->hr_friends);
 		curres->hr_ggateunit = -1;
 	}
 	;
@@ -533,6 +542,8 @@ resource_entry:
 	|
 	local_statement
 	|
+	friends_statement
+	|
 	resource_node_statement
 	;
 
@@ -598,6 +609,40 @@ local_statement:	LOCAL STR
 	}
 	;
 
+friends_statement:	FRIENDS friend_addresses
+	;
+
+friend_addresses:
+	|
+	friend_addresses friend_address
+	;
+
+friend_address:		STR
+	{
+		struct hast_address *addr;
+
+		assert(depth == 1 || depth == 2);
+		
+		if (depth == 1 || mynode) {
+			assert(curres != NULL);
+			addr = calloc(1, sizeof(*addr));
+			if (addr == NULL) {
+				errx(EX_TEMPFAIL,
+				    "cannot allocate memory for resource");
+			}
+			if (strlcpy(addr->ha_addr, $1,
+				sizeof(addr->ha_addr)) >=
+			    sizeof(addr->ha_addr)) {
+				pjdlog_error("address argument too long");
+				free($1);
+				return (1);
+			}
+			free($1);
+			TAILQ_INSERT_TAIL(&curres->hr_friends, addr, ha_next);
+		}
+	}
+	;
+
 resource_node_statement:ON resource_node_start OB resource_node_entries CB
 	{
 		mynode = false;
Index: sbin/hastd/hastd.c
===================================================================
--- sbin/hastd/hastd.c	(revision 214604)
+++ sbin/hastd/hastd.c	(working copy)
@@ -156,6 +156,29 @@ child_exit(void)
 	}
 }
 
+static int
+friendscmp(const struct hast_resource *res0, const struct hast_resource *res1)
+{
+	struct hast_address *friend0, *friend1;
+	int nfriends0, nfriends1;
+
+	nfriends0 = nfriends1 = 0;
+	
+	TAILQ_FOREACH(friend0, &res0->hr_friends, ha_next) {
+		TAILQ_FOREACH(friend1, &res1->hr_friends, ha_next) {
+			if (strcmp(friend0->ha_addr, friend1->ha_addr) == 0)
+				break;
+		}
+		if (friend1 == NULL)
+			return (1);
+		nfriends0++;
+	}
+	TAILQ_FOREACH(friend1, &res1->hr_friends, ha_next) {
+		nfriends1++;
+	}
+	return (nfriends0 - nfriends1);
+}
+
 static bool
 resource_needs_restart(const struct hast_resource *res0,
     const struct hast_resource *res1)
@@ -177,6 +200,8 @@ resource_needs_restart(const struct hast_resource
 			return (true);
 		if (strcmp(res0->hr_exec, res1->hr_exec) != 0)
 			return (true);
+		if (friendscmp(res0, res1) != 0)
+			return (true);
 	}
 	return (false);
 }
@@ -201,6 +226,7 @@ resource_needs_reload(const struct hast_resource *
 		return (true);
 	if (strcmp(res0->hr_exec, res1->hr_exec) != 0)
 		return (true);
+
 	return (false);
 }
 
@@ -384,6 +410,7 @@ listen_accept(void)
 	struct hast_resource *res;
 	struct proto_conn *conn;
 	struct nv *nvin, *nvout, *nverr;
+	struct hast_address *friend;
 	const char *resname;
 	const unsigned char *token;
 	char laddr[256], raddr[256];
@@ -416,6 +443,12 @@ listen_accept(void)
 	TAILQ_FOREACH(res, &cfg->hc_resources, hr_next) {
 		if (proto_address_match(conn, res->hr_remoteaddr))
 			break;
+		TAILQ_FOREACH(friend, &res->hr_friends, ha_next) {
+			if (proto_address_match(conn, friend->ha_addr))
+				break;
+		}
+		if (friend != NULL)
+			break;
 	}
 	if (res == NULL) {
 		pjdlog_error("Client %s isn't known.", raddr);
@@ -469,9 +502,15 @@ listen_accept(void)
 
 	/* Does the remote host have access to this resource? */
 	if (!proto_address_match(conn, res->hr_remoteaddr)) {
-		pjdlog_error("Client %s has no access to the resource.", raddr);
-		nv_add_stringf(nverr, "errmsg", "No access to the resource.");
-		goto fail;
+		TAILQ_FOREACH(friend, &res->hr_friends, ha_next) {
+			if (proto_address_match(conn, friend->ha_addr))
+				break;
+		}
+		if (friend == NULL) {
+			pjdlog_error("Client %s has no access to the resource.", raddr);
+			nv_add_stringf(nverr, "errmsg", "No access to the resource.");
+			goto fail;
+		}
 	}
 	/* Is the resource marked as secondary? */
 	if (res->hr_role != HAST_ROLE_SECONDARY) {
Index: sbin/hastd/hast.h
===================================================================
--- sbin/hastd/hast.h	(revision 214604)
+++ sbin/hastd/hast.h	(working copy)
@@ -150,6 +150,8 @@ struct hast_resource {
 
 	/* Address of the remote component. */
 	char	hr_remoteaddr[HAST_ADDRSIZE];
+	/* Hosts that can connect to us. */
+	TAILQ_HEAD(, hast_address) hr_friends;
 	/* Connection for incoming data. */
 	struct proto_conn *hr_remotein;
 	/* Connection for outgoing data. */
@@ -193,6 +195,11 @@ struct hast_resource {
 	TAILQ_ENTRY(hast_resource) hr_next;
 };
 
+struct hast_address {
+	char	ha_addr[HAST_ADDRSIZE];
+	TAILQ_ENTRY(hast_address) ha_next;
+};
+
 struct hastd_config *yy_config_parse(const char *config, bool exitonerror);
 void yy_config_free(struct hastd_config *config);
 
Index: sbin/hastd/hast.conf.5
===================================================================
--- sbin/hastd/hast.conf.5	(revision 214604)
+++ sbin/hastd/hast.conf.5	(working copy)
@@ -81,6 +81,7 @@ resource <name> {
 	local <path>
 	timeout <seconds>
 	exec <path>
+	friends <addr ...>
 
 	on <node> {
 		# Resource-node section
@@ -89,6 +90,7 @@ resource <name> {
 		local <path>
 		# Required
 		remote <addr>
+		friends <addr ...>
 	}
 	on <node> {
 		# Resource-node section
@@ -97,6 +99,7 @@ resource <name> {
 		local <path>
 		# Required
 		remote <addr>
+		friends <addr ...>
 	}
 }
 .Ed
@@ -155,6 +158,13 @@ tcp4://0.0.0.0:8457
 .Pp
 The default value is
 .Pa tcp4://0.0.0.0:8457 .
+.It Ic friends Aq addr ...
+.Pp
+List of addresses (separated by space) of hosts that can connect to
+the node.
+Format is the same as for the
+.Ic listen
+statement.
 .It Ic replication Aq mode
 .Pp
 Replication mode should be one of the following:
Index: sbin/hastd/token.l
===================================================================
--- sbin/hastd/token.l	(revision 214604)
+++ sbin/hastd/token.l	(working copy)
@@ -53,6 +53,7 @@ exec			{ DP; return EXEC; }
 resource		{ DP; return RESOURCE; }
 name			{ DP; return NAME; }
 local			{ DP; return LOCAL; }
+friends			{ DP; return FRIENDS; }
 remote			{ DP; return REMOTE; }
 on			{ DP; return ON; }
 fullsync		{ DP; return FULLSYNC; }

--=-=-=--

From owner-freebsd-fs@FreeBSD.ORG  Wed Nov 17 02:13:00 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6F74E1065672;
	Wed, 17 Nov 2010 02:13:00 +0000 (UTC) (envelope-from TERRY@tmk.com)
Received: from server.tmk.com (server.tmk.com [204.141.35.63])
	by mx1.freebsd.org (Postfix) with ESMTP id 488438FC15;
	Wed, 17 Nov 2010 02:13:00 +0000 (UTC)
Received: from tmk.com by tmk.com (PMDF V6.4 #37010)
	id <01NUC5J6QBZK00BNN4@tmk.com>; Tue, 16 Nov 2010 21:12:57 -0500 (EST)
Date: Tue, 16 Nov 2010 20:41:47 -0500 (EST)
From: Terry Kennedy <TERRY@tmk.com>
In-reply-to: "Your message dated Mon, 15 Nov 2010 23:48:26 -0800"
	<816D59CD-1BAF-4331-BEAD-67CEADCE4EF9@deman.com>
To: Michael DeMan <freebsd@deman.com>
Message-id: <01NUC6V4LBAQ00BNN4@tmk.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01NUB1F8POL000BNN4@tmk.com> <01NUB3IOMZJW00BNN4@tmk.com>
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ZFS panic after replacing log device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Nov 2010 02:13:00 -0000

> I would say it is definitely very odd that writes are a problem.  Sounds
> like it might be a hardware problem.  Is it possible to export the pool, 
> remove the ZIL and re-import it?  I myself would be pretty nervous trying
> that, but it would help isolate the problem?  If you can risk it.

  I think it is unlikely to be a hardware problem. While I haven't run any
destructive testing on the ZFS pool, the fact that it can be read without
error, combined with ECC throughout the system and the panic always
happening on the first write, makes me think that it is a software issue
in ZFS.

  When I do:

	zpool export data; zpool remove data da0

  I get a "No such pool: data". I then re-imported the pool and did:

	zpool offline data da0; zpool export data; zpool import data

  After doing that, I can write to the pool without a panic. But once I
online the log device and do any writes, I get the panic again.

  As I mentioned, I have this data replicated elsewhere, so I can
experiment with the pool if it will help track down this issue.

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA

From owner-freebsd-fs@FreeBSD.ORG  Wed Nov 17 15:14:58 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 78D081065679
	for <freebsd-fs@freebsd.org>; Wed, 17 Nov 2010 15:14:58 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 04C5F8FC14
	for <freebsd-fs@freebsd.org>; Wed, 17 Nov 2010 15:14:57 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <freebsd-fs@m.gmane.org>) id 1PIjiq-000638-Lp
	for freebsd-fs@freebsd.org; Wed, 17 Nov 2010 16:14:56 +0100
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Wed, 17 Nov 2010 16:14:56 +0100
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Wed, 17 Nov 2010 16:14:56 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Wed, 17 Nov 2010 16:14:50 +0100
Lines: 45
Message-ID: <ic0rh5$4bb$1@dough.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.12) Gecko/20101102 Thunderbird/3.1.6
X-Enigmail-Version: 1.1.2
Subject: Kind of slow IO under pressure?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Nov 2010 15:14:58 -0000

I'm running bonnie++ on UFS hosted on a FC SAN, and during the "rewrite"
phase of the benchmark (bonnie++ is deliberately trying to thrash disk
caches), the situation from "top" is like this:

    21 root      76    -     0K    16K qsleep  1   2:48 44.38% bufdaemon
  1274 root      76    0 11296K  1716K getblk  7   3:35 41.06% bonnie++
     3 root      -8    -     0K    16K -       2   1:44 21.78% g_up

With disk rates of about 40 MB/s in either direction and around 300 IOPS.

Together these three processes take up a whole CPU core and bonnie++ is 
single-threaded, which sort of suggests that this supposedly IO 
benchmark is actually CPU-bound on this 8-core 2.4 GHz system.

Transaction sizes are curiously not-quite-64KiB:

        tty             da0              da1              da2             cpu
  tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
    0   380 15.93  28  0.44  63.75 1242 77.30   1.93   0  0.00   0  0  9  0 91
    0  1510 16.00   3  0.05  63.86 1148 71.56   0.00   0  0.00   0  0 10  0 90
    0  1063 16.00   2  0.03  63.96 1093 68.24   0.00   0  0.00   0  0 14  0 86
    0   357  0.00   0  0.00  63.95 1024 63.95   0.00   0  0.00   0  0 15  0 85
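
As a rough cross-check of the iostat figures above, the MB/s column should be
close to KB/t multiplied by tps. The sketch below assumes 1 MB = 1024 KB; the
function names are made up for this illustration.

```c
#include <assert.h>

/*
 * Approximate throughput in MB/s from iostat's per-transfer size (KB/t)
 * and transfers per second (tps).  Assumes 1 MB = 1024 KB.
 */
static double
iostat_mbps(double kb_per_xfer, double tps)
{
	assert(kb_per_xfer >= 0.0 && tps >= 0.0);
	return (kb_per_xfer * tps / 1024.0);
}

/* True if two values agree to within the given absolute tolerance. */
static int
close_to(double a, double b, double tol)
{
	double d = a - b;

	if (d < 0.0)
		d = -d;
	return (d <= tol);
}
```

For the first da1 sample, iostat_mbps(63.75, 1242) comes to roughly 77.3,
in line with the MB/s column.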

Interestingly, binding (with cpuset) all three processes to the same CPU
makes things very strange, with bufdaemon climbing to 90% CPU usage and
bandwidth dropping to ~5 MB/s (but not completely stalling). Relaxing the
bufdaemon binding to two CPUs makes things normal again.

On the "read" phase the situation is:

  1274 root      61    0 11296K  1704K getblk  0   7:05 26.56% bonnie++
    18 root      44    -     0K    16K psleep  4   0:09  2.88% pagedaemon
     3 root      -8    -     0K    16K -       0   3:31  1.66% g_up
     4 root      -8    -     0K    16K -       2   0:17  1.66% g_down
    12 root     -40    -     0K   432K WAIT    5   0:11  1.27% {swi2: cambio}

i.e. no CPU hogging and the performance is ~140 MB/s. The write phase
is similar, nothing unusual.


From owner-freebsd-fs@FreeBSD.ORG  Wed Nov 17 17:05:54 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 29AF81065ACD
	for <freebsd-fs@FreeBSD.ORG>; Wed, 17 Nov 2010 17:05:54 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 9ED618FC25
	for <freebsd-fs@FreeBSD.ORG>; Wed, 17 Nov 2010 17:05:53 +0000 (UTC)
Received: from lurza.secnetix.de (localhost [127.0.0.1])
	by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id oAHH5asM003850;
	Wed, 17 Nov 2010 18:05:52 +0100 (CET)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.14.3/8.14.3/Submit) id oAHH5age003849;
	Wed, 17 Nov 2010 18:05:36 +0100 (CET) (envelope-from olli)
Date: Wed, 17 Nov 2010 18:05:36 +0100 (CET)
Message-Id: <201011171705.oAHH5age003849@lurza.secnetix.de>
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-fs@FreeBSD.ORG
X-Newsgroups: list.freebsd-fs
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX)
	(FreeBSD/6.4-PRERELEASE-20080904 (i386))
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5
	(lurza.secnetix.de [127.0.0.1]);
	Wed, 17 Nov 2010 18:05:52 +0100 (CET)
Cc: 
Subject: NFS hangs (7.3)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Nov 2010 17:05:54 -0000

I've got a problem on a server farm.  Every now and then,
some NFS mounts hang.  This happens after a few days or
after a few weeks.  All processes trying to access files
from the hanging mount go to state "D" and freeze.  The
only way to resolve the problem is to reboot the server.

"umount -f" also hangs and does not remove the hanging
mount (even though it disappears from the output of the
mount(8) command).

Here's one example from an attempt to run df(1) which
also hangs:

ps -uww:
USER   PID %CPU %MEM  VSZ  RSS TT  STAT STARTED    TIME COMMAND
root 61930  0.0  0.0 5728 1280 p4- D     5:15PM 0:00.01 /bin/df

ps -lww:
UID   PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 61930    1   0  -4  0 5728 1280 nfs    D    p4- 0:00.01 /bin/df

procstat -kk:
  PID    TID COMM       TDNAME     KSTACK
61930 100489 df         -          mi_switch+0x18e sleepq_wait+0x3b
_sleep+0x367 acquire+0x7c _lockmgr+0x203 VOP_LOCK1_APV+0x46
_vn_lock+0x83 vget+0xf9 vfs_hash_get+0xf4 nfs_nget+0xa8 nfs_statfs+0x8b
__vfs_statfs+0x2b kern_getfsstat+0x2d6 syscall+0x256 Xfast_syscall+0xab

And this is a hanging umount(8) command (I used fsid syntax,
hoping that it would work better than accessing the mount by
its path, but it doesn't seem to make a difference):

ps -uww:
USER   PID %CPU %MEM  VSZ  RSS TT  STAT STARTED    TIME COMMAND
root 62791  0.0  0.0 4640 1272 p4- D     5:18PM 0:00.08 umount -f a5ff000505000000

ps -lww:
UID   PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 62791    1   0  -4  0 4640 1272 vfsloc D    p4- 0:00.08 umount -f a5ff000505000000

procstat -kk:
  PID    TID COMM       TDNAME     KSTACK
62791 100239 umount     -          mi_switch+0x18e sleepq_wait+0x3b
_sleep+0x367 _lockmgr+0x4f3 dounmount+0x474 unmount+0x30a
syscall+0x256 Xfast_syscall+0xab

The machine is quite busy.  The hangs seem to always occur
in the night when lots of cron jobs are running.  The machine
has 221 NFS mounts and 26 nullfs mounts, and it has 26 jails,
if that matters.  All NFS shares are mounted from a virtual
filer running on a NetApp filer.  The mounts use the default
settings, so they should be v3 TCP (this is the default,
right?).  The only extra option we use is -L in order to
"fake" locking locally.

The machine is running FreeBSD 7.3-PRERELEASE-20100311 amd64.
Updating is somewhat complicated in that server farm, so I
haven't tried that so far because I'm not sure if it would
help.

Any suggestions or ideas?

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"Software gets slower faster than hardware gets faster."
        -- Niklaus Wirth

From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 08:26:30 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F724106566B;
	Thu, 18 Nov 2010 08:26:30 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from tarsier.geekcn.org (tarsier.geekcn.org [IPv6:2001:470:a803::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 10EF78FC13;
	Thu, 18 Nov 2010 08:25:47 +0000 (UTC)
Received: from mail.geekcn.org (tarsier.geekcn.org [211.166.10.233])
	by tarsier.geekcn.org (Postfix) with ESMTP id 639F7A6B110;
	Thu, 18 Nov 2010 16:25:28 +0800 (CST)
X-Virus-Scanned: amavisd-new at geekcn.org
Received: from tarsier.geekcn.org ([211.166.10.233])
	by mail.geekcn.org (mail.geekcn.org [211.166.10.233]) (amavisd-new,
	port 10024)
	with LMTP id ETcrHbMC3UAv; Thu, 18 Nov 2010 16:25:20 +0800 (CST)
Received: from delta.delphij.net (unknown
	[IPv6:2001:470:83bf:0:221:5cff:fe6a:37bb])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by tarsier.geekcn.org (Postfix) with ESMTPSA id 1D02FA6B0B7;
	Thu, 18 Nov 2010 16:24:51 +0800 (CST)
DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns;
	h=message-id:date:from:reply-to:organization:user-agent:
	mime-version:to:cc:subject:references:in-reply-to:
	x-enigmail-version:openpgp:content-type:content-transfer-encoding;
	b=a9L3rD6x2qoELxmQsVrC2ePHRHzBoigl/D5ndcpMzTWk0sZBzpHcq1XY005SUyigk
	EN9VAe6jUt4zuhpf9yz/A==
Message-ID: <4CE4E2B2.7070702@delphij.net>
Date: Thu, 18 Nov 2010 00:24:18 -0800
From: Xin LI <delphij@delphij.net>
Organization: The FreeBSD Project
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.1.15) Gecko/20101028 Thunderbird/3.0.10 ThunderBrowse/3.3.2
MIME-Version: 1.0
To: Ivan Voras <ivoras@freebsd.org>
References: <ibjkpq$m03$1@dough.gmane.org> <ibjvsp$evi$1@dough.gmane.org>
In-Reply-To: <ibjvsp$evi$1@dough.gmane.org>
X-Enigmail-Version: 1.0.1
OpenPGP: id=3FCA37C1;
	url=http://www.delphij.net/delphij.asc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: ZFS stripesize patch (in the context of 4k sector drives)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: d@delphij.net
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 08:26:30 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 11/12/10 10:09, Ivan Voras wrote:
> On 11/12/10 16:00, Ivan Voras wrote:
>> Hello,
>>
>> Any objections to me committing the following patch?
>>
>> The intention is to use stripesize info from GEOM in creating vdevs, in
>> the hope that the 4 KiB sector magic will work.
> 
> Or maybe not. I've grepped and other tools use stripesize in the way its
> name suggests - as RAID stripe size, not as logical sector size.
> 
> New idea on the menu: make the logical sector size a separate concept
> and a separate variable from stripe size. Would that be a better approach?

Have you tested this booting from existing ZFS file system?

Cheers,
- -- 
Xin LI <delphij@delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!	       Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iQEcBAEBCAAGBQJM5OKyAAoJEATO+BI/yjfBby4IAISFkInYkAa0OWUKtvWZbmdk
VUmQHN/e8tToB+Yb6+IRlk0e+Wu2Cqc2TACrdNRgq2f9BNUIjrfkJo1Flz0SlQlU
jtkutNVPjyh2aC3dBucWNSGAoadC5qq2VdQgDtzgK0OcNN/EKKUIHadZHWsCqyuD
RT37u9FZcBXMytRwB7DFWVLdfTfpTMyyYSBmWRliUFnIg7XgR1YD6Lu3ne2Nzj9/
7DaF0E308m3VSWyQRgB1l6EszWoIaGVbbY6TObp9zlNvug4wYSuBGvvuT+gojV/J
FHSaZDLNg/EhXR0T7IRaqOppvzYz5r2bnaqJr70JuT+9nZJZvV8ePuSuDdFx520=
=pB0h
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 12:49:43 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 17E351065697
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 12:49:43 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id C984B8FC23
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 12:49:42 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAMev5EyDaFvO/2dsb2JhbACDRJ9/rCeRA4EigVOBY3MEhFqGAA
X-IronPort-AV: E=Sophos;i="4.59,217,1288584000"; d="scan'208";a="101159193"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 18 Nov 2010 07:49:41 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DB8BDB3F30;
	Thu, 18 Nov 2010 07:49:41 -0500 (EST)
Date: Thu, 18 Nov 2010 07:49:41 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Oliver Fromme <olli@lurza.secnetix.de>
Message-ID: <230979963.266261.1290084581845.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201011171705.oAHH5age003849@lurza.secnetix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [12.16.49.138]
X-Mailer: Zimbra 6.0.7_GA_2476.RHEL4 (ZimbraWebClient - IE8
	(Win)/6.0.7_GA_2473.RHEL4_64)
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: NFS hangs (7.3)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 12:49:43 -0000

> I've got a problem on a server farm. Every now and then,
> some NFS mounts hang. This happens after a few days or
> after a few weeks. All processes trying to access files
> from the hanging mount go to state "D" and freeze. The
> only way to resolve the problem is to reboot the server.
> 
> "umount -f" also hangs and does not remove the hanging
> mount (even though it disappears from the output of the
> mount(8) command).
> 
> Here's one example from an attempt to run df(1) which
> also hangs:
> 
> ps -uww:
> USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
> root 61930 0.0 0.0 5728 1280 p4- D 5:15PM 0:00.01 /bin/df
> 
> ps -lww:
> UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
> 0 61930 1 0 -4 0 5728 1280 nfs D p4- 0:00.01 /bin/df
> 
It would appear that the root vnode for the client mount
point is locked for some reason. Here are a couple of possible
explanations:
1 - An infrequently executed code path doesn't VOP_UNLOCK()/vput()
    as it should. This seems relatively unlikely, since others are
    using the client without difficulties, but it might be an error
    case that only shows up for your environment.
2 - Another thread is holding the lock while stuck waiting for something
    else. The most obvious "something else" would be an RPC reply from
    the server. (A locking deadlock as mentioned below w.r.t. the spawning
    of new nfsiod threads, could be another?)

I'd suggest a "ps axHl" when this happens, and then look for a thread that
is waiting for an RPC reply. I'd also suggest "nfsstat -c" done several
times over a few minutes, to see if any of the counts is changing.
Also, you can do "tcpdump -w xxx -s 0 host <nfs-server>" on the client
for a while and then look at "xxx" in wireshark (it knows NFS packets)
and see if there is any net traffic to/from the server. (This will tell
you if it is a problem related to an RPC that is in progress vs something
else.) It will also tell you if it is using TCP (or you can "netstat -a"
to see if TCP connections are there for the NFS mounts).
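
The repeated "nfsstat -c" check can be scripted. What follows is only a
minimal sketch, assuming Python 3 is available on the client. The helper
names (counters, changed, sample) and the default interval and round count
are made up for illustration. It compares every number in the output by
position, so it assumes the report layout does not change between runs.

```python
import re
import subprocess
import time


def counters(text):
    """Return every standalone integer in an nfsstat report, in order."""
    return [int(tok) for tok in re.findall(r"\b\d+\b", text)]


def changed(before, after):
    """Return the positions whose value differs between two samples."""
    pairs = zip(counters(before), counters(after))
    return [i for i, (old, new) in enumerate(pairs) if old != new]


def sample(interval=30, rounds=4, cmd=("nfsstat", "-c")):
    """Run cmd repeatedly and print how many counters moved each round."""
    run = lambda: subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout
    prev = run()
    for _ in range(rounds):
        time.sleep(interval)
        cur = run()
        print(len(changed(prev, cur)), "counters changed")
        prev = cur
```

Calling sample() on the hung client prints one line per round. If it
reports 0 changed counters every time while processes are stuck, that
suggests no RPCs are completing for any mount during that window.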

> 
> The machine is quite busy. The hangs seem to always occur
> in the night when lots of cron jobs are running. The machine
> has 221 NFS mounts and 26 nullfs mounts, and it has 26 jails,
> if that matters. All NFS shares are mounted from a virtual
> filer running on a NetApp filer. The mounts use the default
> settings, so they should be v3 TCP (this is the default,
> right?). The only extra option we use is -L in order to
> "fake" locking locally.
> 
> The machine is running FreeBSD 7.3-PRERELEASE-20100311 amd64.
> Updating is somewhat complicated in that server farm, so I
> haven't tried that so far because I'm not sure if it would
> help.
> 
I've only been working with 8/current, so I can't recall if
there have been any client fixes for 7 since then, except there
was a very recent change w.r.t. spawning of nfsiod threads to
avoid LORs (lock order reversals, i.e. potential deadlocks) related
to creating new kernel threads. I have no idea if one of these
deadlocks might be involved.
(Someone familiar with that might be able to comment?)

rick

From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 12:54:27 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 819F9106566B
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 12:54:27 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 1AD1D8FC12
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 12:54:26 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id oAICsNPT042089
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 18 Nov 2010 14:54:23 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	oAICsNBX021386; Thu, 18 Nov 2010 14:54:23 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id oAICsNDK021385; 
	Thu, 18 Nov 2010 14:54:23 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Thu, 18 Nov 2010 14:54:23 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20101118125423.GD2392@deviant.kiev.zoral.com.ua>
References: <201011171705.oAHH5age003849@lurza.secnetix.de>
	<230979963.266261.1290084581845.JavaMail.root@erie.cs.uoguelph.ca>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="4gBflNtHT/MYzbiL"
Content-Disposition: inline
In-Reply-To: <230979963.266261.1290084581845.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_20,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-fs@freebsd.org, Oliver Fromme <olli@lurza.secnetix.de>
Subject: Re: NFS hangs (7.3)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 12:54:27 -0000


--4gBflNtHT/MYzbiL
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Nov 18, 2010 at 07:49:41AM -0500, Rick Macklem wrote:
> > I've got a problem on a server farm. Every now and then,
> > some NFS mounts hang. This happens after a few days or
> > after a few weeks. All processes trying to access files
> > from the hanging mount go to state "D" and freeze. The
> > only way to resolve the problem is to reboot the server.
> >=20
> > "umount -f" also hangs and does not remove the hanging
> > mount (even though it disappears from the output of the
> > mount(8) command).
> >=20
> > Here's one example from an attempt to run df(1) which
> > also hangs:
> >=20
> > ps -uww:
> > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
> > root 61930 0.0 0.0 5728 1280 p4- D 5:15PM 0:00.01 /bin/df
> >=20
> > ps -lww:
> > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
> > 0 61930 1 0 -4 0 5728 1280 nfs D p4- 0:00.01 /bin/df
> >=20
> It would appear that the root vnode for the client mount
> point is locked for some reason. Here are a couple of possible
> explanations:
> 1 - An infrequently executed code path doesn't VOP_UNLOCK()/vput()
>     as it should. This seems relatively unlikely, since others are
>     using the client without difficulties, but it might be an error
>     case that only shows up for your environment.
> 2 - Another thread is holding the lock while stuck waiting for something
>     else. The most obvious "something else" would be an RPC reply from
>     the server. (A locking deadlock as mentioned below w.r.t. the spawning
>     of new nfsiod threads, could be another?)
>=20
> I'd suggest a "ps axHl" when this happens, and then look for a thread that
> is waiting for an RPC reply. I'd also suggest "nfsstat -c" done several
> times over a few minutes, to see if any of the counts is changing.
> Also, you can do "tcpdump -w xxx -s 0 host <nfs-server>" on the client
> for a while and then look at "xxx" in wireshark (it knows NFS packets)
> and see if there is any net traffic to/from the server. (This will tell
> you if it is a problem related to an RPC that is in progress vs something
> else.) It will also tell you if it is using TCP (or you can "netstat -a"
> to see if TCP connections are there for the NFS mounts).
>=20
> >=20
> > The machine is quite busy. The hangs seem to always occur
> > in the night when lots of cron jobs are running. The machine
> > has 221 NFS mounts and 26 nullfs mounts, and it has 26 jails,
> > if that matters. All NFS shares are mounted from a virtual
> > filer running on a NetApp filer. The mounts use the default
> > settings, so they should be v3 TCP (this is the default,
> > right?). The only extra option we use is -L in order to
> > "fake" locking locally.
> >=20
> > The machine is running FreeBSD 7.3-PRERELEASE-20100311 amd64.
> > Updating is somewhat complicated in that server farm, so I
> > haven't tried that so far because I'm not sure if it would
> > help.
> >=20
> I've only been working with 8/current, so I can't recall if
> there have been any client fixes for 7 since then, except there
> was a very recent change w.r.t. spawning of nfsiod threads to
> avoid lor (potential deadlocks) related to creating new kernel
> threads. I have no idea if one of these deadlocks might be involved.
> (Someone familiar with that might be able to comment?)

The changes for nfsiod creation are definitely not in 7.3-prerelease.

To diagnose the issue, we could start with the output of ps axlHww
(already suggested by Rick) and procstat -ka.

--4gBflNtHT/MYzbiL
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAkzlIf4ACgkQC3+MBN1Mb4gwzwCdG+4agR3kKzOrppZjoEavVjQV
of0AoNVqIQcvr44tjgDczQIDZCxHcq7q
=ERog
-----END PGP SIGNATURE-----

--4gBflNtHT/MYzbiL--

From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 16:00:17 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5D2B4106566B
	for <freebsd-fs@FreeBSD.org>; Thu, 18 Nov 2010 16:00:17 +0000 (UTC)
	(envelope-from karl.oulmi@ibl.cnrs.fr)
Received: from mima.ibl.fr (mima.ibl.fr [193.49.178.26])
	by mx1.freebsd.org (Postfix) with ESMTP id C07138FC17
	for <freebsd-fs@FreeBSD.org>; Thu, 18 Nov 2010 16:00:16 +0000 (UTC)
X-Virus-Scanned: amavisd-new at ibl.fr
Message-ID: <4CE5490A.6030209@ibl.cnrs.fr>
Date: Thu, 18 Nov 2010 16:40:58 +0100
From: Karl Oulmi <karl.oulmi@ibl.cnrs.fr>
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
	micalg=sha1; boundary="------------ms070200090109010803050807"
X-Scanned-By: MIMEDefang 2.64 on 193.49.178.28
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: 
Subject: support of carpdev
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 16:00:17 -0000

This is a cryptographically signed message in MIME format.

--------------ms070200090109010803050807
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable

Hi all,

I'm trying to build an HA test platform with hast and carp (FreeBSD 8.1).

The goal is to transmit carp packets on a dedicated network card on each
box (and not on the tcp/ip network card).

To do that, I need to specify the carpdev argument to carp, but it
doesn't seem to work :(

Could anyone tell me whether it's supported on 8.1-RELEASE and, if not,
what I could do instead?

Thanks a lot
Karl.



--------------ms070200090109010803050807--

From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 18:04:04 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A482710657C1
	for <fs@freebsd.org>; Thu, 18 Nov 2010 18:04:04 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: from mail2.fluidhosting.com (mx23.fluidhosting.com [204.14.89.6])
	by mx1.freebsd.org (Postfix) with ESMTP id 345BE8FC18
	for <fs@freebsd.org>; Thu, 18 Nov 2010 18:04:03 +0000 (UTC)
Received: (qmail 25251 invoked by uid 399); 18 Nov 2010 18:04:02 -0000
Received: from localhost (HELO doug-optiplex.ka9q.net)
	(dougb@dougbarton.us@127.0.0.1)
	by localhost with ESMTPAM; 18 Nov 2010 18:04:02 -0000
X-Originating-IP: 127.0.0.1
X-Sender: dougb@dougbarton.us
Message-ID: <4CE56A90.3020200@FreeBSD.org>
Date: Thu, 18 Nov 2010 10:04:00 -0800
From: Doug Barton <dougb@FreeBSD.org>
Organization: http://SupersetSolutions.com/
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.12) Gecko/20101028 Thunderbird/3.1.6
MIME-Version: 1.0
To: Aditya Sarawgi <sarawgi.aditya@gmail.com>
References: <20100929031825.L683@besplex.bde.org>
	<20100929084801.M948@besplex.bde.org>
	<20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org>
	<20100929202526.GA1564@aditya> <4CD0A3E8.4080304@FreeBSD.org>
	<AANLkTi=iTCG4aO-KO_gy7fp_96KcZ_TCyNk5OkLZUHV3@mail.gmail.com>
	<4CD201AE.3040409@FreeBSD.org> <20101108174327.GC2066@earth>
In-Reply-To: <20101108174327.GC2066@earth>
X-Enigmail-Version: 1.1.2
OpenPGP: id=1A1ABC84
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 18:04:04 -0000

On 11/08/2010 09:43, Aditya Sarawgi wrote:

> I have attached the patch.

Finally got a chance to test it. There was a huge .rej file for 
_alloc.c, so I regenerated the patch against HEAD:
http://people.freebsd.org/~dougb/ext2fs_prealloc-r215457.diff

In light testing so far it seems ok, and it is slightly better from a 
performance standpoint, but still not as fast as 8.1. My test is 
csup'ing the ports tree without using -s. Admittedly that's a rather 
non-scientific test in that the network affects it, however I'm on a 
pretty fast link, and I'm forcing IPv4 just to be safe. This is the use 
case

Times in seconds:
8.1 averages around 300
HEAD pre-patch averages around 450
HEAD post-patch averages around 370

So a good improvement to be sure, but still a ways to go.
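
To put those averages in relative terms, here is a rough sketch of the
arithmetic. The function and variable names are hypothetical, and the
inputs are only the approximate averages quoted above, so treat the
results as ballpark figures.

```python
def slowdown(baseline, measured):
    """Percent extra wall-clock time relative to the baseline run."""
    return 100.0 * (measured - baseline) / baseline


# Approximate averages in seconds, taken from the list above.
times = {"8.1": 300, "HEAD pre-patch": 450, "HEAD post-patch": 370}
base = times["8.1"]

for name, secs in times.items():
    print(f"{name}: {slowdown(base, secs):+.0f}% vs 8.1")

# Share of the regression (pre-patch minus 8.1) that the patch wins back.
lost = times["HEAD pre-patch"] - base
regained = times["HEAD pre-patch"] - times["HEAD post-patch"]
recovered = 100.0 * regained / lost
print(f"patch recovers about {recovered:.0f}% of the regression")
```

With these inputs HEAD is about 50% slower than 8.1 before the patch and
about 23% slower after it, so the patch recovers a bit more than half of
the lost time.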


Doug

-- 

	Nothin' ever doesn't change, but nothin' changes much.
			-- OK Go

	Breadth of IT experience, and depth of knowledge in the DNS.
	Yours for the right price.  :)  http://SupersetSolutions.com/


From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 18:07:40 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2180410656A4
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 18:07:40 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: from mail2.fluidhosting.com (mx23.fluidhosting.com [204.14.89.6])
	by mx1.freebsd.org (Postfix) with ESMTP id A56268FC13
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 18:07:39 +0000 (UTC)
Received: (qmail 32076 invoked by uid 399); 18 Nov 2010 18:07:39 -0000
Received: from localhost (HELO doug-optiplex.ka9q.net)
	(dougb@dougbarton.us@127.0.0.1)
	by localhost with ESMTPAM; 18 Nov 2010 18:07:39 -0000
X-Originating-IP: 127.0.0.1
X-Sender: dougb@dougbarton.us
Message-ID: <4CE56B6A.2060305@FreeBSD.org>
Date: Thu, 18 Nov 2010 10:07:38 -0800
From: Doug Barton <dougb@FreeBSD.org>
Organization: http://SupersetSolutions.com/
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.12) Gecko/20101028 Thunderbird/3.1.6
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <20100929031825.L683@besplex.bde.org>	<20100929084801.M948@besplex.bde.org>	<20100929041650.GA1553@aditya>
	<201009290917.05269.jhb@freebsd.org>	<20100929202526.GA1564@aditya>
	<4CD0A3E8.4080304@FreeBSD.org>	<AANLkTi=iTCG4aO-KO_gy7fp_96KcZ_TCyNk5OkLZUHV3@mail.gmail.com>	<4CD201AE.3040409@FreeBSD.org>
	<20101108174327.GC2066@earth> <4CE56A90.3020200@FreeBSD.org>
In-Reply-To: <4CE56A90.3020200@FreeBSD.org>
X-Enigmail-Version: 1.1.2
OpenPGP: id=1A1ABC84
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 18:07:40 -0000

On 11/18/2010 10:04, Doug Barton wrote:
> This is the use case

That's weird, I thought I finished that sentence. :)  This is the use 
case where I first noticed the dramatic slowdown (csup, and related 
ports tree ops), which is why I am using it to test the efficacy of the 
patch.


Doug

-- 

	Nothin' ever doesn't change, but nothin' changes much.
			-- OK Go

	Breadth of IT experience, and depth of knowledge in the DNS.
	Yours for the right price.  :)  http://SupersetSolutions.com/


From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 20:26:59 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2FDD8106564A
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 20:26:59 +0000 (UTC)
	(envelope-from grarpamp@gmail.com)
Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com
	[209.85.210.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 0DA898FC17
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 20:26:58 +0000 (UTC)
Received: by pzk1 with SMTP id 1so789946pzk.13
	for <freebsd-fs@freebsd.org>; Thu, 18 Nov 2010 12:26:58 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=5e2mdSAnGmt7zS5K1D5mfroFI/0tv5sJE5pnOTfOEWs=;
	b=eu85Wz5VZhEu8Vle4+krTNRgfy2xWDj3p5UWmPVeFOjFcfF8cZPebXbQcnhnF40D2E
	/v390U8ppjgk8uLy4AIJ9YYfEOW9rxvPYsJtJs2gzVvcrXZ3zbqswpy3h1cmteRb7Cd0
	7p9TbI8nNqiodU/lEFZYIr49U8wivxgE27DqY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=m6jCKTJeUI7U9G5ci2RZglvYBa4Twuf3/qpzt12HmZSLi9dkOKp00MyebReYLLROFy
	9d2wE4DxIrOfWlSUKIGaoqTuZaqcPTyqGDBer2N975qMgp62Jh9KYXy0a+ZXpWvlOiVc
	7NfWVp8wUdgD9olGpdYDVl8ZCrEVOwvuvBhvA=
MIME-Version: 1.0
Received: by 10.142.157.16 with SMTP id f16mr900462wfe.287.1290110286159; Thu,
	18 Nov 2010 11:58:06 -0800 (PST)
Received: by 10.142.178.2 with HTTP; Thu, 18 Nov 2010 11:57:39 -0800 (PST)
Date: Thu, 18 Nov 2010 14:57:39 -0500
Message-ID: <AANLkTik-ra2+P6G2gY_5i+JBdWoQgSuVd8jymQwJAUfO@mail.gmail.com>
From: grarpamp <grarpamp@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Subject: Oracle Solaris 11 Express - ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 20:26:59 -0000

Is this release the first example of their stated[?] plan in action:
releasing source code to the community (illumos, the BSDs, etc.) only
in step with product releases, for things like ZFS?
I'm asking particularly about diff and crypto (and maybe some
dedup and compression work), which are now present in this release.
This is not a request; I'm just asking whether this is what we're
seeing, or are about to see released.

http://www.oracle.com/technetwork/server-storage/solaris11/downloads/index.html
http://blogs.sun.com/darren/
http://www.oracle.com/technetwork/server-storage/solaris11/documentation/solaris11techcollateral-178132.html

From owner-freebsd-fs@FreeBSD.ORG  Thu Nov 18 22:13:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D99D61065673
	for <fs@freebsd.org>; Thu, 18 Nov 2010 22:13:52 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 4850E8FC08
	for <fs@freebsd.org>; Thu, 18 Nov 2010 22:13:51 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id oAIM2I8U079782
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <fs@freebsd.org>; Fri, 19 Nov 2010 00:02:18 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	oAIM2III026072
	for <fs@freebsd.org>; Fri, 19 Nov 2010 00:02:18 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id oAIM2I7a026071
	for fs@freebsd.org; Fri, 19 Nov 2010 00:02:18 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Fri, 19 Nov 2010 00:02:18 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: fs@freebsd.org
Message-ID: <20101118220218.GM2392@deviant.kiev.zoral.com.ua>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="MHmrXEmXDnZIcfiz"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.7 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_05,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: 
Subject: prtactive
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Nov 2010 22:13:53 -0000


--MHmrXEmXDnZIcfiz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Has anybody on the list used, or found useful, the printf()s
in the various VOP_INACTIVE and VOP_RECLAIM implementations?
I plan to axe prtactive and the corresponding printfs, which as far
as I can remember have never been activated. The situation in which
the message would be emitted is completely legitimate (forced unmount).

I will commit the following patch unless a valid reason not to is given.
diff --git a/sys/fs/cd9660/cd9660_node.c b/sys/fs/cd9660/cd9660_node.c
index 64d449e..e1c68c6 100644
--- a/sys/fs/cd9660/cd9660_node.c
+++ b/sys/fs/cd9660/cd9660_node.c
@@ -69,9 +69,6 @@ cd9660_inactive(ap)
 	struct iso_node *ip =3D VTOI(vp);
 	int error =3D 0;
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("cd9660_inactive: pushing active", vp);
-
 	/*
 	 * If we are done with the inode, reclaim it
 	 * so that it can be reused immediately.
@@ -93,8 +90,6 @@ cd9660_reclaim(ap)
 {
 	struct vnode *vp =3D ap->a_vp;
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("cd9660_reclaim: pushing active", vp);
 	/*
 	 * Destroy the vm object and flush associated pages.
 	 */
diff --git a/sys/fs/coda/coda_vnops.c b/sys/fs/coda/coda_vnops.c
index 02f6eb5..51fb307 100644
--- a/sys/fs/coda/coda_vnops.c
+++ b/sys/fs/coda/coda_vnops.c
@@ -1549,9 +1549,6 @@ coda_reclaim(struct vop_reclaim_args *ap)
 				    "%p, cp %p\n", vp, cp);
 		}
 #endif
-	} else {
-		if (prtactive && vp->v_usecount !=3D 0)
-			vprint("coda_reclaim: pushing active", vp);
 	}
 	cache_purge(vp);
 	coda_free(VTOC(vp));
diff --git a/sys/fs/ext2fs/ext2_inode.c b/sys/fs/ext2fs/ext2_inode.c
index 2cf60a7..fc65a63 100644
--- a/sys/fs/ext2fs/ext2_inode.c
+++ b/sys/fs/ext2fs/ext2_inode.c
@@ -481,9 +481,6 @@ ext2_inactive(ap)
 	struct thread *td =3D ap->a_td;
 	int mode, error =3D 0;
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ext2_inactive: pushing active", vp);
-
 	/*
 	 * Ignore inodes related to stale file handles.
 	 */
@@ -522,8 +519,6 @@ ext2_reclaim(ap)
 	struct inode *ip;
 	struct vnode *vp =3D ap->a_vp;
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ufs_reclaim: pushing active", vp);
 	ip =3D VTOI(vp);
 	if (ip->i_flag & IN_LAZYMOD) {
 		ip->i_flag |=3D IN_MODIFIED;
diff --git a/sys/fs/hpfs/hpfs_vnops.c b/sys/fs/hpfs/hpfs_vnops.c
index 4ec6b1e..3859478 100644
--- a/sys/fs/hpfs/hpfs_vnops.c
+++ b/sys/fs/hpfs/hpfs_vnops.c
@@ -575,9 +575,6 @@ hpfs_inactive(ap)
 			return (error);
 	}
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("hpfs_inactive: pushing active", vp);
-
 	if (hp->h_flag & H_INVAL) {
 		vrecycle(vp, ap->a_td);
 		return (0);
diff --git a/sys/fs/msdosfs/msdosfs_denode.c b/sys/fs/msdosfs/msdosfs_denod=
e.c
index 9ad892e..84b52ba 100644
--- a/sys/fs/msdosfs/msdosfs_denode.c
+++ b/sys/fs/msdosfs/msdosfs_denode.c
@@ -548,8 +548,6 @@ msdosfs_reclaim(ap)
 	    dep, dep->de_Name, dep->de_refcnt);
 #endif
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("msdosfs_reclaim(): pushing active", vp);
 	/*
 	 * Destroy the vm object and flush associated pages.
 	 */
@@ -586,9 +584,6 @@ msdosfs_inactive(ap)
 	printf("msdosfs_inactive(): dep %p, de_Name[0] %x\n", dep, dep->de_Name[0=
]);
 #endif
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("msdosfs_inactive(): pushing active", vp);
-
 	/*
 	 * Ignore denodes related to stale file handles.
 	 */
diff --git a/sys/fs/nfsclient/nfs_clnode.c b/sys/fs/nfsclient/nfs_clnode.c
index 430b494..01e1919 100644
--- a/sys/fs/nfsclient/nfs_clnode.c
+++ b/sys/fs/nfsclient/nfs_clnode.c
@@ -190,8 +190,6 @@ ncl_inactive(struct vop_inactive_args *ap)
 	struct vnode *vp =3D ap->a_vp;
=20
 	np =3D VTONFS(vp);
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ncl_inactive: pushing active", vp);
=20
 	if (NFS_ISV4(vp) && vp->v_type =3D=3D VREG) {
 		/*
@@ -233,9 +231,6 @@ ncl_reclaim(struct vop_reclaim_args *ap)
 	struct nfsnode *np =3D VTONFS(vp);
 	struct nfsdmap *dp, *dp2;
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ncl_reclaim: pushing active", vp);
-
 	if (NFS_ISV4(vp) && vp->v_type =3D=3D VREG)
 		/*
 		 * Since mmap()'d files do I/O after VOP_CLOSE(), the NFSv4
diff --git a/sys/fs/ntfs/ntfs_vnops.c b/sys/fs/ntfs/ntfs_vnops.c
index ee62a5c..6970474 100644
--- a/sys/fs/ntfs/ntfs_vnops.c
+++ b/sys/fs/ntfs/ntfs_vnops.c
@@ -82,8 +82,6 @@ static vop_fsync_t	ntfs_fsync;
 static vop_pathconf_t	ntfs_pathconf;
 static vop_vptofh_t	ntfs_vptofh;
=20
-int	ntfs_prtactive =3D 1;	/* 1 =3D> print out reclaim of active vnodes */
-
 /*
  * This is a noop, simply returning what one has been given.
  */
@@ -214,15 +212,12 @@ ntfs_inactive(ap)
 		struct vnode *a_vp;
 	} */ *ap;
 {
-	register struct vnode *vp =3D ap->a_vp;
 #ifdef NTFS_DEBUG
-	register struct ntnode *ip =3D VTONT(vp);
+	register struct ntnode *ip =3D VTONT(ap->a_vp);
 #endif
=20
-	dprintf(("ntfs_inactive: vnode: %p, ntnode: %d\n", vp, ip->i_number));
-
-	if (ntfs_prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ntfs_inactive: pushing active", vp);
+	dprintf(("ntfs_inactive: vnode: %p, ntnode: %d\n", ap->a_vp,
+	    ip->i_number));
=20
 	/* XXX since we don't support any filesystem changes
 	 * right now, nothing more needs to be done
@@ -246,9 +241,6 @@ ntfs_reclaim(ap)
=20
 	dprintf(("ntfs_reclaim: vnode: %p, ntnode: %d\n", vp, ip->i_number));
=20
-	if (ntfs_prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ntfs_reclaim: pushing active", vp);
-
 	/*
 	 * Destroy the vm object and flush associated pages.
 	 */
diff --git a/sys/gnu/fs/reiserfs/reiserfs_inode.c b/sys/gnu/fs/reiserfs/rei=
serfs_inode.c
index 46edbf4..b63ed74 100644
--- a/sys/gnu/fs/reiserfs/reiserfs_inode.c
+++ b/sys/gnu/fs/reiserfs/reiserfs_inode.c
@@ -114,8 +114,6 @@ reiserfs_inactive(struct vop_inactive_args *ap)
=20
 	reiserfs_log(LOG_DEBUG, "deactivating inode used %d times\n",
 	    vp->v_usecount);
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ReiserFS/reclaim: pushing active", vp);
=20
 #if 0
 	/* Ignore inodes related to stale file handles. */
@@ -147,8 +145,6 @@ reiserfs_reclaim(struct vop_reclaim_args *ap)
=20
 	reiserfs_log(LOG_DEBUG, "reclaiming inode used %d times\n",
 	    vp->v_usecount);
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("ReiserFS/reclaim: pushing active", vp);
 	ip =3D VTOI(vp);
=20
 	/* XXX Update this node (write to the disk) */
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 4a4cef1..b032513 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -191,9 +191,6 @@ struct nfs_public nfs_pub;
 static uma_zone_t vnode_zone;
 static uma_zone_t vnodepoll_zone;
=20
-/* Set to 1 to print out reclaim of active vnodes */
-int	prtactive;
-
 /*
  * The workitem queue.
  *
diff --git a/sys/nfsclient/nfs_node.c b/sys/nfsclient/nfs_node.c
index 5b87bd7..5b43b3d 100644
--- a/sys/nfsclient/nfs_node.c
+++ b/sys/nfsclient/nfs_node.c
@@ -193,8 +193,6 @@ nfs_inactive(struct vop_inactive_args *ap)
 	struct thread *td =3D curthread;	/* XXX */
=20
 	np =3D VTONFS(ap->a_vp);
-	if (prtactive && vrefcnt(ap->a_vp) !=3D 0)
-		vprint("nfs_inactive: pushing active", ap->a_vp);
 	mtx_lock(&np->n_mtx);
 	if (ap->a_vp->v_type !=3D VDIR) {
 		sp =3D np->n_sillyrename;
@@ -228,9 +226,6 @@ nfs_reclaim(struct vop_reclaim_args *ap)
 	struct nfsnode *np =3D VTONFS(vp);
 	struct nfsdmap *dp, *dp2;
=20
-	if (prtactive && vrefcnt(vp) !=3D 0)
-		vprint("nfs_reclaim: pushing active", vp);
-
 	/*
 	 * If the NLM is running, give it a chance to abort pending
 	 * locks.
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index e82f8ea..86ff8b6 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -410,7 +410,6 @@ extern	struct vnode *rootvnode;	/* root (i.e. "/") vnod=
e */
 extern	int async_io_version;		/* 0 or POSIX version of AIO i'face */
 extern	int desiredvnodes;		/* number of vnodes desired */
 extern	struct uma_zone *namei_zone;
-extern	int prtactive;			/* nonzero to call vprint() */
 extern	struct vattr va_null;		/* predefined null vattr structure */
=20
 #define	VI_LOCK(vp)	mtx_lock(&(vp)->v_interlock)
diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
index a281ae5..e21092c 100644
--- a/sys/ufs/ufs/ufs_inode.c
+++ b/sys/ufs/ufs/ufs_inode.c
@@ -80,8 +80,6 @@ ufs_inactive(ap)
 	struct mount *mp;
=20
 	mp =3D NULL;
-	if (prtactive && vp->v_usecount !=3D 0)
-		vprint("ufs_inactive: pushing active", vp);
 	/*
 	 * Ignore inodes related to stale file handles.
 	 */
@@ -191,8 +189,6 @@ ufs_reclaim(ap)
 	int i;
 #endif
=20
-	if (prtactive && vp->v_usecount !=3D 0)
-		vprint("ufs_reclaim: pushing active", vp);
 	/*
 	 * Destroy the vm object and flush associated pages.
 	 */

--MHmrXEmXDnZIcfiz
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAkzlomkACgkQC3+MBN1Mb4h9pgCfeRenisTuDAtQV6FEpcJZSRhX
eXMAoOPxCwXa/MDdfHsNGxJwLxg7FE7f
=qmSK
-----END PGP SIGNATURE-----

--MHmrXEmXDnZIcfiz--