From owner-freebsd-fs@FreeBSD.ORG Tue Dec 27 21:15:01 2011
Date: Tue, 27 Dec 2011 13:14:58 -0800
From: Jeremy Chadwick
To: Johannes Totz
Cc: freebsd-fs@freebsd.org
Subject: Re: zpool failmode=continue
Message-ID: <20111227211458.GA21839@icarus.home.lan>
References: <4EE764DA.4030206@brockmann-consult.de>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Tue, Dec 27, 2011 at 04:37:32PM +0000, Johannes Totz wrote:
> On 13/12/2011 14:53, Johannes Totz wrote:
> > On 13/12/2011 14:44, Peter Maloney wrote:
> >> Are you using NFS or ZVOLs?
> >
> > Neither, see below.
> >
> >> My ZFS hangs (all I/O) if I go into the .zfs/snapshots directory over
> >> NFS. (Planning to file a PR after I find a way to reproduce it
> >> reliably, but it depends on specific snapshots.) My workaround is to
> >> mount /var/empty on top of the .zfs directory on the NFS client, and
> >> give nobody else access. Another workaround I thought of is to have
> >> another parent directory in the dataset, and share the 2nd level down,
> >> which doesn't contain the .zfs directory.
> >
> > My pool is not exported to any clients. My situation is actually the
> > other way around; I should have been clearer: the block device on
> > which I created the pool is on the network.
> > It's kind of a crazy setup:
> > - sshfs to another (Linux) machine
> > - create a big image file
> > - create a pool from a file vdev mounted via sshfs
> > Eventually the network drops out and zpool shows read and write
> > errors; fine so far. But all new I/O just hangs instead of failing
> > with an error.
>
> After some observation, it turns out that
> periodic/security/100.chksetuid makes all I/O die on the test pool.
> Is find doing something funny? As it does not even search around on
> the testpool (it's imported but not mounted), nor the sshfs (only UFS
> and ZFS are searched), I don't have any clue as to what might go
> wrong...
> zpool status simply mentions read/write errors.
>
> I noticed this because, when logging iostat to a file, I/O always
> stopped at 3am. But I can also trigger it by simply running
> 100.chksetuid. All the other stuff in daily and security is fine.
>
> Does anybody have any idea what might cause it?
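For anyone following along, the setup described above boils down to
roughly the following; the host name, mount point, image size, and pool
name below are made up for illustration:

    # FUSE filesystem backed by ssh to the remote Linux box (made-up host/path)
    sshfs user@linuxbox:/export /mnt/remote

    # big sparse image file on the sshfs mount, used as a file vdev
    truncate -s 100g /mnt/remote/pool.img
    zpool create testpool /mnt/remote/pool.img

    # the property the thread subject refers to
    zpool set failmode=continue testpool

In other words, the pool's only vdev is a regular file sitting on a
FUSE filesystem, so any network hiccup shows up to ZFS as vdev-level
read/write errors.
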
This "problem" (note the quotes) has been brought up before. There isn't
anything wrong with the periodic script; if you look at it, you'll see
that it's "heavy" on I/O due to all of the operations being done:

  find -sx $MP /dev/null -type f \
    \( -perm -u+x -or -perm -g+x -or -perm -o+x \) \
    \( -perm -u+s -or -perm -g+s \) -exec ls -liTd \{\} \+ |
  check_diff setuid - "${host} setuid diffs:"

This is going to traverse the filesystem and do a couple of stat(2)
calls (I assume find(1) is smart enough to consolidate them into one or
two at most), plus there's the -exec call on every result (pretty sure
one cannot use xargs in this case, given the nature of what's being
done).

I can try to dig up those threads for you, but I'm sure that if you
search the mailing lists for "100.chksetuid zfs" you'll find them.

ZFS tends to "bring to light" underlying issues with hardware, as it
stresses the system a lot more than UFS would. For example, folks using
mps(4) (I think; trying to remember which LSI driver) were having
problems for a while, and "fixes to make ZFS happy" were committed to
the driver. You get the idea, I hope.

Your statement here:

> zpool status simply mentions read/write errors.

acts as pretty much a confirmation of this. You're going to need to
provide a verbose description of your setup, including all storage
controllers used with/associated with ZFS, every disk involved (model
strings would be helpful), SMART statistics for each disk if possible
(smartctl -A), whether you have a heterogeneous setup (ZFS on some
disks, UFS on others), make.conf, loader.conf, sysctl.conf, full
"zpool status" output, uname -a, full "dmesg" output, etc. I make no
promises that there's any solution to this, either.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
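
For reference, most of the information requested above can be gathered
with something along these lines; the disk device name is a placeholder,
so run smartctl against each disk actually present on the system:

    uname -a
    zpool status -v
    dmesg
    smartctl -A /dev/ada0    # placeholder device; repeat for every disk (ada1, da0, ...)
    cat /etc/make.conf /boot/loader.conf /etc/sysctl.conf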