From owner-freebsd-current@FreeBSD.ORG  Tue Aug 28 20:57:09 2007
Date: Tue, 28 Aug 2007 22:55:55 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Bakul Shah <bakul@bitblocks.com>
Message-ID: <20070828205554.GI39562@garage.freebsd.pl>
References: <20070828180228.GD39562@garage.freebsd.pl>
	<20070828204834.9A7F85B3B@mail.bitblocks.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="jaoouwwPWoQSJZYp"
Content-Disposition: inline
In-Reply-To: <20070828204834.9A7F85B3B@mail.bitblocks.com>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
Cc: current@FreeBSD.org, Pascal Hofstee <caelian@gmail.com>
Subject: Re: ZFS kernel panic


--jaoouwwPWoQSJZYp
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 28, 2007 at 01:48:34PM -0700, Bakul Shah wrote:
> Pawel Jakub Dawidek <pjd@FreeBSD.org> wrote:
> > On Tue, Aug 28, 2007 at 10:02:42AM -0700, Bakul Shah wrote:
> > > > When you don't use redundant configuration (no mirror, no raidz, no
> > > > copies>1) then ZFS is going to panic on a write failure. It looks like
> > > > ZFS found a bad block on your disk.
> > >
> > > Does SUN really say this about ZFS?  Is this acceptable in a
> > > production environment?  What if one of your mirrored disk
> > > fails and in the "degraded" environment (before you have had
> > > a chance to replace the bad disk) ZFS discovers that a write
> > > fails?  Why can't it find an alternative block to write to?
> >
> > There were many complaints on zfs-discuss@; you may want to look through
> > the archive. The short version is that many users don't like that, and it
> > should change in the future. Because of the COW model it should be quite
> > easy to just mark a block as bad and take the next one, but that is not
> > currently implemented. It's much less of a problem when one uses redundancy.
>
> Good to know others are complaining too :-)
>
> My real concern is the panic.  This situation may be rare if
> using redundancy + regular scrubbing, but it can definitely
> occur.  And as long as non redundant ZFS is *allowed*, you
> pretty much have to deal with it without any panicking.
>
> Originally panic() was used to indicate that some *system
> invariant* has been violated.  That either meant a hardware
> error or an unknown software error but in any case some data
> structure was likely corrupted and continuing can make
> matters worse.  But that is not the case here (in general).
> zfs does not have the appropriate information to be able to
> decide whether the write error is fatal.
>
> The simplest thing to do in case of a write error is to
> simply ignore it.  You *will* catch this problem when you try
> to read this block.  One step better is to do what you
> suggest.

You can't ignore a write error, because the application has already assumed
the write succeeded, which can lead to misbehaviour later. ZFS cannot yet
handle write errors, so it panics to preserve data consistency. This is
the right reaction on the ZFS side until skipping bad blocks is
implemented.
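
As a rough illustration of the "mark the block as bad and take the next
one" idea from earlier in the thread, here is a toy Python model. It is
not ZFS code and says nothing about how ZFS would actually implement
this; ToyDisk, CowAllocator and write_new are hypothetical names made up
for the sketch, and the only assumption is a copy-on-write allocator
that always writes to a fresh block.

```python
class WriteError(Exception):
    """Raised when the toy device cannot write a block."""


class ToyDisk:
    """Hypothetical in-memory disk; blocks listed in bad_blocks fail on write."""

    def __init__(self, nblocks, bad_blocks=()):
        self.nblocks = nblocks
        self.bad_blocks = set(bad_blocks)
        self.data = {}

    def write(self, blkno, payload):
        if blkno in self.bad_blocks:
            raise WriteError(f"write failed at block {blkno}")
        self.data[blkno] = payload


class CowAllocator:
    """Copy-on-write style allocator: every write goes to a fresh block."""

    def __init__(self, disk):
        self.disk = disk
        self.next_free = 0
        self.known_bad = set()

    def write_new(self, payload):
        # Try fresh blocks in order. On a failed write, remember the block
        # as bad and move on to the next one instead of giving up.
        while self.next_free < self.disk.nblocks:
            blkno = self.next_free
            self.next_free += 1
            try:
                self.disk.write(blkno, payload)
            except WriteError:
                self.known_bad.add(blkno)
                continue
            return blkno
        # Only when nothing usable is left does the error become fatal.
        raise WriteError("no usable blocks left")


disk = ToyDisk(nblocks=8, bad_blocks={1, 2})
alloc = CowAllocator(disk)
first = alloc.write_new(b"a")   # lands on block 0
second = alloc.write_new(b"b")  # blocks 1 and 2 fail, so this lands on block 3
print(first, second, sorted(alloc.known_bad))
```

The point of the sketch is only that, with copy-on-write, a failed write
has not yet overwritten anything live, so retrying on another block is a
natural fallback before treating the error as fatal.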

> What happens now when you do use redundancy and there is a
> write error while writing one of the copies?  Does the system
> panic or is this error ignored?

I don't remember offhand, but the component is probably marked as bad and
the vdev group goes into a degraded state. You can simulate this easily with
gnop(8).
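
The behaviour guessed at above (failing component marked bad, vdev
degraded, no panic) can be pictured with a small Python model. This is
only a sketch of that guess, not a description of confirmed ZFS
behaviour; ToyMirror and the component names "da0.nop" and "da1" are
hypothetical and chosen purely for illustration.

```python
class ToyMirror:
    """Hypothetical mirror of several components; not ZFS code."""

    def __init__(self, names):
        # Component name -> "ONLINE" or "FAULTED".
        self.components = {name: "ONLINE" for name in names}
        self.data = {name: [] for name in names}
        self.failing = set()  # components whose next write will fail

    def state(self):
        healthy = [n for n, s in self.components.items() if s == "ONLINE"]
        if len(healthy) == len(self.components):
            return "ONLINE"
        return "DEGRADED" if healthy else "FAULTED"

    def write(self, payload):
        # Write to every online component; mark the ones that fail as bad.
        for name, status in self.components.items():
            if status != "ONLINE":
                continue
            if name in self.failing:
                self.components[name] = "FAULTED"
            else:
                self.data[name].append(payload)
        if self.state() == "FAULTED":
            raise OSError("no healthy component left")
        return self.state()


mirror = ToyMirror(["da0.nop", "da1"])
before = mirror.write(b"x")      # both copies written
mirror.failing.add("da0.nop")    # pretend this component starts failing writes
after = mirror.write(b"y")       # only da1 receives the data
print(before, after)
```

In this model the write still succeeds as long as one copy lands, which
is why redundancy makes the panic much less of a problem.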

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--jaoouwwPWoQSJZYp
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFG1IvaForvXbEpPzQRAi20AKD3Ag5xU8Sauqi5CWQM72UdzByhZACgoQLK
mZkoeg+REgUuqBhakNAVz8w=
=kA5l
-----END PGP SIGNATURE-----

--jaoouwwPWoQSJZYp--