From: Freddie Cash <fjwcash@gmail.com>
To: freebsd-stable <freebsd-stable@freebsd.org>
Date: Mon, 19 Jul 2010 09:15:21 -0700
Subject: Re: Problems replacing failing drive in ZFS pool

On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore wrote:
> So you think it's because when I switch from the old disk to the new
> disk, ZFS doesn't realize the disk has changed, and thinks the data is
> just corrupt now? Even if that happens, shouldn't the pool still be
> available, since it's RAIDZ1 and only one disk has gone away?

I think it's because you pull the old drive, boot with the new drive, the
controller renumbers all the devices (i.e. da3 is now da2, da2 is now da1,
da1 is now da0, da0 is now da6, etc.), and ZFS thinks that all the drives
have changed, which leaves the pool looking corrupt.

I've had this happen on our storage servers a couple of times before I
started using glabel(8) on all our drives (dead drive on the RAID
controller, remove the drive, reboot for whatever reason, all the device
nodes get renumbered, and everything goes kablooey).

Doing the export and import will force ZFS to re-read the metadata on the
drives (ZFS does its own "labelling" to record which drives belong to
which vdevs) and to pick things up correctly using the new device nodes.
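For reference, roughly what that looks like at the command line (the pool
name "tank" and the daX/label names below are just placeholders; adjust
for your own setup):

    # zpool export tank
    # zpool import tank

And when a new drive first goes into the pool, giving it a glabel(8) name
so the daX renumbering stops mattering, something like:

    # glabel label disk03 /dev/da3
    # zpool replace tank da2 label/disk03

(glabel stores its metadata in the last sector of the device, so label the
new drive before handing it to ZFS rather than relabelling disks that are
already in the pool.)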
> I don't have / on ZFS; I'm only using it as a 'data' partition, so I
> should be able to try your suggestion. My only concern: is there any
> risk of trashing my pool if I try your instructions? Everything I've
> done so far, even when told "insufficient replicas / corrupt data", has
> not cost me any data as long as I switch back to the original (dying)
> drive. If I mix in export/import statements which might 'touch' the
> pool, is there a chance it will choke and trash my data?

Well, there's always a chance things explode. :)  But an export/import is
safe so long as all the drives are connected at the time. I've recovered
"corrupted" pools by doing the above. (I've now switched to labelling all
my drives to prevent this from happening.)

Of course, always have good backups. ;)

-- 
Freddie Cash
fjwcash@gmail.com