From: Kurt Touet <ktouet@gmail.com>
To: freebsd-fs@freebsd.org
Date: Mon, 28 Sep 2009 11:29:19 -0600
Message-ID: <2a5e326f0909281029p17334ceeoff4bb3e7adeb5cef@mail.gmail.com>
In-Reply-To: <2a5e326f0909201500w1513aeb5ra644f1c748e22f34@mail.gmail.com>
Subject: Re: ZFS - Unable to offline drive in raidz1 based pool
List-Id: Filesystems <freebsd-fs@freebsd.org>
I've run into a similar experience again with my ZFS raidz1 array
reporting itself as healthy when it's not. This, again, was after some
drive spin_retry_count errors (and a power cycle when the box was
unable to shutdown -h). The pattern goes as follows:

1) A hard drive in the ZFS array (for whatever reason) repeatedly
   times out, in this case generating spin_retry_count errors in the
   SMART status.
2) The box is semi-frozen because it cannot deal with activity on the
   ZFS array, so it won't gracefully shutdown -h now.
3) The box is power cycled.
4) Everything spins up fine on the box, and the array is now accessible.
5) zpool status shows the array as online with no degraded status.
6) zpool scrub shows the drives to be desynced and resilvers a couple
   of them.
7) Presumably, everything is fine.

monolith# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
        spares
          ad22      AVAIL

errors: No known data errors

monolith# zpool scrub storage
monolith# zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 28 11:17:05 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.17M resilvered
            ad6     ONLINE       0     0     0  1.50K resilvered
            ad12    ONLINE       0     0     0  2K resilvered
            ad4     ONLINE       0     0     0  2K resilvered
        spares
          ad22      AVAIL

errors: No known data errors

So, my question still stands: how does ZFS, upon scrubbing, instantly
know that the drives need to be resilvered (it completes in a few
seconds), when it previously declared the array to be fine with no
known data errors?

Cheers,

-kurt
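FWIW, since zpool status exits 0 here either way, the only machine-checkable signals in that output are the state field and the per-device "resilvered" annotations. A minimal sh sketch of pulling those out (run against the status text quoted above as sample data, so it doesn't need a live pool; on a real box you'd pipe `zpool status storage` in instead):

```shell
#!/bin/sh
# Sketch: parse zpool-status-style text for the overall pool state and
# for devices that report resilvered data. The here-string below is the
# sample output from the post, not queried from a live pool.
status_text='  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 28 11:17:05 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.17M resilvered
            ad6     ONLINE       0     0     0  1.50K resilvered'

# The "state:" line carries the pool-wide health ZFS is reporting.
state=$(printf '%s\n' "$status_text" | awk '/^ state:/ {print $2}')
echo "pool state: $state"

# Count device lines annotated with "resilvered" -- these are the drives
# the scrub actually touched, despite the ONLINE state above.
resilvered=$(printf '%s\n' "$status_text" | grep -c 'resilvered')
echo "devices resilvered: $resilvered"
```

That gap between the two is exactly the complaint: the state field says ONLINE before the scrub ever flags anything as resilvered.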