From owner-freebsd-questions@FreeBSD.ORG  Mon Jun 13 16:22:47 2011
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 762BC106564A
	for <freebsd-questions@freebsd.org>;
	Mon, 13 Jun 2011 16:22:47 +0000 (UTC)
	(envelope-from howie@thingy.com)
Received: from post1.inband.network-i.net (tobago.network-i.net [212.21.96.30])
	by mx1.freebsd.org (Postfix) with SMTP id E5DA68FC08
	for <freebsd-questions@freebsd.org>;
	Mon, 13 Jun 2011 16:22:46 +0000 (UTC)
Received: (qmail 41404 invoked from network); 13 Jun 2011 15:50:54 -0000
Received: from unknown (HELO ?10.1.1.188?) (212.21.99.52)
	by post2.inband.network-i.net with SMTP; 13 Jun 2011 15:50:54 -0000
Message-ID: <4DF63314.3000807@thingy.com>
Date: Mon, 13 Jun 2011 16:56:04 +0100
From: Howard Jones <howie@thingy.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB;
	rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-questions@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: ZFS on 8.1 - various problems after a disk failure.

I have a FreeBSD 8.2 server at home with four 2TB drives in it, running
ZFS with a raidz pool. Some time ago, I had a disk fail. Initially it
wasn't totally obvious the disk had failed, so I ran a 'zpool scrub' on
the pool, which threw up a lot of errors and also produced a lot of
sense errors, making it obvious I had a dead disk.

I replaced the disk, then ran "zpool replace zjumbo ad4 ad4" to replace
the bad disk in-place, and start a resilver.

Now I have a few problems:
1) The old ad4 is still listed, even after several scrub/resilvers.
Shouldn't it go away?
2) Although I lost a whole directory with ~1TB of music, the space
allocated to that directory is still in use according to df.
3) I have another bunch of files that appear in directory listings, but
I get "Illegal byte sequence" errors when trying to read them (with
anything - du, file, wc).

I have backups of most of the stuff on the pool (although it'd be nice
to recover the more recent data), but how do I get out of this situation
without nuking the site from orbit (my current plan)? Firstly, I'd like
to get a reliable representation of what's actually on the filesystem,
and for bonus points, get back some of the data that should be intact
(only one disk in the set was actually bad, right?).
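
One possible way to get that reliable picture of what is still readable
is to walk the tree and try to read every file to the end, noting which
ones fail and with which error. The following is only a rough sketch and
not something that has been run on this pool: the function name
find_unreadable and the chunk size are made up for illustration, and it
simply records whatever errno the read returns (for example EILSEQ,
"Illegal byte sequence").

```python
import errno
import os
import stat


def find_unreadable(root, chunk_size=1 << 20):
    """Return (path, errno_name) for each entry under root that cannot be read.

    Hypothetical helper: only regular files are read; symlinks, devices
    and FIFOs are skipped so the walk cannot block on them.
    """
    failures = []

    def note(exc, path=None):
        # Translate the numeric errno into its symbolic name where known.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        failures.append((path or exc.filename, name))

    # onerror catches directories that cannot be listed at all.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=note):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                with open(path, "rb") as handle:
                    # Read to end of file so every block is fetched.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                note(exc, path)
    return failures
```

Calling find_unreadable("/zjumbo") (assuming the pool is mounted there,
which is a guess) and printing the result would give a list to compare
against the backups. Reading every file takes about as long as a scrub.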

Here's my current zpool status. Thanks in advance for any pointers!

Howie

# zpool status
  pool: zjumbo
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 10h57m with 15190 errors on Thu May 19 09:26:59 2011
config:

        NAME           STATE     READ WRITE CKSUM
        zjumbo         DEGRADED     0     0  199K
          raidz1       DEGRADED     0     0  792K
            replacing  DEGRADED     0     0     0
              ad4/old  UNAVAIL      0 16.1M     0  cannot open
              ad4      ONLINE       0     0     0  1.15T resilvered
            ad6        ONLINE       0     0     0  677M resilvered
            ad8        ONLINE       0     0     0  660M resilvered
            ad10       ONLINE       0     0     0  535M resilvered

errors: 15190 data errors, use '-v' for a list
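
For anyone who wants to track the error counters in the config table
above between scrubs, here is a minimal sketch that pulls the device
rows out of saved 'zpool status' text. The names parse_config and
to_count are hypothetical, the list of states is not exhaustive, and
treating the K/M/G/T suffixes as multiples of 1024 is an assumption, so
the counts should be read as approximate.

```python
import re

# Device states this sketch recognises (assumed, not exhaustive).
_ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)
_SUFFIXES = "KMGT"


def to_count(text):
    """Convert a counter such as '0', '199K' or '16.1M' to an integer.

    Assumes suffixes are multiples of 1024, so the result is approximate.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError("unrecognised counter: %r" % text)
    number, suffix = match.groups()
    power = _SUFFIXES.index(suffix) + 1 if suffix else 0
    return int(float(number) * 1024 ** power)


def parse_config(status_text):
    """Return (name, state, read, write, cksum) for each device row."""
    rows = []
    # Match line by line so a pattern can never span two lines.
    for line in status_text.splitlines():
        match = _ROW.match(line)
        if match:
            name, state, read, write, cksum = match.groups()
            rows.append(
                (name, state, to_count(read), to_count(write), to_count(cksum))
            )
    return rows
```

Filtering the result for rows whose state is not ONLINE gives the
degraded part of the tree (here zjumbo, raidz1, replacing and ad4/old).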