From owner-freebsd-fs@FreeBSD.ORG Tue Feb 1 15:17:59 2011
From: Mike Tancsa <mike@sentex.net>
Organization: Sentex Communications
Date: Tue, 01 Feb 2011 10:17:47 -0500
To: Adam Vande More, freebsd-fs@freebsd.org
Subject: Re: ZFS help! (solved)
Message-ID: <4D48241B.2040807@sentex.net>
References: <4D43475D.5050008@sentex.net> <4D44D775.50507@jrv.org>
 <4D470A65.4050000@sentex.net> <4D471729.3050804@sentex.net>

On 1/31/2011 3:32 PM, Adam Vande More wrote:
> maybe the meta data stuff is stored above it in /tank1/? I don't know. I'm
> pretty sure you can use a newer version of ZFS to rewind the transaction
> groups until you get back to a good state, but there's probably a lot in
> this scenario that would prevent that from being a viable solution. If you
> do get it resolved please post the resolution.

OK, to summarize what happened for the archives. This is RELENG_8 (from the
end of January), on amd64 with 8G of RAM.

On my DR backup server, which holds backups of backups, I decided to expand
an existing pool. I added a new eSATA cage with an integrated port
multiplier:

2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

0(offsite)# camcontrol devlist
at scbus0 target 0 lun 0 (pass0,ada0)
at scbus0 target 1 lun 0 (pass1,ada1)
at scbus0 target 2 lun 0 (pass2,ada2)
at scbus0 target 3 lun 0 (pass3,ada3)
at scbus0 target 15 lun 0 (pass4,pmp0)
at scbus1 target 0 lun 0 (pass5,ada4)
at scbus1 target 1 lun 0 (pass6,ada5)
at scbus1 target 2 lun 0 (pass7,ada6)
at scbus1 target 3 lun 0 (pass8,ada7)
at scbus1 target 4 lun 0 (pass9,ada8)
at scbus1 target 15 lun 0 (pass10,pmp1)
0(offsite)#

The controller is a Sil3134 (siis and ahci drivers).

Shortly after I brought the new set of drives online, the drive cage failed
and started presenting the drives in an odd way where the ZFS label on the
drives was no longer readable.
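For anyone hitting something similar, a quick Bourne-shell loop over the
suspect devices shows whether zdb can still read their labels. This is
only a sketch; ada0-ada3 are simply the device names on this box:

# for d in ada0 ada1 ada2 ada3; do echo "== /dev/$d =="; zdb -l /dev/$d; done

ada0, for example, looked like this: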
# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

# zpool status -v
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open

Pulling the drives out and putting them in a new drive cage allowed me to
see the file system as being online, albeit with errors. The next step was
to delete the two problem files.

On bootup, it looked like:

zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
        tank1/argus-data:<0xc6>
        /tank1/argus-data/argus-sites-radium

I killed those files via rm, after which zpool status -v showed:

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>

So I started a scrub, and once it was done, no errors and all is clean!

0(offsite)# zpool status
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#

        ---Mike
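P.S. For the archives, once the drives were visible again the recovery
boiled down to roughly this sequence (a sketch only; the file names are
the ones from this pool and will obviously differ elsewhere):

# rm /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
# rm /tank1/argus-data/argus-sites-radium
# zpool scrub tank1
# zpool status -v tank1

with zpool status re-run until the scrub finishes and reports 0 errors.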