From owner-freebsd-fs@FreeBSD.ORG Tue Feb 1 15:17:59 2011
From: Mike Tancsa <mike@sentex.net>
Organization: Sentex Communications
Date: Tue, 01 Feb 2011 10:17:47 -0500
To: Adam Vande More, freebsd-fs@freebsd.org
Subject: Re: ZFS help! (solved)
Message-ID: <4D48241B.2040807@sentex.net>
References: <4D43475D.5050008@sentex.net> <4D44D775.50507@jrv.org>
 <4D470A65.4050000@sentex.net> <4D471729.3050804@sentex.net>

On 1/31/2011 3:32 PM, Adam Vande More wrote:
> maybe the meta data stuff is stored above it in /tank1/? I don't know. I'm
> pretty sure you can use a newer version of ZFS to rewind the transaction
> groups until you get back to a good state, but there's probably a lot in
> this scenario that would prevent that from being a viable solution. If you
> do get it resolved please post the resolution.

OK, to summarize what happened for the archives. This is RELENG_8 (from the
end of January), on amd64 with 8G of RAM.

On my DR backup server, which holds backups of backups, I decided to expand
an existing pool. I added a new eSATA cage with an integrated port
multiplier:

2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

0(offsite)# camcontrol devlist
at scbus0 target 0 lun 0 (pass0,ada0)
at scbus0 target 1 lun 0 (pass1,ada1)
at scbus0 target 2 lun 0 (pass2,ada2)
at scbus0 target 3 lun 0 (pass3,ada3)
at scbus0 target 15 lun 0 (pass4,pmp0)
at scbus1 target 0 lun 0 (pass5,ada4)
at scbus1 target 1 lun 0 (pass6,ada5)
at scbus1 target 2 lun 0 (pass7,ada6)
at scbus1 target 3 lun 0 (pass8,ada7)
at scbus1 target 4 lun 0 (pass9,ada8)
at scbus1 target 15 lun 0 (pass10,pmp1)
0(offsite)#

The controller is a Sil3134 (siis and ahci drivers).

Shortly after I brought the new set of drives online, the drive cage failed
and started presenting the drives in an odd way where the ZFS label on the
drives was no longer readable.
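For anyone hitting something similar, a quick Bourne-shell loop over the
suspect devices shows whether zdb can still read their labels. This is
only a sketch; ada0-ada3 are simply the device names on this box:

# for d in ada0 ada1 ada2 ada3; do echo "== /dev/$d =="; zdb -l /dev/$d; done

ada0, for example, looked like this: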
# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

# zpool status -v
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open

Pulling the drives out and putting them in a new drive cage allowed me to
see the file system as being online, albeit with errors. The next step was
to delete the two problem files.

On bootup, it looked like:

zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
        tank1/argus-data:<0xc6>
        /tank1/argus-data/argus-sites-radium

I killed those files via rm, after which zpool status -v showed:

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>

So I started a scrub, and once it was done, no errors and all is clean!

0(offsite)# zpool status
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#

        ---Mike
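P.S. For the archives, once the drives were visible again the recovery
boiled down to roughly this sequence (a sketch only; the file names are
the ones from this pool and will obviously differ elsewhere):

# rm /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
# rm /tank1/argus-data/argus-sites-radium
# zpool scrub tank1
# zpool status -v tank1

with zpool status re-run until the scrub finishes and reports 0 errors.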