From: Paul Mather <paul@gromit.dlib.vt.edu>
To: freebsd-geom@freebsd.org
Date: Fri, 31 Jul 2009 09:23:32 -0400
Subject: ZFS ignores some labels, now pool is corrupted.

I recently repurposed a motley assortment of hardware that used to be an ad hoc JBOD backup mirror to use FreeBSD 7-STABLE and ZFS. When I say motley I mean motley: it has four internal 1 TB SATA drives and three external 1 TB Maxtor OneTouch USB drives. I aggregated all of these drives into a single raidz1 using ZFS.

Following a recent suggestion on here, before creating the raidz1 vdev I labelled each drive as "driveN" using glabel, e.g., "glabel label drive1 /dev/ad4". (I figured this would be important especially for the external USB drives, which might get plugged into different USB ports and thus be probed in a different order from when the pool was created, shuffling their device names.) When creating the pool, I used "zpool create backups raidz label/drive1 label/drive2 ...".

That all worked for a week or so, until I rebooted today. One of the USB drives was not probed during boot and so was flagged as REMOVED by "zpool status":

  pool: backups
 state: DEGRADED
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        backups           DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            label/drive1  ONLINE       0     0     0
            label/drive2  ONLINE       0     0     0
            label/drive3  ONLINE       0     0     0
            label/drive4  ONLINE       0     0     0
            label/drive5  REMOVED      0     0     0
            label/drive6  ONLINE       0     0     0
            label/drive7  ONLINE       0     0     0

errors: No known data errors

I unplugged and re-plugged the REMOVED drive's USB cable to get it to probe.
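(As an aside, to confirm that GEOM has re-tasted the disk and picked the label back up after re-plugging, looking at glabel's view of things should suffice; a minimal check, written from memory rather than copied from a terminal, would be something along these lines:)

  # list all active GEOM labels; label/drive5 should reappear once the
  # disk has been re-tasted after the cable is plugged back in
  glabel status

  # or narrow the output down to the one label in question
  glabel status | grep drive5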
Eventually, the system appeared to recognise the drive and resilver:

  pool: backups
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jul 31 07:54:22 2009
config:

        NAME              STATE     READ WRITE CKSUM
        backups           ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            label/drive1  ONLINE       0     0     0  11.5K resilvered
            label/drive2  ONLINE       0     0     0  11K resilvered
            label/drive3  ONLINE       0     0     0  12K resilvered
            label/drive4  ONLINE       0     0     0  11.5K resilvered
            label/drive5  ONLINE       0     0     0  17.5K resilvered
            label/drive6  ONLINE       0     0     0  13K resilvered
            label/drive7  ONLINE       0     0     0  11.5K resilvered

errors: No known data errors

I rebooted again, but, once more, the drive did not probe during boot, so I had to force it to probe by unplugging and re-plugging its USB cable. This time, however, the drive was misidentified in the pool as "da2" instead of "label/drive5" and, in fact, /dev/label/drive5 was missing:

  pool: backups
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jul 31 07:59:43 2009
config:

        NAME              STATE     READ WRITE CKSUM
        backups           ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            label/drive1  ONLINE       0     0     0  8.50K resilvered
            label/drive2  ONLINE       0     0     0  10K resilvered
            label/drive3  ONLINE       0     0     0  9K resilvered
            label/drive4  ONLINE       0     0     0  10K resilvered
            da2           ONLINE       0     0     0  11.5K resilvered
            label/drive6  ONLINE       0     0     0  7.50K resilvered
            label/drive7  ONLINE       0     0     0  8.50K resilvered

errors: No known data errors

$ ls /dev/label
drive1  drive2  drive3  drive4  drive6  drive7

For some reason, the label was not being detected properly. When I rebooted again, things went from bad to worse. I now have two "da2" devices showing up in my raidz vdev, and this time label/drive7 has disappeared. This seems to have thrown ZFS for a loop, and my vdev is corrupted:

  pool: backups
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        backups           DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            label/drive1  ONLINE       0     0     0
            label/drive2  ONLINE       0     0     0
            label/drive3  ONLINE       0     0     0
            label/drive4  ONLINE       0     0     0
            da2           FAULTED      0     0     0  corrupted data
            label/drive6  ONLINE       0     0     0
            da2           ONLINE       0     0     0

errors: No known data errors

$ ls /dev/label
drive1  drive2  drive3  drive4  drive5  drive6

When I boot in single-user mode, all of my original "driveN" labels (1-7) show up. Right now, however, with ZFS active, label/drive7 refuses to appear.

Is there a problem with ZFS and labels? Does anyone have any suggestions for how to repair this pool? I'm presuming I can't do a "zpool replace backups da2 /dev/label/drive5" to repair the faulted drive, because I now have two "da2" devices in my vdev.

As a sort of related question, is there a better way to create a pool out of these devices that still maximises the amount of storage (allowing for some redundancy)? For example, would it be better to do something like this:

  zpool create backups raidz label/sata1 label/sata2 label/sata3 label/sata4 \
        raidz label/usb1 label/usb2 label/usb3

(where "sataN" are the internal SATA drives and "usbN" are the external USB drives) to place the internal and external drives into separate vdevs (albeit losing an extra drive of storage space to parity)? Would that improve I/O speeds? I'm guessing it should.

Or is it just storing up trouble to try and mix these USB devices into the pool as I am doing now, and would I be best off lobbying for an eSATA enclosure if I want to use external drives?
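In case it helps frame an answer: the only repair approach I have thought of so far (completely untested, so please tell me if it is a bad idea with the pool in this state) is to export the pool and re-import it while pointing ZFS explicitly at the label directory, so that it binds to the glabel names rather than to whatever daN names the USB disks happen to get:

  # untested sketch: re-import the pool looking only at the glabel names
  zpool export backups
  zpool import -d /dev/label backups

Would that be safe to try, or is there a better way to coax ZFS back onto the labels?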
Cheers,

Paul.