Date: Mon, 21 Sep 2009 11:44:26 -0600
From: Kurt Touet <ktouet@gmail.com>
To: Aaron Hurt <aaron@goflexitllc.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS - Unable to offline drive in raidz1 based pool
Message-ID: <2a5e326f0909211044k349d6bc1lb9bd9094e7216e41@mail.gmail.com>
In-Reply-To: <2a5e326f0909211021o431ef53bh3077589efb0bed6c@mail.gmail.com>
References: <2a5e326f0909201500w1513aeb5ra644f1c748e22f34@mail.gmail.com> <4AB757E4.5060501@goflexitllc.com> <2a5e326f0909211021o431ef53bh3077589efb0bed6c@mail.gmail.com>
Apparently you were right, Aaron:

monolith# zpool scrub storage
monolith# zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h1m with 0 errors on Mon Sep 21 11:37:24 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.46M resilvered
            ad6     ONLINE       0     0     0  2K resilvered
            ad12    ONLINE       0     0     0  3K resilvered
            ad4     ONLINE       0     0     0  3K resilvered

errors: No known data errors
monolith# zpool offline storage ad6
monolith# zpool online storage ad6
monolith# zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 21 11:40:12 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  67.5K resilvered
            ad6     ONLINE       0     0     0  671K resilvered
            ad12    ONLINE       0     0     0  67.5K resilvered
            ad4     ONLINE       0     0     0  53K resilvered

errors: No known data errors

I wonder then, with the storage array reporting itself as healthy, how
did it know that one drive had desynced data, and why wouldn't that
have shown up as an error like DEGRADED?

Cheers,
-kurt

On Mon, Sep 21, 2009 at 11:21 AM, Kurt Touet <ktouet@gmail.com> wrote:
> I thought about that possibility as well... but I had scrubbed the
> array within 10 days. I'll give it a shot again today and see if that
> brings up any other errors (or allows me to offline the drive
> afterwards).
>
> Cheers,
> -kurt
>
> On Mon, Sep 21, 2009 at 4:39 AM, Aaron Hurt <aaron@goflexitllc.com> wrote:
>> Kurt Touet wrote:
>>>
>>> I am using a ZFS pool based on a 4-drive raidz1 setup for storage.  I
>>> believe that one of the drives is failing, and I'd like to
>>> remove/replace it.  The drive has been causing some issues (such as
>>> becoming non-responsive and hanging the system with timeouts), so I'd
>>> like to offline it, and then run in degraded mode until I can grab a
>>> new drive (tomorrow).
>>> However, when I disconnected the drive (pulled
>>> the plug, not using a zpool offline command), the following occurred:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         storage     FAULTED      0     0     1
>>>           raidz1    DEGRADED     0     0     0
>>>             ad14    ONLINE       0     0     0
>>>             ad6     UNAVAIL      0     0     0
>>>             ad12    ONLINE       0     0     0
>>>             ad4     ONLINE       0     0     0
>>>
>>> Note: That's my recreation of the output... not the actual text.
>>>
>>> At this point, I was unable to do anything with the pool... and all
>>> data was inaccessible.  Fortunately, after sitting pulled for a
>>> bit, I tried putting the failing drive back into the array, and it
>>> booted properly.  Of course, I still want to replace it, but this is
>>> what happens when I try to take it offline:
>>>
>>> monolith# zpool status storage
>>>   pool: storage
>>>  state: ONLINE
>>>  scrub: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         storage     ONLINE       0     0     0
>>>           raidz1    ONLINE       0     0     0
>>>             ad14    ONLINE       0     0     0
>>>             ad6     ONLINE       0     0     0
>>>             ad12    ONLINE       0     0     0
>>>             ad4     ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>> monolith# zpool offline storage ad6
>>> cannot offline ad6: no valid replicas
>>> monolith# uname -a
>>> FreeBSD monolith 8.0-RC1 FreeBSD 8.0-RC1 #2 r197370: Sun Sep 20
>>> 15:32:08 CST 2009     k@monolith:/usr/obj/usr/src/sys/MONOLITH  amd64
>>>
>>> If the array is online and healthy, why can't I simply offline a drive
>>> and then replace it afterwards?  Any thoughts?  Also, how does a
>>> degraded raidz1 array end up faulting the entire pool?
>>>
>>> Thanks,
>>> -kurt
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>
>> I'm not sure why it would be giving you that message.  In a raidz1 you
>> should be able to sustain one failure.  The only thing that comes to mind
>> this early in the morning would be that somehow your data replication across
>> your discs isn't totally in sync.  I would suggest you try a scrub and then
>> see if you can remove the drive afterwards.
>>
>> Aaron Hurt
>> Managing Partner
>> Flex I.T., LLC
>> 611 Commerce Street
>> Suite 3117
>> Nashville, TN  37203
>> Phone: 615.438.7101
>> E-mail: aaron@goflexitllc.com
>>