Date: Mon, 21 Sep 2009 20:26:02 -0500
From: Aaron Hurt <aaron@goflexitllc.com>
To: Kurt Touet <ktouet@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS - Unable to offline drive in raidz1 based pool
Message-ID: <4AB827AA.9080109@goflexitllc.com>
In-Reply-To: <2a5e326f0909211044k349d6bc1lb9bd9094e7216e41@mail.gmail.com>
References: <2a5e326f0909201500w1513aeb5ra644f1c748e22f34@mail.gmail.com>
 <4AB757E4.5060501@goflexitllc.com>
 <2a5e326f0909211021o431ef53bh3077589efb0bed6c@mail.gmail.com>
 <2a5e326f0909211044k349d6bc1lb9bd9094e7216e41@mail.gmail.com>
Kurt Touet wrote:
> Apparently you were right, Aaron:
>
> monolith# zpool scrub storage
> monolith# zpool status storage
>   pool: storage
>  state: ONLINE
>  scrub: resilver completed after 0h1m with 0 errors on Mon Sep 21 11:37:24 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  1.46M resilvered
>             ad6     ONLINE       0     0     0  2K resilvered
>             ad12    ONLINE       0     0     0  3K resilvered
>             ad4     ONLINE       0     0     0  3K resilvered
>
> errors: No known data errors
>
> monolith# zpool offline storage ad6
> monolith# zpool online storage ad6
> monolith# zpool status storage
>   pool: storage
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Mon Sep 21 11:40:12 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  67.5K resilvered
>             ad6     ONLINE       0     0     0  671K resilvered
>             ad12    ONLINE       0     0     0  67.5K resilvered
>             ad4     ONLINE       0     0     0  53K resilvered
>
> errors: No known data errors
>
>
> I wonder then, with the storage array reporting itself as healthy, how
> did it know that one drive had desynced data, and why wouldn't that
> have shown up as an error like DEGRADED?
>
> Cheers,
> -kurt
>
>
> On Mon, Sep 21, 2009 at 11:21 AM, Kurt Touet <ktouet@gmail.com> wrote:
>
>> I thought about that possibility as well... but I had scrubbed the
>> array within 10 days. I'll give it a shot again today and see if that
>> brings up any other errors (or allows me to offline the drive
>> afterwards).
>>
>> Cheers,
>> -kurt
>>
>> On Mon, Sep 21, 2009 at 4:39 AM, Aaron Hurt <aaron@goflexitllc.com> wrote:
>>
>>> Kurt Touet wrote:
>>>
>>>> I am using a ZFS pool based on a 4-drive raidz1 setup for storage. I
>>>> believe that one of the drives is failing, and I'd like to
>>>> remove/replace it.
>>>> The drive has been causing some issues (such as
>>>> becoming non-responsive and hanging the system with timeouts), so I'd
>>>> like to offline it, and then run in degraded mode until I can grab a
>>>> new drive (tomorrow). However, when I disconnected the drive (pulled
>>>> the plug, not using a zpool offline command), the following occurred:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         storage     FAULTED      0     0     1
>>>>           raidz1    DEGRADED     0     0     0
>>>>             ad14    ONLINE       0     0     0
>>>>             ad6     UNAVAIL      0     0     0
>>>>             ad12    ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>
>>>> Note: that's my recreation of the output... not the actual text.
>>>>
>>>> At this point, I was unable to do anything with the pool... and all
>>>> data was inaccessible. Fortunately, after sitting pulled for a
>>>> bit, I tried putting the failing drive back into the array, and it
>>>> booted properly. Of course, I still want to replace it, but this is
>>>> what happens when I try to take it offline:
>>>>
>>>> monolith# zpool status storage
>>>>   pool: storage
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         storage     ONLINE       0     0     0
>>>>           raidz1    ONLINE       0     0     0
>>>>             ad14    ONLINE       0     0     0
>>>>             ad6     ONLINE       0     0     0
>>>>             ad12    ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>> monolith# zpool offline storage ad6
>>>> cannot offline ad6: no valid replicas
>>>>
>>>> monolith# uname -a
>>>> FreeBSD monolith 8.0-RC1 FreeBSD 8.0-RC1 #2 r197370: Sun Sep 20
>>>> 15:32:08 CST 2009  k@monolith:/usr/obj/usr/src/sys/MONOLITH  amd64
>>>>
>>>> If the array is online and healthy, why can't I simply offline a drive
>>>> and then replace it afterwards? Any thoughts? Also, how does a
>>>> degraded raidz1 array end up faulting the entire pool?
>>>>
>>>> Thanks,
>>>> -kurt
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>> I'm not sure why it would be giving you that message. In a raidz1 you
>>> should be able to sustain one failure. The only thing that comes to mind
>>> this early in the morning would be that somehow your data replication
>>> across your discs isn't totally in sync. I would suggest you try a scrub
>>> and then see if you can remove the drive afterwards.
>>>
>>> Aaron Hurt
>>> Managing Partner
>>> Flex I.T., LLC
>>> 611 Commerce Street
>>> Suite 3117
>>> Nashville, TN 37203
>>> Phone: 615.438.7101
>>> E-mail: aaron@goflexitllc.com
>>>

I had a buggy ATA controller that was causing similar problems for me once
upon a time. I replaced the controller card and drive cables and never had
any more issues with it. That's still one of those things I just scratch my
head over. I'm far from a ZFS code expert, so I couldn't even begin to tell
you the underlying reasons such things might be related... just my two cents
worth of experience.

Anyways... glad it's working for you now.

--
Aaron Hurt
Managing Partner
Flex I.T., LLC
611 Commerce Street
Suite 3117
Nashville, TN 37203
Phone: 615.438.7101
E-mail: aaron@goflexitllc.com
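[Editor's note: in the exchange above, the only visible hint that ad6 had
silently drifted out of sync was the per-device "resilvered" column in
`zpool status` — the pool itself still reported ONLINE. A minimal sketch
(Python, not part of the original thread; the helper name and the embedded
sample are illustrative) of scanning that output for devices with resilvered
data:]

```python
import re

def resilvered_devices(status_text):
    """Return (device, amount) pairs for ONLINE vdevs that had data
    resilvered, i.e. were out of sync despite the pool reporting healthy."""
    flagged = []
    for line in status_text.splitlines():
        # Config lines look like: "ad6  ONLINE  0  0  0  671K resilvered"
        m = re.match(
            r"\s*(\S+)\s+ONLINE\s+\d+\s+\d+\s+\d+\s+([\d.]+[KMG]?) resilvered",
            line,
        )
        if m:
            flagged.append((m.group(1), m.group(2)))
    return flagged

# Sample taken from the second `zpool status storage` output above.
sample = """\
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  67.5K resilvered
            ad6     ONLINE       0     0     0  671K resilvered
            ad12    ONLINE       0     0     0  67.5K resilvered
            ad4     ONLINE       0     0     0  53K resilvered
"""

print(resilvered_devices(sample))
# ad6's 671K stands out against the ~60K on its siblings.
```

[One could feed this the output of `zpool status <pool>` after a scrub to
spot a device that is quietly resyncing far more data than its peers.]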