Message-ID: <4AB827AA.9080109@goflexitllc.com>
Date: Mon, 21 Sep 2009 20:26:02 -0500
From: Aaron Hurt
To: Kurt Touet
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS - Unable to offline drive in raidz1 based pool
In-Reply-To: <2a5e326f0909211044k349d6bc1lb9bd9094e7216e41@mail.gmail.com>

Kurt Touet wrote:
> Apparently you were right Aaron:
>
> monolith# zpool scrub storage
> monolith# zpool status storage
>   pool: storage
>  state: ONLINE
>  scrub: resilver completed after 0h1m with 0 errors on Mon Sep 21 11:37:24 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  1.46M resilvered
>             ad6     ONLINE       0     0     0  2K resilvered
>             ad12    ONLINE       0     0     0  3K resilvered
>             ad4     ONLINE       0     0     0  3K resilvered
>
> errors: No known data errors
> monolith# zpool offline storage ad6
> monolith# zpool online storage ad6
> monolith# zpool status storage
>   pool: storage
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Mon Sep 21 11:40:12 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  67.5K resilvered
>             ad6     ONLINE       0     0     0  671K resilvered
>             ad12    ONLINE       0     0     0  67.5K resilvered
>             ad4     ONLINE       0     0     0  53K resilvered
>
> errors: No known data errors
>
>
> I wonder then, with the storage array reporting itself as healthy, how
> did it know that one drive had desynced data, and why wouldn't that
> have shown up as an error like DEGRADED?
>
> Cheers,
> -kurt
>
>
> On Mon, Sep 21, 2009 at 11:21 AM, Kurt Touet wrote:
>
>> I thought about that possibility as well.. but I had scrubbed the
>> array within 10 days. I'll give it a shot again today and see if that
>> brings up any other errors (or allows me to offline the drive
>> afterwards).
>>
>> Cheers,
>> -kurt
>>
>> On Mon, Sep 21, 2009 at 4:39 AM, Aaron Hurt wrote:
>>
>>> Kurt Touet wrote:
>>>
>>>> I am using a ZFS pool based on a 4-drive raidz1 setup for storage. I
>>>> believe that one of the drives is failing, and I'd like to
>>>> remove/replace it. The drive has been causing some issues (such as
>>>> becoming non-responsive and hanging the system with timeouts), so I'd
>>>> like to offline it, and then run in degraded mode until I can grab a
>>>> new drive (tomorrow). However, when I disconnected the drive (pulled
>>>> the plug, not using a zpool offline command), the following occurred:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         storage     FAULTED      0     0     1
>>>>           raidz1    DEGRADED     0     0     0
>>>>             ad14    ONLINE       0     0     0
>>>>             ad6     UNAVAIL      0     0     0
>>>>             ad12    ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>
>>>> Note: That's my recreation of the output... not the actual text.
>>>>
>>>> At this point, I was unable to do anything with the pool... and all
>>>> data was inaccessible. Fortunately, after it had sat pulled for a
>>>> bit, I tried putting the failing drive back into the array, and it
>>>> booted properly. Of course, I still want to replace it, but this is
>>>> what happens when I try to take it offline:
>>>>
>>>> monolith# zpool status storage
>>>>   pool: storage
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         storage     ONLINE       0     0     0
>>>>           raidz1    ONLINE       0     0     0
>>>>             ad14    ONLINE       0     0     0
>>>>             ad6     ONLINE       0     0     0
>>>>             ad12    ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>
>>>> errors: No known data errors
>>>> monolith# zpool offline storage ad6
>>>> cannot offline ad6: no valid replicas
>>>> monolith# uname -a
>>>> FreeBSD monolith 8.0-RC1 FreeBSD 8.0-RC1 #2 r197370: Sun Sep 20
>>>> 15:32:08 CST 2009 k@monolith:/usr/obj/usr/src/sys/MONOLITH amd64
>>>>
>>>> If the array is online and healthy, why can't I simply offline a drive
>>>> and then replace it afterwards? Any thoughts? Also, how does a
>>>> degraded raidz1 array end up faulting the entire pool?
>>>>
>>>> Thanks,
>>>> -kurt
>>>>
>>> I'm not sure why it would be giving you that message. In a raidz1 you
>>> should be able to sustain one failure. The only thing that comes to mind
>>> this early in the morning would be that somehow your data replication
>>> across your discs isn't totally in sync. I would suggest you try a scrub
>>> and then see if you can remove the drive afterwards.
>>>
>>> Aaron Hurt
>>> Managing Partner
>>> Flex I.T., LLC
>>> 611 Commerce Street
>>> Suite 3117
>>> Nashville, TN 37203
>>> Phone: 615.438.7101
>>> E-mail: aaron@goflexitllc.com
>>>

I had a buggy ATA controller that was causing similar problems for me
once upon a time. I replaced the controller card and drive cables and
never had any more issues with it. That's still one of those things I
just scratch my head over. I'm far from a ZFS code expert, so I couldn't
even begin to tell you the underlying reasons such things might be
related... just my two cents' worth of experience.

Anyways... glad it's working for you now.

-- 
Aaron Hurt
Managing Partner
Flex I.T., LLC
611 Commerce Street
Suite 3117
Nashville, TN 37203
Phone: 615.438.7101
E-mail: aaron@goflexitllc.com