From owner-freebsd-fs@FreeBSD.ORG Sat Mar 21 16:57:52 2015
Date: Sat, 21 Mar 2015 09:57:50 -0700
Subject: Re: zfs on FreeBSD 8.2 64bit stuck in "One or more devices is currently being resilvered"
From: Mehmet Erol Sanliturk
To: motty cruz
Cc: freebsd-fs@freebsd.org
List-Id: Filesystems

On Sat, Mar 21, 2015 at 9:01 AM, motty cruz wrote:

> Hi Mehmet, are you thinking a bad HDD bay? If I ran the gstat command I
> see is writing to disk :
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0    0.0| acd0
>     0      9      0      0    0.0      9    144   22.1    3.1| mfid0
>     0      9      0      0    0.0      9    144   22.6    3.1| mfid0s1
>     0      9      0      0    0.0      9    144   22.9    3.2| mfid0s1a
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1b
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1d
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1e
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1f
>     2   4631   4631  13270    0.4      0      0    0.0   73.0| da0
>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>     3   3979   3979  13345    0.7      0      0    0.0   78.0| da2
>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>     5   4503   4503  13263    0.5      0      0    0.0   76.0| da4
>     5   4245   4245  13254    0.6      0      0    0.0   77.5| da5
>     4   4741      0      0    0.0   4741  11626    1.2   86.7| da6
>
> disk being replace is da6, as you can see w/s11626? unless I am not
> reading this right? so I don't think is the cable or port. I really don't
> know what is causing this issue:
>
> today is the 3rd day resilvering:
> # zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 47h47m, 100.00% done, 0h0m to go
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         tank            ONLINE       0     0     0
>           raidz2        ONLINE       0     0     0
>             label/019   ONLINE       0     0     0
>             label/001b  ONLINE       0     0     0
>             label/003   ONLINE       0     0     0
>             label/007b  ONLINE       0     0     0  1.79T resilvered
>             label/005   ONLINE       0     0     0
>             label/006   ONLINE       0     0     0
>             label/0171  ONLINE       0     0     0
>
> any suggestion on what should be my next step?
>
> Thanks in advance!
> -Motty
>

Yes, it may be. If you can, attach the HDD to a bay that is known to work
and see whether the problem is in the HDD or in the HDD bay.

Another step may be to remove the HDD from the troublesome bay and use only
a group of bays that work correctly. Then put an HDD that you know works
correctly into the suspected bay and see whether it causes trouble.
Continue in this way until you have identified the state of the bays and
their related components.

One important problem is corruption of your data. My suggestion is to back
up your data and, until this issue is resolved, not to use this computer for
production work. Sometimes a part fails slowly, step by step, and in the end
may fail completely. I am saying this to emphasize the importance of saving
your data as soon as possible.

If you have the means, another step may be to replace the controller of the
HDD bays with a new, good-quality controller.

Version 8.2 is very old. Switching to a newer version, either 9.3 or 10.1,
may be useful; a spare system could be used to transfer your data to the
newly installed system.

I think you know very well how to migrate to a new system when ZFS is used.
I am not using ZFS, so my knowledge of it is very weak.

I have encountered a possibly similar problem in an NFS server and client
group. On the server, program sources were corrupted: lines were truncated,
invalid characters were injected into lines, or characters were changed
randomly to invalid ones. I replaced the server, the switch and the cables,
and, in the suspected client computer (suspected because of "Access
Violation" messages), the memory chips. In the end it turned out that the
motherboard chips or other parts of the suspected client computer, not its
memory chips, were faulty.

When no sufficiently capable testing equipment is available, the only thing
that can be done is to replace suspected parts with others (known to be
working, as far as possible).
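To make the gstat output quoted above easier to read, here is a minimal
sketch in Python, not a definitive tool. It assumes the nine numeric columns
appear in the order given by the gstat header line (L(q), ops/s, r/s, kBps,
ms/r, w/s, kBps, ms/w, %busy), and it assumes that the "1.79T" in the zpool
status output means TiB and that it was all written during the reported
47h47m. The names parse_gstat and average_rate_mib_s are made up for this
illustration.

```python
# Two data rows copied from the gstat output quoted in this message.
GSTAT_ROWS = """\
    2   4631   4631  13270    0.4      0      0    0.0   73.0| da0
    4   4741      0      0    0.0   4741  11626    1.2   86.7| da6
"""

# Column order taken from the gstat header line (assumed to be fixed).
COLUMNS = ("queue", "ops_s", "r_s", "r_kbps", "ms_r",
           "w_s", "w_kbps", "ms_w", "busy")


def parse_gstat(text):
    """Return {device: {column: value}} for every gstat data row in text."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # not a data row
        numbers, name = line.split("|", 1)
        try:
            values = [float(v) for v in numbers.split()]
        except ValueError:
            continue  # header line or other non-numeric text
        if len(values) != len(COLUMNS):
            continue
        stats[name.strip()] = dict(zip(COLUMNS, values))
    return stats


def average_rate_mib_s(resilvered_tib, hours, minutes):
    """Average write rate in MiB/s, assuming the size is given in TiB."""
    seconds = hours * 3600 + minutes * 60
    return resilvered_tib * 1024 * 1024 / seconds


stats = parse_gstat(GSTAT_ROWS)
da6 = stats["da6"]
print("da6 writes/s:", da6["w_s"], " write kBps:", da6["w_kbps"])
print("average resilver rate, MiB/s:",
      round(average_rate_mib_s(1.79, 47, 47), 1))
```

Read this way, 11626 on the da6 row is the write throughput in kBps and 4741
is the number of writes per second. Under the assumptions above, the average
rate works out to roughly 11 MiB/s, which is in the same range as the write
throughput gstat shows for da6.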
> On Fri, Mar 20, 2015 at 10:23 PM, Mehmet Erol Sanliturk <
> m.e.sanliturk@gmail.com> wrote:
>
>>
>> On Fri, Mar 20, 2015 at 3:44 PM, Motty Cruz wrote:
>>
>>> Can you describe what you did to replace the disk?
>>>
>>> I sure can. I had spare hdd in the pool.
>>> #zpool replace tank label/004 label/007b
>>>
>>>             label/003               ONLINE    0 0 0
>>>             replacing               DEGRADED  0 0 0
>>>               433419809408607751    UNAVAIL   0 0 0  was /dev/label/007
>>>               label/004             ONLINE    0 0 0  2.47T resilvered
>>>             label/005               ONLINE    0 0 0
>>>
>>> after two days of resilvering, the server became unresponsive. I reboot
>>> the server started to resilver again. after that I also
>>> detached bad disk.
>>> #zpool detach tank 433419809408607751
>>>

Since the newly attached HDD is also causing trouble, this may show that the
problem is not in the HDD but in the HDD bay or its related parts. My
suggestion is: "Do not discard your disk before verifying that it is really
defective."

>>> I have tried zpool clear tank but no success,
>>>
>>> Thanks,
>>> Motty
>>> On 03/20/2015 03:32 PM, Rainer Duffner wrote:
>>>
>>>> On 20.03.2015 at 23:25, Motty Cruz wrote:
>>>>>
>>>>> Hello Rainer,
>>>>>
>>>>> a disk went bad, I had to replace it, soon after replacing the bad HDD
>>>>> it started the "resilver" process. Process went on and on for hours,
>>>>> unfortunately server stop responding, I was force to reboot. after
>>>>> rebooting started "resilver" process again, from zero. I put the HDD
>>>>> offline replace it "thinking it was a factory bad HHD" started the
>>>>> "resilver" process again.
>>>>>
>>>>>
>>>> I would assume that the ZFS still thinks it’s the old disk somehow.
>>>> This is what usually happens then.
>>>>
>>>>
>>>> I’m not sure if an upgraded FreeBSD will help you with your
>>>> resilver-problem.
>>>>
>>>> Can you describe what you did to replace the disk?
>>>>
>>
>> Is there a possibility that the parts involved in the resilver (port,
>> cable, etc.) have a hardware fault, so that the OS is unable to complete
>> the resilvering, or keeps seeing that part as needing to be resilvered?
>>
>> Mehmet Erol Sanliturk
>>
>
> --
> Thanks for your support,
> Motty