Date: Sat, 21 Mar 2015 09:57:50 -0700
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: motty cruz <motty.cruz@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs on FreeBSD 8.2 64bit stuck in "One or more devices is currently being resilvered"
Message-ID: <CAOgwaMuLxQiPhDY8%2BBRsfP=9Ri30LjKf3XzbXjUenfw8PGDvxQ@mail.gmail.com>
In-Reply-To: <CALoOYy61t6mrYyxmt-1YU=KUabZ1B5dyNFfUXdpK00PsJcFjdA@mail.gmail.com>
References: <550C8D1A.3070402@gmail.com> <550C938F.70500@gmail.com>
 <986BB4BF-D960-46EE-8E15-6FC5A5B6D219@ultra-secure.de> <550C9E70.60501@gmail.com>
 <CE29CC44-FCB8-4D8F-B5E1-4CE7384F90B2@ultra-secure.de> <550CA2BF.2070406@gmail.com>
 <CAOgwaMtE2Mmarpkt8OLEEuWkR7NjkF0onDWgKiti2d=LB-vG3A@mail.gmail.com>
 <CALoOYy61t6mrYyxmt-1YU=KUabZ1B5dyNFfUXdpK00PsJcFjdA@mail.gmail.com>
On Sat, Mar 21, 2015 at 9:01 AM, motty cruz <motty.cruz@gmail.com> wrote:

> Hi Mehmet, are you thinking a bad HDD bay? If I ran the gstat command I
> see is writing to disk :
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0    0.0| acd0
>     0      9      0      0    0.0      9    144   22.1    3.1| mfid0
>     0      9      0      0    0.0      9    144   22.6    3.1| mfid0s1
>     0      9      0      0    0.0      9    144   22.9    3.2| mfid0s1a
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1b
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1d
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1e
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1f
>     2   4631   4631  13270    0.4      0      0    0.0   73.0| da0
>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>     3   3979   3979  13345    0.7      0      0    0.0   78.0| da2
>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>     5   4503   4503  13263    0.5      0      0    0.0   76.0| da4
>     5   4245   4245  13254    0.6      0      0    0.0   77.5| da5
>     4   4741      0      0    0.0   4741  11626    1.2   86.7| da6
>
> disk being replace is da6, as you can see w/s11626? unless I am not
> reading this right? so I don't think is the cable or port. I really don't
> know what is causing this issue:
>
> today is the 3rd day resilvering:
> # zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 47h47m, 100.00% done, 0h0m to go
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         tank            ONLINE       0     0     0
>           raidz2        ONLINE       0     0     0
>             label/019   ONLINE       0     0     0
>             label/001b  ONLINE       0     0     0
>             label/003   ONLINE       0     0     0
>             label/007b  ONLINE       0     0     0  1.79T resilvered
>             label/005   ONLINE       0     0     0
>             label/006   ONLINE       0     0     0
>             label/0171  ONLINE       0     0     0
>
> any suggestion on what should be my next step?
>
> Thanks in advance!
> -Motty
>

Yes, it may be. If you can, attach the HDD to a working HDD bay and see
whether the problem is in the HDD or in the HDD bay. Another step may be
to remove the HDD from the trouble-causing bay and use a group of
correctly working HDD bays.
Then add an HDD that you know is working correctly to the suspected HDD
bay and see whether it causes trouble or not. Continue in that way until
you have identified the status of the bays and their related components.

One important problem is corruption of your data. My suggestion is to
back up your data and, until this issue is resolved, not to use this
computer for production work. Sometimes a part starts to fail slowly,
step by step, and in the end may fail completely. I am saying this to
emphasize the importance of saving your data as soon as possible.

If you have the facilities, another step may be to replace the HDD bay
controller with a new, good-quality controller.

Version 8.2 is very old. Switching to a newer version, either 9.3 or
10.1, may be useful, using a spare system to transfer your data to the
newly installed system. I think you know very well how to migrate to a
new system when ZFS is used. I am not using ZFS, therefore my knowledge
of it is very weak.

I have encountered a possibly similar problem in an NFS server-client
group. On the server, program sources were corrupted, either by
truncating lines, by injecting invalid characters into lines, or by
randomly changing characters to invalid ones. I replaced the server, the
switch and the cables, and the memory chips in the suspected client
computer (suspected because of "Access Violation" messages). In the end
it turned out that the chip(s) on the suspected client computer's
motherboard, or other parts, were faulty (not the memory chips). When
there is no sufficiently capable testing equipment, the only thing that
can be done is to replace the suspected parts with others (known to be
working, as far as possible).

> On Fri, Mar 20, 2015 at 10:23 PM, Mehmet Erol Sanliturk <
> m.e.sanliturk@gmail.com> wrote:
>
>>
>> On Fri, Mar 20, 2015 at 3:44 PM, Motty Cruz <motty.cruz@gmail.com> wrote:
>>
>>> Can you describe what you did to replace the disk?
>>>
>>> I sure can. I had spare hdd in the pool.
>>> #zpool replace tank label/004 label/007b
>>>
>>>            label/003               ONLINE       0     0     0
>>>            replacing               DEGRADED     0     0     0
>>>              433419809408607751    UNAVAIL      0     0     0  was /dev/label/007
>>>              label/004             ONLINE       0     0     0  2.47T resilvered
>>>            label/005               ONLINE       0     0     0
>>>
>>> after two days of resilvering, the server became unresponsive. I reboot
>>> the server started to resilver again. after that I also
>>> detached bad disk.
>>> #zpool detach tank 433419809408607751
>>>

Since the newly attached HDD is generating trouble, this may show that the
problem is not in the HDD but in the HDD bay or its related parts.
My suggestion is: "Do not discard your disk before verifying that it is
really defective."

>>> I have tried zpool clear tank but no success,
>>>
>>> Thanks,
>>> Motty
>>> On 03/20/2015 03:32 PM, Rainer Duffner wrote:
>>>
>>>> On 20.03.2015 at 23:25, Motty Cruz <motty.cruz@gmail.com> wrote:
>>>>>
>>>>> Hello Rainer,
>>>>>
>>>>> a disk went bad, I had to replace it, soon after replacing the bad HDD
>>>>> it started the "resilver" process. Process went on and on for hours,
>>>>> unfortunately server stop responding, I was force to reboot. after
>>>>> rebooting started "resilver" process again, from zero. I put the HDD
>>>>> offline replace it "thinking it was a factory bad HHD" started the
>>>>> "resilver" process again.
>>>>>
>>>> I would assume that the ZFS still thinks it's the old disk somehow.
>>>> This is what usually happens then.
>>>>
>>>> I'm not sure if an upgraded FreeBSD will help you with your
>>>> resilver-problem.
>>>>
>>>> Can you describe what you did to replace the disk?
>>>>
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>>
>> Is there a possibility that the resilvered parts ( port , cable , etc. )
>> have hardware failure problems which OS is not able to complete resilvering
>> or it is seen that part to be resilvered ?
>>
>>
>> Mehmet Erol Sanliturk
>>
>
> --
> Thanks for your support,
> Motty
>
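The reading of the gstat sample in this thread (one provider absorbing writes while the others are being read) can be checked mechanically. Below is a minimal Python sketch, assuming the column order in the quoted header (L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name) and copying only the da* rows from that sample; the names SAMPLE, GstatRow, parse_gstat and classify are hypothetical. It only reports which providers were reading, writing or idle during that one-second interval; it cannot tell whether the disk, the bay, the cable or the controller is at fault.

```python
from __future__ import annotations

from dataclasses import dataclass

# da* rows copied from the gstat sample quoted in this thread.
SAMPLE = """\
dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    2   4631   4631  13270    0.4      0      0    0.0   73.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da1
    3   3979   3979  13345    0.7      0      0    0.0   78.0| da2
    0      0      0      0    0.0      0      0    0.0    0.0| da3
    5   4503   4503  13263    0.5      0      0    0.0   76.0| da4
    5   4245   4245  13254    0.6      0      0    0.0   77.5| da5
    4   4741      0      0    0.0   4741  11626    1.2   86.7| da6
"""


@dataclass
class GstatRow:
    name: str
    read_ops: int
    read_kbps: int
    write_ops: int
    write_kbps: int
    busy_pct: float


def parse_gstat(text: str) -> list[GstatRow]:
    """Parse gstat rows, assuming the 9 numeric columns shown above."""
    rows = []
    for line in text.splitlines():
        # The "dT:" line and the column header contain no '|', so skip them.
        if "|" not in line:
            continue
        stats, name = line.split("|", 1)
        fields = stats.split()
        if len(fields) != 9:
            continue  # unexpected layout; ignore rather than guess
        rows.append(GstatRow(
            name=name.strip(),
            read_ops=int(fields[2]),     # r/s
            read_kbps=int(fields[3]),    # first kBps column (reads)
            write_ops=int(fields[5]),    # w/s
            write_kbps=int(fields[6]),   # second kBps column (writes)
            busy_pct=float(fields[8]),   # %busy
        ))
    return rows


def classify(rows: list[GstatRow]) -> tuple[list[str], list[str], list[str]]:
    """Split providers into read-only, writing, and idle groups."""
    readers = [r.name for r in rows if r.read_kbps > 0 and r.write_kbps == 0]
    writers = [r.name for r in rows if r.write_kbps > 0]
    idle = [r.name for r in rows if r.read_kbps == 0 and r.write_kbps == 0]
    return readers, writers, idle


if __name__ == "__main__":
    readers, writers, idle = classify(parse_gstat(SAMPLE))
    print("reading:", readers)
    print("writing:", writers)
    print("idle:", idle)
```

Run on that sample, it lists da0, da2, da4 and da5 as reading, da6 as the only provider being written, and da1 and da3 as idle during the interval.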