Date:      Tue, 24 Mar 2020 16:47:10 -0700
From:      David Christensen <dpchrist@holgerdanske.com>
To:        Lukasz <FreeBSD@chroot.pl>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: replace disk in zpool
Message-ID:  <4a8d409e-ecac-77c8-3ad9-025aefdfb4ef@holgerdanske.com>
In-Reply-To: <f6297dfe-e0c4-12ef-523c-1944a9c735ff@chroot.pl>
References:  <d329c84a-8777-1eca-787c-dad9e0eae752@chroot.pl> <18a94704-5411-3b44-a525-2ae50121a467@holgerdanske.com> <f6297dfe-e0c4-12ef-523c-1944a9c735ff@chroot.pl>

On 2020-03-24 14:15, Lukasz wrote:
> Ohh… I forgot to mention:
> it's 12.1-p3
> 
> # zpool status -v mypool
>      pool: mypool
>     state: DEGRADED
>   status: One or more devices has experienced an error resulting in data
>            corruption.  Applications may be affected.
>   action: Restore the file in question if possible.  Otherwise restore the
>            entire pool from backup.
>       see: http://illumos.org/msg/ZFS-8000-8A
>      scan: resilvered 180G in 0 days 16:00:55 with 2 errors on Sun Mar 22 05:18:46 2020
>   config:
> 
>            NAME                             STATE     READ WRITE CKSUM
>            mypool                           DEGRADED     0     0     2
>              raidz1-0                       DEGRADED     0     0     4
>                diskid/DISK-WD-WMC1F0521131  ONLINE       0     0     0
>                replacing-1                  DEGRADED     0     0     0
>                  15838717335844820448       UNAVAIL      0     0     0  was /dev/diskid/DISK-WD-WCC130964640
>                  diskid/DISK-K4JG5D2B       ONLINE       0     0     0
>                ada6                         ONLINE       0     0     0
>                ada1                         ONLINE       0     0     0
>                diskid/DISK-WD-WCC130650055  ONLINE       0     0     0
> 
> errors: Permanent errors have been detected in the following files:
> 	mypool/XXXXXXXXXXXX
> 
> Yes, I did exactly as you wrote - removed the failed drive, installed a replacement drive, and issued a 'zpool replace' command.
> I also tried this:
> I disabled running services in that pool, unmounted and mounted it again. I even exported/imported that pool.
> It has no readonly property.
> Of course I have a backup.


My guess is that resilvering is stuck because ZFS has encountered data 
corruption.  This could be caused by drive(s), cable(s), and/or data 
port(s) (motherboard or expansion card).
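
To see at a glance which entries in the status output above carry errors, a small filter can help.  This is only a minimal sketch: it assumes the usual "NAME STATE READ WRITE CKSUM" column layout of 'zpool status', and list_suspect_vdevs is a made-up helper name, not a ZFS command:

```shell
#!/bin/sh
# Sketch: read `zpool status` output on stdin and print every config line
# whose state is not ONLINE or whose READ/WRITE/CKSUM counters are non-zero.
# list_suspect_vdevs is a hypothetical helper name, not part of ZFS.
list_suspect_vdevs() {
    awk '
        # The config table starts right after the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line marks the end of the config table.
        $1 == "errors:" { in_config = 0 }
        # Config rows have at least five columns: name, state, three counters.
        in_config && NF >= 5 {
            if ($2 != "ONLINE" || $3 != 0 || $4 != 0 || $5 != 0)
                print $1, $2, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage on the affected host:
#   zpool status mypool | list_suspect_vdevs
```

Counters at the pool or raidz level with zeros on every individual disk, as in your output, do not single out one drive, which is why cables and ports are also worth checking.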


What was the failure mode of the bad drive?  Did you test it in any 
other machines?


Are there any items of concern in the SMART reports for the current set of 
drives?  Please post anything that looks questionable.
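
If it helps, the reports can be collected in one pass.  A minimal sketch, assuming smartctl from the sysutils/smartmontools port is installed and that you run it as root; smart_reports is a made-up helper name:

```shell
#!/bin/sh
# Sketch: dump the full SMART report for each disk named on the command line.
# Assumes smartctl (sysutils/smartmontools) is installed; run as root.
# smart_reports is a hypothetical helper name.
smart_reports() {
    for disk in "$@"; do
        echo "=== /dev/$disk ==="
        # -a prints all SMART information for the device.
        smartctl -a "/dev/$disk"
    done
}

# Usage on FreeBSD, where `sysctl -n kern.disks` prints the kernel's disk names
# (it may also list non-disk devices such as cd0, for which smartctl will fail):
#   smart_reports $(sysctl -n kern.disks)
```

Attributes such as Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable, and UDMA_CRC_Error_Count are the usual ones to look at first; a rising CRC error count often points at cabling rather than the drive itself.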


Unplug and re-plug all of your drive power and data cables.  Make sure they 
seat well.  If unsure about a data cable, replace it with a new, locking 
cable.  I have experienced too many problems with red SATA cables.  Few, 
if any, are marked with their rated speed (I did mark some StarTech SATA 
III cables).  So, I stocked up on various lengths and configurations of 
Cable Matters SATA III cables.  They are black, marked "6G", and have 
locking connectors.  Now, whenever I am in a system case, I replace almost 
every red SATA cable just to be safe.


It appears that you have Western Digital hard drives.  Download Data 
Lifeguard Diagnostic (DLG) for DOS, write it to a USB flash drive, boot 
it, and test all of your drives.  Please post the results:

https://support.wdc.com/downloads.aspx?p=2


David


